Interactive Visualization on Malaria Data

Source: National Geographic

Malaria is a potentially fatal disease caused by parasites and transmitted to humans and other animals through the bites of infected mosquitoes. People infected with malaria typically experience high fever and chills. According to the CDC, an estimated 229 million malaria cases occurred in 2019 and 409,000 people died, with Africa accounting for most of the cases. In this blog, I create three interactive visualizations of malaria data.

Data

The data used for this simple analysis comes from the tidytuesday open source project, which contains a variety of datasets and tools. For this project, I'm using the Malaria dataset, which includes three datasets at different summary levels:

  • malaria_deaths.csv: malaria deaths by country for all ages, across the world and over time
  • malaria_deaths_age.csv: malaria deaths by age group, across the world and over time
  • malaria_inc.csv: malaria incidence by country for all ages, across the world and over time
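A minimal loading sketch with pandas. The raw-file URL assumes the files sit in the `data/2018/2018-11-13/` folder of the tidytuesday repository, and the column conventions it handles (capitalised `Entity`/`Code`/`Year` in two files, lower-case names plus a saved row index in the age file, one long measure column per file) are my assumptions about the CSVs. `tidy_columns` and `load_malaria` are hypothetical helper names, and the demo row is a placeholder, not real data.

```python
import io

import pandas as pd

# Assumed raw-file folder of the tidytuesday malaria data; verify before use.
BASE_URL = (
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/"
    "master/data/2018/2018-11-13/"
)

# Identifier columns; any remaining column is treated as the measure.
ID_COLS = {"entity", "code", "year", "age_group"}


def tidy_columns(df: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Lower-case headers, drop a saved row index, shorten the measure name."""
    df = df.rename(columns=str.lower)
    # A CSV written together with its row index reads back with an "Unnamed: 0" column.
    df = df.loc[:, ~df.columns.str.startswith("unnamed")]
    measures = [c for c in df.columns if c not in ID_COLS]
    if len(measures) == 1:
        df = df.rename(columns={measures[0]: value_name})
    return df


def load_malaria(file_name: str, value_name: str) -> pd.DataFrame:
    """Download one CSV (requires network access) and tidy its columns."""
    return tidy_columns(pd.read_csv(BASE_URL + file_name), value_name)


# e.g. deaths = load_malaria("malaria_deaths.csv", "deaths")
#      incidence = load_malaria("malaria_inc.csv", "incidence")

# Offline check on a two-line CSV shaped like the age file (placeholder values).
demo_csv = ",entity,code,year,age_group,deaths\n1,Nigeria,NGA,2016,Under 5,1.0\n"
demo = tidy_columns(pd.read_csv(io.StringIO(demo_csv)), "deaths")
print(demo.columns.tolist())  # ['entity', 'code', 'year', 'age_group', 'deaths']
```

Giving every file the same lower-case `entity`, `code`, `year` columns means the plotting code for all three charts can refer to the same names.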

Tool

For visualization, I chose Plotly because I have previously built a dashboard with it. More importantly, Plotly is simple to use. It's built on top of plotly.js, so its figures integrate well with JupyterLab and can also be saved as HTML files or hosted online. Plotly provides a wide range of flexible interactive graphing tools that can meet quite complex visualization objectives.

Visualization

1. To visualize the number of deaths caused by malaria, I'm using the malaria_deaths dataset. Since I want to explore its impact over time across the globe, I decide that no data aggregation is needed and that a choropleth map is the best way to present the findings. Here is an illustration of what the graph looks like:

The colour represents the number of deaths in each country, as indicated by the colour bar. Note that the dataset contains missing values for most of the countries. Users can also hover the cursor over any country to see a tooltip with more information. At the bottom, a slider lets users pick a specific year between 1990 and 2016. The following code snippet contains the implementation details:

Note that I'm hiding the coastlines to make the map look less cluttered. At the same time, I'm showing the latitude and longitude axes. Finally, I'm using Plotly's 'natural earth' projection type to show the entire globe.

2. Next, I'm interested in the death rate across different age groups. For this purpose, I use the malaria_deaths_age dataset, which summarizes deaths over five age groups: Under 5, 5–14, 15–49, 50–69, and 70 or older. I then calculate the mean number of deaths across all countries for each age group and year, which lets me inspect how the death rate changes over time. The following graph shows what this looks like:

In this plot, each line shows how the death rate of one age group changes over time. Users can move the cursor over the panel to see detailed values. Since malaria appears to affect children under 5 the most, their numbers are far larger than those of the other groups, so I use a log scale on the y-axis to keep every line readable. The code snippet is below:

3. Finally, since Africa is the most affected continent, with the highest occurrence of malaria, I decide to focus on it and visualize the incidence rates in African countries. This is done with the malaria_inc data and Plotly's scatter_geo function. The following figure shows what this looks like:

In the figure above, the size of each circle encodes the incidence. As before, users can hover over each circle for more information and use the slider at the bottom to change the year. To show only Africa, I narrow the map's scope to 'africa':

The complete implementation including data wrangling can be found in this notebook.

M.S. in Data Science candidate, 2022 @ Duke University | Biomedical Engineer | Workout Enthusiast