The purpose of this research is ultimately to help prevent malaria by using raster data to develop a new way of calculating and quantifying malaria risk, through specifying, estimating, and validating a random forest model. Instead of relying on conditional logistic regression or chi-square tests applied to a single dataset, this project will use random forest decision trees to take into account various, uncorrelated datasets. I seek a new metric that combines information from many different variables across these datasets. This work will build upon previous work by Tusting, Atieli, and Strano [1,4,6]. I would like to draw ideas and datasets from all of their studies in order to increase the accuracy of malaria risk estimates. Furthermore, I am researching new, potential datasets that could be used within the model. With increased accuracy and a quantifiable measure of risk, non-profit organizations and governments will be able to interpret the data more easily in order to distribute aid properly, and will have a better understanding of the many factors that contribute to malaria. Additionally, travelers will better understand the risk and potential danger of visiting a malaria-endemic region. This is important because “travelers from malaria-free areas to disease ‘hot spots’ are especially vulnerable to the disease” [14]. Both geospatial datasets and survey data will be used to reveal each dataset’s relationship to malaria. Another goal of this research is to better understand these relationships.
Malaria is a deadly disease that affects many people worldwide. The deadliest form of the disease is caused by the parasite Plasmodium falciparum, which is transmitted to humans by night-biting Anopheles mosquitoes. Although there are medications to treat malaria, the drugs are not always effective against the parasite. In fact, P. falciparum has developed resistance to chloroquine, a malaria drug [14]. Furthermore, the illness is most prevalent in less developed countries, such as those in Sub-Saharan Africa, where access to treatment is often unavailable. Moreover, “60% of malaria deaths worldwide occur in the poorest 20% of the population” [2]. Without access to hospital care, many residents of malaria-prone regions, especially children and pregnant women, suffer from severe symptoms such as fever, seizures, and bleeding. Additionally, malaria has placed huge financial burdens on the families and governments of these countries. Because malaria is deadly and difficult to cure, prevention is of the utmost importance.
In his book Development as Freedom, Amartya Sen defines development as the process of expanding the real freedoms that people enjoy. One major type of freedom is social opportunity, which encompasses people’s access to education and health care. Preventing malaria touches upon the human development dimension of life expectancy, as prevention allows an individual to live a longer, more fulfilling life. Increasing life expectancy is a major component of social opportunity. Additionally, an individual who is healthier because of greater social opportunity is better able to expand their other freedoms. For example, a healthy individual can improve their economic freedom, or their political freedom by having a voice in how the government should work.
My research project relates to Sustainable Development Goal 3, good health and well-being. Prevention of malaria improves health, as P. falciparum infection can be fatal if left untreated [14].
Malaria transmission begins when a night-biting Anopheles mosquito becomes infected with the parasite Plasmodium falciparum. When the mosquito bites a non-infected person, the parasite enters the bloodstream and travels to the person’s liver. After maturation, the parasites leave the liver and infect red blood cells. Although malaria is not contagious, it can still be spread from person to person through blood transfusions, shared needles, and similar blood contact. Additionally, if a non-infected mosquito bites a person with malaria, the mosquito becomes infected and can spread the disease to other people. Thus, it is of the utmost importance to make a greater effort to prevent infection and the spread of malaria.
The environment also plays a considerable role in the spread of malaria. West Africa has high temperatures, a humid climate, and semi-arid terrain, all of which provide conditions suitable for the mosquito vectors to breed and for the parasite to reproduce. “Environmental factors such as the presence of bushes and stagnant water around homes, rainfall, low altitude and high temperatures favor the breeding of malaria vectors” [8]. Malaria is most prevalent in non-urban areas, as urbanized areas do not provide a suitable breeding ground. Specifically, malaria affects the less developed villages and cropland areas of West Africa. It is also common in many communities located along road networks.
In the 2000s, an international WHO/UNICEF conference was held, and a plan for primary health care was approved in West Africa. However, the health services that were provided ended up being costly. Afterwards, a hospital-centered system that mostly served urbanized areas remained in place. In smaller villages, medicine sellers can be found in small drug stores or kiosks. Although their services are inexpensive and quicker, there is concern over the appropriateness and doses of the drugs they sell [7].
Another feature of malaria to note is that it disproportionately affects young children and pregnant women. In brief, precision epidemiology is a crucial human development process that can help monitor progress toward the Sustainable Development Goal of good health and well-being.
The prevalence of malaria behaves much like a complex adaptive social and economic system, because both involve various factors, agents, interactions, and relationships that work together and evolve simultaneously. For example, households in communities work together by using insecticide-treated nets, and mosquito sprays protect more than just one individual. On a larger scale, the government provides money to support the health care system, the health care system provides jobs for people, and wealthier people are more likely to settle in urbanized areas, thus decreasing their risk of malaria.
The primary geospatial data science method I would like to use in my project is the classification algorithm Random Forest. Random Forest works by creating multiple “decision trees”, with each “branch” of a tree representing a possible decision or outcome. In the algorithm, many largely uncorrelated trees operate like a committee to produce a final result [9]. Essentially, the trees work together as an ensemble, while each individual tree produces its own prediction. “The [prediction]… with the most votes becomes our model’s prediction” [9]. In my project, I could use the random forest model to assign a quantitative value of malaria risk to localized regions in West Africa. The various decision trees will produce predictions of malaria risk based upon uncorrelated datasets that affect malaria. These include the temperature of the region, rainfall, land cover type, and insecticide-treated bed net use.
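The committee-and-vote idea described above can be illustrated with a minimal sketch. It is not the model proposed for this project: it assumes a deliberately simplified forest in which every tree has a single split (a "stump"), and the eight data rows, the feature order, and all function names are hypothetical values invented for illustration. The share of trees voting "high risk" is one possible way to turn the votes into a quantitative risk value.

```python
import random
from collections import Counter

# Hypothetical toy data, invented for illustration only.
# Each row: (mean temperature in C, annual rainfall in mm, ITN coverage 0-1)
X = [
    (28.0, 1400, 0.20), (27.5, 1300, 0.30), (29.0, 1500, 0.10), (26.5, 1200, 0.25),
    (21.0, 500, 0.80), (20.0, 450, 0.70), (22.5, 600, 0.90), (19.5, 400, 0.85),
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = high malaria risk, 0 = low risk


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature.

    Tries every observed value as a threshold and keeps the split with
    the fewest misclassifications on the rows it was given.
    """
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        for above in (0, 1):  # class predicted when the value exceeds the threshold
            preds = [above if r[feature] > threshold else 1 - above for r in rows]
            errors = sum(p != t for p, t in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above


def predict_stump(stump, row):
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))           # random feature for this tree
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict_forest(forest, row):
    """Return the majority-vote class and the share of trees voting 'high risk'."""
    votes = [predict_stump(stump, row) for stump in forest]
    majority = Counter(votes).most_common(1)[0][0]
    return majority, sum(votes) / len(votes)


if __name__ == "__main__":
    forest = fit_forest(X, y)
    # A hot, wet location with low net coverage (hypothetical values).
    print(predict_forest(forest, (29.5, 1600, 0.05)))
```

A full random forest grows deeper trees and considers a random subset of features at every split, but the bootstrap sampling and majority vote work the same way as in this sketch.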
Another method common to many of the sources is the Bayesian statistical model, which can be used to estimate between-cluster and within-cluster variability [13]. Additionally, it can be used to calculate the conditional probability of an event. For example, it could be applied to the use of insecticide-treated nets (ITNs) or to the prevalence of fever. This model may be helpful in my project when evaluating the ITN and fever datasets to assess malaria prevalence.
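As a rough illustration of how a Bayesian estimate reflects the amount of data in a survey cluster, the sketch below uses a beta-binomial model. This is an assumption chosen for simplicity, not the model used in [13]; the uniform Beta(1, 1) prior, the cluster counts, and the function names are all hypothetical.

```python
def beta_binomial_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of the Beta posterior after observing binomial data.

    With a Beta(prior_a, prior_b) prior on a proportion, observing
    `successes` out of `trials` gives Beta(prior_a + successes,
    prior_b + trials - successes).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


if __name__ == "__main__":
    # Two hypothetical clusters with the same raw ITN-use proportion (0.6)
    # but very different sample sizes.
    for used_itn, surveyed in [(3, 5), (60, 100)]:
        a, b = beta_binomial_posterior(used_itn, surveyed)
        print(surveyed, round(beta_mean(a, b), 3), round(beta_variance(a, b), 4))
```

Both clusters have the same observed proportion, but the posterior for the five-household cluster is much wider, which is the kind of precision difference that matters when survey estimates are mapped to small areas.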
i. Climate-Data.org, which draws on contributions from OpenStreetMap, provides data on average annual temperature and rainfall for cities within a country and for the country as a whole. The weather data come from global weather stations, with a resolution of 30 arc seconds [11]. The temperature and rainfall datasets will both be used to build decision trees that assess malaria risk based on these individual factors, as malaria may be more prevalent under specific conditions.
ii. Data from the European Space Agency Climate Change Initiative classify global regions by land cover at a 300-meter resolution. The data will be used to determine the land cover of regions in West Africa and to build decision trees that take land cover into account when assessing malaria risk [12]. This information is relevant because land classified as irrigated or post-flooding cropland is more likely to be a breeding ground for Anopheles mosquitoes than areas covered by needleleaved evergreen trees.
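The two raster sources above state their resolution in different units (30 arc seconds and 300 meters), so they would need to be brought onto a common grid before being stacked as layers. The sketch below gives a rough conversion, assuming a spherical Earth with a mean radius of 6,371 km; the function name is hypothetical and the 10° N latitude is only an illustrative value for West Africa.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def arcsec_to_metres(arcsec, latitude_deg=0.0):
    """Approximate ground size of an angular cell, in meters.

    Returns (north_south, east_west). The east-west size shrinks with the
    cosine of the latitude because meridians converge toward the poles.
    """
    angle_rad = math.radians(arcsec / 3600.0)
    north_south = EARTH_RADIUS_M * angle_rad
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    ns, ew = arcsec_to_metres(30, latitude_deg=10.0)
    # Number of 300 m land cover cells along each side of one climate cell.
    print(round(ns), round(ew), round(ns / 300), round(ew / 300))
```

Under these assumptions a 30 arc second cell is a little over 900 meters on a side, so each climate cell covers roughly a 3 by 3 block of land cover cells, and one of the layers would have to be resampled to match the other.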
i. Survey data from the Demographic and Health Surveys (DHS) Program on the percentage of the population with access to insecticide-treated nets (ITNs) will be used to build decision trees. This is a secondary source, with references describing how the percentage was calculated from previous surveys [10]. Insecticide-treated nets are very important for preventing nighttime bites from Anopheles mosquitoes.
I believe assignment 2 has helped me to bring my project proposal together, and I have identified a gap in the literature. Although many articles and studies assess malaria prevalence through algorithms, some even using random forest, they are primarily based on a single dataset. For example, an article I researched for assignment 1, “Housing Improvements and Malaria Risk in Sub-Saharan Africa: A Multi-Country Analysis of Survey Data”, used conditional logistic regression and chi-square tests to determine malaria risk based on housing quality [4]. Although this article also estimated the risk of malaria, it did so solely from architectural data. Even though housing is an important factor, various other factors are crucial to assessing the risk of malaria. I believe that a risk value calculated from a single factor does not tell the “whole story”. It may be difficult to take into account every factor that contributes to malaria, but a step in the right direction would be to use a raster dataset, so that the calculations are based upon different layers of data, all of which contribute to the estimate of malaria risk. One of the most important articles that helped me recognize the gap was “Malaria Prevalence Metrics in Low- and Middle-Income Countries: an Assessment of Precision in Nationally-Representative Surveys”. The study focuses on countries in Sub-Saharan Africa and uses a Bayesian statistical model to assess the variability of fever prevalence, malaria prevalence, and bed net use in children under five years old. Essentially, the study found that the national surveys, as currently administered, come at the cost of precision [13]: the precision with which malaria-prone areas can be identified is very low. The article provides another reason why taking various factors into account would be beneficial for quantifying malaria risk, and it supports a preference for geospatial datasets over survey data.
I think I have made good progress with my project proposal, but I know I still have a lot to do. Assignment 2 has made me realize that generalizing this project to all of West Africa may be complicated. In assignment 3, I might need to narrow my region to only a few countries in West Africa, such as Nigeria and Ghana. Another thing I want to look into is including the VIIRS nighttime lights dataset. I have been hesitant to include it so far, since I felt it would be redundant alongside my land cover data: the land cover data identifies urbanized areas, which is primarily what nighttime lights would identify as well. However, including it may improve accuracy. Moreover, I learned in assignment 2 that Africa has a very hospital-centered approach to health care, so I wonder whether data on distance to the nearest hospital would be beneficial to include in the raster. After all this research and outlining my proposal, my evaluative research question is: What is the malaria risk in localized regions of Africa, based on the various factors that contribute to malaria prevalence? Another question could be: What is a thorough and proper metric to represent malaria risk in localized regions of Africa that conveys information on topography, climate, urbanization, insecticide-treated bed net use, and other factors?
[1] Strano, E., Viana, M. P., Sorichetta, A., & Tatem, A. J. (2018). Mapping road network communities for guiding disease surveillance and control strategies. Scientific Reports, 8(1). doi: 10.1038/s41598-018-22969-4
[2] Suh, K. N., Kain, K. C., & Keystone, J. S. (2004). Malaria. Canadian Medical Association Journal, 170(11), 1693–1702. doi: 10.1503/cmaj.1030418
[3] Tatem, A. J., Huang, Z., Narib, C., Kumar, U., Kandula, D., Pindolia, D. K., … Lourenço, C. (2014). Integrating rapid risk mapping and mobile phone call record data for strategic malaria elimination planning. Malaria Journal, 13(1). doi: 10.1186/1475-2875-13-52
[4] Tusting, L. S., Bottomley, C., Gibson, H., Kleinschmidt, I., Tatem, A. J., Lindsay, S. W., & Gething, P. W. (2017). Housing Improvements and Malaria Risk in Sub-Saharan Africa: A Multi-Country Analysis of Survey Data. PLoS Med, 14(2). doi: 10.1371/journal.pmed.1002234
[5] Reiner, R. C., Menach, A. L., Kunene, S., Ntshalintshali, N., Hsiang, M. S., Perkins, A. T., … Cohen, J. M. (2015). Mapping residual transmission for malaria elimination. ELife Sciences. doi: 10.7554/elife.09520.012
[6] Atieli, Harrysone E, et al. “Topography as a Modifier of Breeding Habitats and Concurrent Vulnerability to Malaria Risk in the Western Kenya Highlands.” Parasites & Vectors, vol. 4, no. 1, 2011, doi:10.1186/1756-3305-4-241.
[7] Dechambenoit, Gilbert. “Access to Health Care in Sub-Saharan Africa.” Surgical Neurology International, vol. 7, no. 1, 2016, p. 108., doi:10.4103/2152-7806.196631.
[8] Kimbi, Helen Kuokuo. “Environmental Factors and Preventive Methods against Malaria Parasite Prevalence in Rural Bomaka and Urban Molyko, Southwest Cameroon.” Journal of Bacteriology & Parasitology, vol. 04, no. 01, 2012, doi:10.4172/2155-9597.1000162.
[9] Yiu, Tony. Understanding Random Forest. 12 June 2019, towardsdatascience.com/understanding-random-forest-58381e0602d2.
[10] Guide to DHS Statistics. www.dhsprogram.com/pubs/pdf/DHSG1/Guide_to_DHS_Statistics_DHS-7.pdf.
[11] Climate Data for Cities Worldwide. Climate-Data.org, en.climate-data.org/.
[12] “300 m Annual Global Land Cover Time Series from 1992 to 2015.” ESA CCI Land Cover Website, www.esa-landcover-cci.org/?q=node/175.
[13] Alegana, Victor A., et al. “Malaria Prevalence Metrics in Low- and Middle-Income Countries: an Assessment of Precision in Nationally-Representative Surveys.” Malaria Journal, vol. 16, no. 1, 2017, doi:10.1186/s12936-017-2127-y.
[14] Sanchez, J. D. (n.d.). Malaria: General information. Retrieved from https://www.paho.org/hq/index.php?option=com_content&view=article&id=2573:general-information-malaria&Itemid=2060&lang=fr