workshop

In the reading today (Stevens et al.) the authors use a technique to produce a high resolution description of the distribution of human populations across the globe. What is the name of the technique and describe in general and basic terms how it works?The random forest method used by the authors is a machine learning algorithm (ensemble method). In general terms, what is a machine learning algorithm? Within the context of this study what distinguishes a data science, machine learning method (such as random forest) from previous classical statistical approaches to describing and analyzing phenomenon and events?The authors’ results present a remarkable improvement over previous geospatial descriptions at very high resolution, of the distribution of the human population. Within the context of human development in LMICs, what is the significance of having a highly accurate description of where each person is located across planet earth? Within the context of human development in LMICs, what is the relevance to your area of investigation in having a highly accurate description of where each household and person is located across planet earth?

The name of the technique is called Random Forest. Random Forest is essentially an estimation technique used to generate a gridded prediction of population density from the census and geospatial data. It works through “bagging” - a term that references combining different learning models to increase classification accuracy. A machine learning alogrithm is different from a regular algorithm because it adjusts itself to become more accurate and precise after being exposed to more data. A random forest model adapts to different input. It is different from classical satistical approaches like describing and analzying phenomenon and events because the machine learning algorithm can take into account any potential changed in covariates or facotrs that may influence the product produced. It is important to have an accurate distribution of population distirbution for various reasons in LMICs. One of these could be for NGOs to deliver aid to people - they must know exactly where people are. Furthermore, in order to fully accomplish the sustainable development goals with the idea that no one gets left behind, it is important to know exactly where everyone is. Thus, one may measure if the success of a program that attempts to fulfill a sustainable development goal. Within the context of my own research, this algorithm would be very helpful because information population distirbution is crucial to epidemiology. If I know where people are, I could implement programs or policies to limit the spread of a certain pathogen. Additionally, it would be beneficial information for delivering medicines or setting up health clinics, so that people get more access to health care facilities.