Integrating Survey and Geospatial Data to Identify the Poor and Vulnerable: Evidence from Malawi
The World Bank
World Bank E-Library Archive (for collection details, see MPG.ReNa)
Generating timely data to identify the poorest villages in developing countries remains a fundamental challenge for existing data systems. This paper investigates the accuracy of four alternative methods for predicting a measure of village economic welfare for approximately 4,500 villages in 10 poor Malawian districts: (1) proxy means test scores calculated from the 2017 social registry, (2) the Meta Relative Wealth Index, (3) predictions derived from a standard household survey and publicly available geospatial indicators, and (4) predictions derived from a two-step approach that first predicts welfare into a hypothetical partial registry of approximately 450 villages, and then predicts welfare into the remaining villages using geospatial indicators. Geospatial indicators include land coverage indicators, weather data, night light data, building patterns, distance to major roads, and population density.
Predictions are evaluated against a benchmark village welfare measure, constructed by imputing log per capita consumption from the 2016 integrated household survey into the 2018 household census using gradient boosting. Incorporating the hypothetical partial registry vastly improves the performance of the predictions. When using the partial registry, the rank correlation between the predicted and benchmark welfare measures is 0.75, while those for the other three methods range from -0.02 to 0.2, and similar results are seen when examining the area under the curve. Doubling the size of the partial registry does little to improve predictive performance. The results are robust to using a linear post-Least Absolute Shrinkage and Selection Operator (post-LASSO) model instead of gradient boosting for prediction. However, predictions using both methods are less accurate when the benchmark welfare measure is derived from a linear post-LASSO model.
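The data flow of the two-step approach (method 4) can be illustrated with a minimal sketch. It is not the paper's implementation: a one-variable least-squares line stands in for gradient boosting so that the example needs only the standard library, and every name and number in it (`asset_score`, `night_light`, the village labels, the toy values) is a hypothetical assumption made for illustration.

```python
# Illustrative sketch of the two-step idea; NOT the paper's model.
# Assumption: a one-predictor least-squares line replaces gradient boosting.

def fit_line(x, y):
    """Fit y = slope * x + intercept by least squares; return a predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: slope * v + intercept

def two_step(survey, partial_registry, night_light):
    # Step 1: learn household welfare from the survey, predict it for every
    # household in the partial registry, and average within each village.
    hh_model = fit_line([a for a, _ in survey], [c for _, c in survey])
    registry_welfare = {
        village: sum(hh_model(a) for a in scores) / len(scores)
        for village, scores in partial_registry.items()
    }
    # Step 2: learn village welfare from a geospatial indicator using only
    # the registry villages, then predict into the remaining villages.
    villages = list(registry_welfare)
    geo_model = fit_line(
        [night_light[v] for v in villages],
        [registry_welfare[v] for v in villages],
    )
    remaining = {
        v: geo_model(g) for v, g in night_light.items()
        if v not in registry_welfare
    }
    return registry_welfare, remaining

# Hypothetical toy data: (asset_score, log per capita consumption) pairs.
survey = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0)]
# Hypothetical partial registry: asset scores of households in two villages.
partial_registry = {"A": [1.0, 3.0], "B": [3.0, 5.0]}
# Hypothetical geospatial indicator available for all villages.
night_light = {"A": 10.0, "B": 20.0, "D": 15.0}

registry_welfare, remaining_pred = two_step(survey, partial_registry, night_light)
```

In this toy setup only village `D` lacks registry data, so it is the only one whose welfare comes from the geospatial model. In practice each step would use many predictors and a flexible learner, as the abstract describes.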
Overall, the results strongly suggest that collecting partial registries of household-level poverty predictors in low-income contexts can vastly improve the performance of machine learning models that combine survey and satellite imagery for the purpose of village-level targeting.
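The two evaluation metrics named in the abstract, rank correlation and area under the curve (AUC), can be computed as in the following sketch. The village values and the poverty cutoff are hypothetical, and the paper's exact definition of the "poor" class for the AUC is not given in the abstract, so the labels below are an assumption.

```python
# Illustrative sketch of the evaluation metrics; toy data, not paper results.

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def auc(scores, is_poor):
    """Share of (poor, non-poor) pairs where the poor village scores higher;
    ties count as half."""
    pos = [s for s, p in zip(scores, is_poor) if p]
    neg = [s for s, p in zip(scores, is_poor) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical benchmark and predicted village welfare (log consumption).
benchmark = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.2, 7.1, 7.9]
# Assumed labels: the two villages with the lowest benchmark welfare are poor.
is_poor = [True, True, False, False]

rank_corr = spearman(predicted, benchmark)
# Lower predicted welfare should signal poverty, so negate it as the score.
targeting_auc = auc([-p for p in predicted], is_poor)
```

The rank correlation rewards getting the ordering of villages right across the whole distribution, while the AUC focuses on separating poor from non-poor villages at a chosen cutoff.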