Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning

With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. This project set out to test whether an openly available dataset (Twitter) could be transformed into a resource for urban planning and development.


Bibliographic Details
Main Author: Milusheva, Sveta
Other Authors: Legovini, Arianna, Resor, Elizabeth, Marty, Robert
Format: eBook
Language: English
Published: Washington, D.C.: The World Bank, 2020
Series: World Bank E-Library Archive
Online Access: http://elibrary.worldbank.org/doi/book/10.1596/1813-9450-9488
Collection: World Bank E-Library Archive - Collection details see MPG.ReNa
LEADER 02273nmm a2200253 u 4500
001 EB002109940
003 EBX01000000000000001250030
005 00000000000000.0
007 cr|||||||||||||||||||||
008 221013 ||| eng
100 1 |a Milusheva, Sveta 
245 0 0 |a Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning  |h Elektronische Ressource  |c Sveta Milusheva 
260 |a Washington, D.C  |b The World Bank  |c 2020 
300 |a 40 pages 
700 1 |a Legovini, Arianna 
700 1 |a Resor, Elizabeth 
700 1 |a Marty, Robert 
041 0 7 |a eng  |2 ISO 639-2 
989 |b WOBA  |a World Bank E-Library Archive 
490 0 |a World Bank E-Library Archive 
028 5 0 |a 10.1596/1813-9450-9488 
856 4 0 |u http://elibrary.worldbank.org/doi/book/10.1596/1813-9450-9488  |x Verlag  |3 Volltext 
082 0 |a 330 
520 |a With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. This project set out to test whether an openly available dataset (Twitter) could be transformed into a resource for urban planning and development. The hypothesis is tested by creating road traffic crash location data, which are scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over age five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. The project geolocated 32,991 crash reports on Twitter for 2012-20 and clustered them into 22,872 unique crashes to produce one of the first crash maps for Nairobi. A motorcycle delivery service was dispatched in real time to verify a subset of crashes, showing 92 percent accuracy. Using a spatial clustering algorithm, portions of the road network (less than 1 percent) were identified where 50 percent of the geolocated crashes occurred. Even with limitations in the representativeness of the data, the results can provide urban planners with useful information to target road safety improvements where resources are limited.