Data Density Considerations for Crowd Sourced Population Estimations from Social Media

Samuel Lee Toepke

2017

Abstract

When social media data is used for population estimation, data density is a primary concern. A high density of quality, crowd-sourced data in a specified geographic area leads to a more precise estimation. Data acquisition and storage must nonetheless be balanced against the provisioned cost and size constraints of the technical implementation, and against the ability to receive data in that area. This investigation compares hourly population estimations based on Tweet quantity for several major West Coast cities in the United States of America. An estimation baseline is established, and data is then artificially removed from the estimation to explore the importance of data density. Experimental data is obtained and stored using an enterprise cloud solution, density observations and results are discussed, and follow-on work is described.
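
As a rough illustration of the experiment outlined in the abstract, the sketch below bins post timestamps into hourly counts, builds a baseline curve, randomly discards a fraction of the posts, and measures how far the thinned curve drifts from the baseline. It is a minimal sketch under stated assumptions, not the author's implementation: the function names, the peak normalization, the uniform random thinning, the use of RMSE as the error measure, and the synthetic data are all hypothetical choices made for this example.

```python
import random
from collections import Counter
from math import sqrt


def hourly_counts(hours):
    """Count posts per hour of day (0-23) from a list of integer hours."""
    counts = Counter(hours)
    return [counts.get(h, 0) for h in range(24)]


def normalize(counts):
    """Scale counts by their peak so curves of different density are comparable.

    Peak normalization is an assumption of this sketch.
    """
    peak = max(counts)
    return [c / peak if peak else 0.0 for c in counts]


def thin(hours, keep_fraction, rng):
    """Keep each post independently with probability keep_fraction."""
    return [h for h in hours if rng.random() < keep_fraction]


def rmse(a, b):
    """Root mean square error between two equal-length curves."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def demo():
    rng = random.Random(0)
    # Synthetic stand-in for Tweet timestamps: more activity from 08:00 to 22:00.
    weights = [1 + (8 <= h <= 22) * 4 for h in range(24)]
    posts = rng.choices(range(24), weights=weights, k=5000)

    baseline = normalize(hourly_counts(posts))
    for keep in (0.9, 0.5, 0.1, 0.01):
        reduced = normalize(hourly_counts(thin(posts, keep, rng)))
        print(f"keep={keep:<5} rmse vs baseline={rmse(baseline, reduced):.4f}")


if __name__ == "__main__":
    demo()
```

Running the demo prints one error value per retention level. With real data, a curve like this could help judge how much collection volume is worth paying for in a given area.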



Paper Citation


in Harvard Style

Toepke, S. L. (2017). Data Density Considerations for Crowd Sourced Population Estimations from Social Media. In Proceedings of the 3rd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM, ISBN 978-989-758-252-3, pages 35-42. DOI: 10.5220/0006314300350042


in Bibtex Style

@conference{gistam17,
author={Samuel Lee Toepke},
title={Data Density Considerations for Crowd Sourced Population Estimations from Social Media},
booktitle={Proceedings of the 3rd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM},
year={2017},
pages={35-42},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006314300350042},
isbn={978-989-758-252-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM
TI - Data Density Considerations for Crowd Sourced Population Estimations from Social Media
SN - 978-989-758-252-3
AU - Toepke, Samuel Lee
PY - 2017
SP - 35
EP - 42
DO - 10.5220/0006314300350042