DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media

Manu Shukla, Ray Dos Santos, Andrew Fong, Chang-Tien Lu

2016

Abstract

Event analysis in social media is challenging due to the endless amount of information generated daily. While current research has focused strongly on detecting events, there is no clear guidance on how the resulting storylines should be processed so that they make sense to a human analyst. In this paper, we present DISTL, an event processing platform which takes as input a set of storylines (each a sequence of entities and their relationships) and processes them in two steps: (1) it uses different algorithms (LDA, SVM, information gain, rule sets) to identify events with different themes and allocates storylines to them; and (2) it combines the events with location and time to narrow them down to the ones that are meaningful in a specific scenario. The output comprises sets of events in different categories. DISTL uses in-memory distributed processing that scales to high data volumes and categorizes generated storylines in near real-time. It uses Big Data tools, such as Hadoop and Spark, which have been shown to be highly efficient in handling millions of tweets concurrently.
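
The two steps described in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the authors' implementation: it covers only the rule-set variant of step (1), leaving out LDA, SVM and information gain, and it does not use Spark or Hadoop. The `Storyline` fields, the `RULES` keyword sets, and the `categorize` and `narrow` functions are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Storyline:
    """Hypothetical storyline record: ordered entities plus a place and time."""
    entities: tuple
    lat: float
    lon: float
    time: datetime


# Hypothetical rule set (an assumption, not taken from the paper):
# each theme is described by a set of trigger keywords.
RULES = {
    "protest": {"protest", "march", "rally"},
    "weather": {"storm", "flood", "snow"},
}


def categorize(storylines, rules=RULES):
    """Step 1: allocate each storyline to every theme whose keywords it mentions."""
    events = defaultdict(list)
    for s in storylines:
        terms = {e.lower() for e in s.entities}
        for theme, keywords in rules.items():
            if terms & keywords:
                events[theme].append(s)
    return dict(events)


def narrow(events, bbox, start, end):
    """Step 2: keep only storylines inside a bounding box and a time window."""
    min_lat, min_lon, max_lat, max_lon = bbox
    kept = {}
    for theme, items in events.items():
        hits = [
            s for s in items
            if min_lat <= s.lat <= max_lat
            and min_lon <= s.lon <= max_lon
            and start <= s.time <= end
        ]
        if hits:  # drop themes with no storyline left in the scenario
            kept[theme] = hits
    return kept
```

Calling `categorize` on a list of storylines yields a dictionary from theme to storylines, and `narrow` then restricts that dictionary to one scenario. In a distributed setting the same two functions would roughly correspond to a map over storylines followed by a filter, which is one reason an in-memory engine is a natural fit for this kind of workload.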


Paper Citation


in Harvard Style

Shukla M., Dos Santos R., Fong A. and Lu C. (2016). DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media. In Proceedings of the 2nd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM, ISBN 978-989-758-188-5, pages 39-50. DOI: 10.5220/0005831200390050


in Bibtex Style

@conference{gistam16,
author={Manu Shukla and Ray Dos Santos and Andrew Fong and Chang-Tien Lu},
title={DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media},
booktitle={Proceedings of the 2nd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM},
year={2016},
pages={39-50},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005831200390050},
isbn={978-989-758-188-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM
TI - DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media
SN - 978-989-758-188-5
AU - Shukla M.
AU - Dos Santos R.
AU - Fong A.
AU - Lu C.
PY - 2016
SP - 39
EP - 50
DO - 10.5220/0005831200390050