DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media
Manu Shukla, Ray Dos Santos, Andrew Fong, Chang-Tien Lu
2016
Abstract
Event analysis in social media is challenging due to the sheer volume of information generated daily. While current research has put a strong focus on detecting events, there is no clear guidance on how the resulting storylines should be processed so that they make sense to a human analyst. In this paper, we present DISTL, an event processing platform that takes as input a set of storylines (each a sequence of entities and their relationships) and processes them in two steps: (1) it uses different algorithms (LDA, SVM, information gain, rule sets) to identify events with different themes and allocates storylines to them; and (2) it combines the events with location and time to narrow them down to the ones that are meaningful in a specific scenario. The output comprises sets of events in different categories. DISTL uses in-memory distributed processing that scales to high data volumes and categorizes generated storylines in near real time. It uses Big Data tools, such as Hadoop and Spark, which have been shown to be highly efficient in handling millions of tweets concurrently.
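The two processing steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the DISTL implementation: it uses only a keyword rule set (one of the four techniques the abstract names) in place of LDA, SVM, or information gain, and it runs in plain Python rather than on Spark. The `Storyline` fields, the `RULES` table, and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Storyline:
    """Hypothetical storyline record: ordered entities plus a place and a time."""
    entities: tuple
    lat: float
    lon: float
    time: datetime


# Assumed rule set: theme name -> keywords that signal the theme.
RULES = {
    "protest": {"protest", "march", "police"},
    "weather": {"storm", "flood"},
}


def categorize(storyline, rules=RULES):
    """Step 1: return the themes whose keywords overlap the storyline's entities."""
    words = {e.lower() for e in storyline.entities}
    return sorted(theme for theme, keywords in rules.items() if words & keywords)


def in_scope(storyline, bbox, start, end):
    """Step 2 predicate: inside the bounding box and the time window (inclusive)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return (
        min_lat <= storyline.lat <= max_lat
        and min_lon <= storyline.lon <= max_lon
        and start <= storyline.time <= end
    )


def build_events(storylines, bbox, start, end, rules=RULES):
    """Assign storylines to themes, keeping only those that match the scenario."""
    events = {}
    for s in storylines:
        themes = categorize(s, rules)
        if not in_scope(s, bbox, start, end):
            continue
        for theme in themes:
            events.setdefault(theme, []).append(s)
    return events


s1 = Storyline(("Protest", "Police", "Downtown"), 38.9, -77.0, datetime(2015, 4, 27, 12))
s2 = Storyline(("Storm", "Coast"), 25.7, -80.2, datetime(2015, 4, 27, 13))
s3 = Storyline(("march", "city hall"), 38.9, -77.0, datetime(2015, 6, 1))

bbox = (38.0, -78.0, 40.0, -76.0)  # (min_lat, min_lon, max_lat, max_lon)
events = build_events([s1, s2, s3], bbox, datetime(2015, 4, 27), datetime(2015, 4, 28))
```

Here only `s1` survives: `s2` falls outside the bounding box and `s3` falls outside the time window, so `events` holds a single "protest" category. In a distributed setting the per-storyline work in `build_events` is independent across storylines, which is the property that lets this kind of categorization be spread over a cluster.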
Paper Citation
in Harvard Style
Shukla M., Dos Santos R., Fong A. and Lu C. (2016). DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media. In Proceedings of the 2nd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM, ISBN 978-989-758-188-5, pages 39-50. DOI: 10.5220/0005831200390050
in Bibtex Style
@conference{gistam16,
author={Manu Shukla and Ray Dos Santos and Andrew Fong and Chang-Tien Lu},
title={DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media},
booktitle={Proceedings of the 2nd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM},
year={2016},
pages={39-50},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005831200390050},
isbn={978-989-758-188-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM
TI - DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media
SN - 978-989-758-188-5
AU - Shukla M.
AU - Dos Santos R.
AU - Fong A.
AU - Lu C.
PY - 2016
SP - 39
EP - 50
DO - 10.5220/0005831200390050