Authors:
Manu Shukla (1); Ray Dos Santos (2); Andrew Fong (1) and Chang-Tien Lu (3)
Affiliations:
(1) Omniscience Corporation, United States; (2) U.S. Army Corps of Engineers - ERDC - GRL, United States; (3) Virginia Tech, United States
Keyword(s):
Event Categorization, In-Memory Distribution, Big Data.
Related Ontology Subjects/Areas/Topics:
Applications; Pattern Recognition; Web Applications
Abstract:
Event analysis in social media is challenging due to the endless amount of information generated daily. While current research has put a strong focus on detecting events, there is no clear guidance on how the resulting storylines should be processed so that they make sense to a human analyst. In this paper, we present DISTL, an event processing platform which takes as input a set of storylines (sequences of entities and their relationships) and processes them as follows: (1) it uses different algorithms (LDA, SVM, information gain, rule sets) to identify events with different themes and allocates storylines to them; and (2) it combines the events with location and time to narrow them down to the ones that are meaningful in a specific scenario. The output comprises sets of events in different categories. DISTL uses in-memory distributed processing that scales to high data volumes and categorizes generated storylines in near real-time. It uses Big Data tools, such as Hadoop and Spark, which have been shown to be highly efficient in handling millions of tweets concurrently.
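The two processing steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the DISTL implementation: it stands in for only the rule-set option (not LDA, SVM, or information gain), it does not use Hadoop or Spark, and the names `Storyline`, `RULES`, `categorize`, `allocate`, and `narrow`, as well as the example themes and keywords, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Storyline:
    entities: tuple      # ordered entities that make up the storyline
    location: str        # assumed: one location label per storyline
    timestamp: datetime  # assumed: one timestamp per storyline


# Hypothetical rule set: theme -> keywords that trigger it.
RULES = {
    "protest": {"march", "rally", "police"},
    "outage": {"blackout", "power", "grid"},
}


def categorize(storyline, rules):
    """Return the themes whose keywords overlap the storyline's entities."""
    words = {entity.lower() for entity in storyline.entities}
    return sorted(theme for theme, keywords in rules.items() if words & keywords)


def allocate(storylines, rules):
    """Step 1: group storylines into events keyed by theme."""
    events = {}
    for storyline in storylines:
        for theme in categorize(storyline, rules):
            events.setdefault(theme, []).append(storyline)
    return events


def narrow(events, location, start, end):
    """Step 2: keep only storylines matching a location and time window."""
    scoped = {}
    for theme, members in events.items():
        kept = [
            s for s in members
            if s.location == location and start <= s.timestamp <= end
        ]
        if kept:  # drop themes left with no storylines
            scoped[theme] = kept
    return scoped
```

Calling `allocate` on a list of storylines and passing the result to `narrow` with a location and time window yields the per-category event sets for that scenario. In a distributed setting the same two functions would map naturally onto a per-record transformation followed by a filter, though how DISTL partitions this work is not stated in the abstract.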