Authors:
Ferdaous Jenhani; Mohamed Salah Gouider and Lamjed Bensaid
Affiliation:
Institut Superieur de Gestion de Tunis, Tunisia
Keyword(s):
Hadoop, Social Data, Twitter, Information Extraction, Drug Abuse.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems
Abstract:
Social data analysis has become a real business requirement, given the frequent use of social media as a business strategy. However, the volume, velocity and variety of social data make it challenging to store and process. In a previous contribution [11, 12], we proposed an event extraction system that addressed only data variety and did not handle the volume and velocity dimensions. That solution therefore cannot be considered a big data system.
In this work, we port the previously proposed system to a parallel and distributed framework in order to reduce the complexity of the task and to scale up to larger, continuously growing volumes of data. We propose two loosely coupled Hadoop clusters for entity recognition and event extraction. In our experiments, we carried out a time test and an accuracy test to check the performance of the system in extracting drug abuse behavioral events from 1,000,000 tweets. The Hadoop-based system achieves better performance than the previous system.