Authors: Rui Portocarrero Sarmento 1; Mário Cordeiro 1; Pavel Brazdil 2 and João Gama 2
Affiliations: 1 University of Porto, Portugal; 2 LIAAD-INESC TEC, Portugal
Keyword(s):
Automatic Keyword Extraction, Incremental PageRank, Data Streams, Text Mining, Incremental TextRank.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Evolutionary Programming; Information Systems Analysis and Specification; Natural Language Interfaces to Intelligent Systems; Performance Evaluation and Benchmarking; Sensor Networks; Signal Processing; Soft Computing; Software Engineering; Strategic Decision Support Systems
Abstract:
Text Mining and NLP techniques are currently a very active research area. Researchers strive to develop new and faster
algorithms to cope with larger amounts of data. In particular, interest in text data analysis has been increasing
due to the growth of social network media. Given this, developing new algorithms and/or upgrading
existing ones is now crucial for dealing with text mining problems in this new scenario. In this paper,
we present an update to TextRank, a well-known method for automatic keyword extraction
from text, adapted to deal with streams of text. In addition, we present results for this implementation and
compare them with the batch version. The major improvement is lower computation time for processing
the same text data in a streaming environment, in both sliding-window and incremental setups. The speedups
obtained in the experimental results are significant. Therefore, the approach was considered valid and useful
to the research community.
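
To make the general idea concrete, the following is a minimal sketch of TextRank-style keyword ranking over a growing stream of text. It is not the authors' implementation: the function names (`cooccurrence_graph`, `pagerank`, `stream_keywords`), the whitespace tokenizer, the window size, and the warm-start strategy (reusing the previous scores as the starting point for the next PageRank run) are all assumptions chosen for illustration. For simplicity the sketch rebuilds the co-occurrence graph on every chunk and only reuses the scores.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected graph linking words that appear within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, scores=None, damping=0.85, tol=1e-6, max_iter=100):
    """Iterate the TextRank score update; `scores` optionally warm-starts the iteration."""
    nodes = list(graph)
    if not nodes:
        return {}, 0
    previous = scores or {}
    # Words seen before keep their old score; new words start at 1.0 (assumption).
    rank = {n: previous.get(n, 1.0) for n in nodes}
    iterations = 0
    for iterations in range(1, max_iter + 1):
        new = {
            n: (1 - damping)
            + damping * sum(rank[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
        delta = max(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank, iterations


def stream_keywords(chunks, top_k=3):
    """Yield the current top-ranked words after each incoming chunk of text."""
    tokens, scores = [], None
    for chunk in chunks:
        tokens.extend(chunk.lower().split())  # naive tokenizer (assumption)
        graph = cooccurrence_graph(tokens)
        scores, _ = pagerank(graph, scores)   # warm start from previous scores
        yield sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In this sketch the potential saving comes only from the warm start: when a new chunk changes the graph slightly, the previous scores are usually close to the new fixed point, so fewer iterations are needed than when starting from uniform scores. A sliding-window variant would additionally drop the oldest tokens before rebuilding the graph.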