Authors:
Pavel Král
1
and
Václav Rajtmajer
2
Affiliations:
1
Faculty of Applied Sciences, University of West Bohemia, NTIS - New Technologies for the Information Society, Faculty of Applied Sciences and University of West Bohemia, Czech Republic
;
2
Faculty of Applied Sciences and University of West Bohemia, Czech Republic
Keyword(s):
Czech, Data, Harvesting, Social Media, Twitter.
Related
Ontology
Subjects/Areas/Topics:
Applications
;
Artificial Intelligence
;
Data Mining
;
Databases and Information Systems Integration
;
Enterprise Information Systems
;
Knowledge Engineering and Ontology Development
;
Knowledge-Based Systems
;
Natural Language Processing
;
Pattern Recognition
;
Sensor Networks
;
Signal Processing
;
Soft Computing
;
Symbolic Systems
Abstract:
This paper deals with automatic analysis of Czech social media. The main goal is to propose an approach to harvest interesting messages from Twitter in Czech language with high download speed. This method uses user lists to discover potentially interesting tweets to download. It is motivated by the fact that only about 20% of Twitter users are posting informative messages, whereas the remaining 80% not and that it is possible to identify the "important" users by the user lists. The experimental results show that the proposed method is very efficient because it harvests about 6 times more data than the other approaches. This approach should be integrated into an experimental system for the Czech News Agency to monitor the current data-flow on Twitter, download messages in real-time, analyze them and extract relevant events.