Authors:
Randa Benkhelifa and Fatima Zohra Laallam
Affiliation:
Univ. Ouargla, Algeria
Keyword(s):
Facebook Posts, Text Classification, Pre-processing, Machine Learning Algorithms, Internet Slang.
Related Ontology Subjects/Areas/Topics:
Social Media Analytics; Society, e-Business and e-Government; Web Information Systems and Technologies
Abstract:
Facebook is one of the most widely used social networking sites. It is more than a simple website; it is a popular communication tool. Social network users communicate with one another by exchanging several kinds of content, including free text, images and video. Today, social media users have a special way of expressing themselves: they have created a new language known as “internet slang”, which conveys the same meaning using different lexical units. This unstructured text has its own specific characteristics: it is massive, noisy and dynamic. It therefore requires novel preprocessing methods adapted to those characteristics in order to make classification algorithms easier to apply and more effective. Most previous work on social media text classification eliminates stopwords and classifies posts by topic (e.g. politics, sport, art). In this paper, we propose to classify posts at a lower level into diverse pre-chosen classes using three machine learning algorithms: SVM, Naïve Bayes and K-NN. To improve our classification, we propose a new preprocessing approach based on stopwords, Internet slang and other specific lexical units. Finally, we compare the results obtained for each classifier and then compare the classifiers with one another.
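The kind of preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the slang lexicon `SLANG`, the stopword list `STOPWORDS`, and the functions `squeeze_repeats` and `preprocess` are hypothetical names and contents chosen only for illustration. The sketch assumes that slang tokens are mapped to a canonical form and that stopwords are kept by default, with removal left as an option.

```python
import re

# Hypothetical slang lexicon (assumption); the paper's actual lexicon is not given here.
SLANG = {"u": "you", "gr8": "great", "luv": "love", "2day": "today", "thx": "thanks"}

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "is", "to", "and"}


def squeeze_repeats(token):
    """Collapse runs of three or more identical characters to one ('sooooo' -> 'so')."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def preprocess(post, remove_stopwords=False):
    """Lowercase, tokenize, squeeze elongated words, and map slang to a canonical form."""
    tokens = re.findall(r"[a-z0-9']+", post.lower())
    out = []
    for tok in tokens:
        tok = squeeze_repeats(tok)
        tok = SLANG.get(tok, tok)  # replace the token only if it is a known slang unit
        if remove_stopwords and tok in STOPWORDS:
            continue
        out.append(tok)
    return out
```

Calling `preprocess("Thx!!! U r sooooo gr8 2day")` yields a token list in which the slang units have been replaced by their canonical forms. The resulting tokens could then be turned into feature vectors for a classifier such as SVM, Naïve Bayes or K-NN.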