Authors:
Mahdi Washha (1); Dania Shilleh (2); Yara Ghawadrah (2); Reem Jazi (2) and Florence Sedes (1)
Affiliations:
(1) University of Toulouse, France; (2) Birzeit University
Keyword(s):
Twitter, Social Networks, Spam.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Enterprise Information Systems; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Society, e-Business and e-Government; Software Agents and Internet Computing; Symbolic Systems; User Profiling and Recommender Systems; Web 2.0 and Social Networking Controls; Web Information Systems and Technologies
Abstract:
Online social networks (OSNs) provide valuable data for a wide range of applications such as search engines and recommendation systems. However, their easy-to-use interactive interfaces and low barriers to publication have exposed various information quality (IQ) problems, decreasing the quality of user-generated content (UGC) in such networks. The existence of a particular kind of ill-intentioned user, the so-called social spammer, makes it challenging to maintain an acceptable level of information quality. Social spammers misuse the services provided by social networks to post spam content in an automated way. In response, various detection methods have been designed that inspect individual posts or accounts for the existence of spam. The major limitation of these methods is that they are based on supervised learning and therefore require ground-truth data sets. Moreover, account-based detection methods are not practical for "crawled" large collections of social posts, since they require months to process such collections. Post-level detection methods are applicable in terms of processing time, but they do not adapt robustly to the dynamic behavior of spammers because their features discriminate weakly between spam and non-spam. Hence, in this paper, we introduce the design of an unsupervised learning approach dedicated to detecting spam accounts (or users) in large collections of trending topics, from a collective perspective. More precisely, our method leverages the simple meta-data available about users and the published posts (tweets) related to a topic, as heuristic information, to find correlations among spam users acting as a spam campaign. Our experimental evaluation compares the method with supervised learning methods and demonstrates its efficiency in predicting spam accounts (users) in terms of the accuracy, precision, recall, and F-measure performance metrics.
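The four metrics named in the abstract follow directly from the counts of true and false positives and negatives. The snippet below is a minimal sketch of how they are commonly computed for a binary spam/non-spam labelling; it is not the authors' evaluation code. The function name `evaluate_spam_predictions`, the `Metrics` container, and the sample labels are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def evaluate_spam_predictions(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute standard binary metrics, treating 1 as 'spam' and 0 as 'non-spam'.

    Hypothetical helper for illustration; the paper's own evaluation
    procedure is not described in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f_measure)


if __name__ == "__main__":
    # Made-up labels for five accounts (1 = spam, 0 = non-spam).
    truth = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(evaluate_spam_predictions(truth, predicted))
```

The F-measure here is the balanced F1 score, the harmonic mean of precision and recall; whether the paper uses this weighting is an assumption.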