and spam precision increase as the number of communities increases. The explanation for this behavior is that the 100 trending topics used in our experiments had been attacked by many uncorrelated spam campaigns; increasing the number of communities therefore allows those campaigns to be detected precisely and accurately. Spam recall, by contrast, is inversely correlated with the number of communities. Using a number of communities higher than the real (unknown) number of spam campaigns splits those campaigns across more communities. Since spam communities may also contain non-spam users (accounts), the feature values of these smaller communities decrease, causing them to be classified as "non-spam".
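To make this dilution effect concrete, the following is a minimal, hypothetical sketch (the feature name duplicate_ratio, the threshold value, and the toy account counts are illustrative assumptions, not the actual features of our method). It shows how a campaign that exceeds the decision threshold as a single community falls below it once split across several communities that also absorb legitimate accounts:

```python
# Hypothetical illustration of feature dilution when a spam campaign is
# split across too many communities; the feature and all numbers are toy
# stand-ins for the meta-data features used by our method.

SPAM_THRESHOLD = 0.5  # assumed decision threshold on the community feature

def duplicate_ratio(community):
    """Fraction of accounts in the community posting duplicated content."""
    spammers = sum(1 for account in community if account["duplicated"])
    return spammers / len(community)

def is_spam_community(community):
    return duplicate_ratio(community) >= SPAM_THRESHOLD

# One campaign of 40 spam accounts mixed with 20 legitimate accounts.
campaign = [{"duplicated": True}] * 40 + [{"duplicated": False}] * 20

# Captured as a single community, the campaign is detected (ratio = 0.67).
print(is_spam_community(campaign))            # True

# Split across 4 communities that each also absorb 20 legitimate
# accounts, every part drops to ratio = 0.29 and is missed.
parts = [campaign[i::4] + [{"duplicated": False}] * 20 for i in range(4)]
print([is_spam_community(p) for p in parts])  # [False, False, False, False]
```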
False Positives vs. High Quality. In email spam filtering, great effort is directed toward the false positive problem, which occurs when a truly "non-spam" email is classified as "spam". In the context of social spam, however, false positives are less critical because large-scale data collections are readily available, so misclassifying a non-spam user as spam is not a serious loss. In the OSN context, attention therefore turns to increasing data quality, since a wide range of Twitter-based applications (e.g., tweet summarization) give high priority to working on noise-free collections. Computational time is also a significant concern when targeting large-scale collections. Our method is well suited to processing large-scale collections while delivering high-quality output; for instance, processing our Twitter data-set required no more than one day. Finally, since the experiments cover different ∆ values and no single value satisfies all performance metrics, the choice of ∆ mainly depends on the requirements of the final collection. For instance, a low ∆ value is recommended to obtain a very high-quality collection, at the cost of a high probability of losing non-noisy information.
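Because ∆ governs how aggressively correlated accounts are grouped and removed, the trade-off can be illustrated with a small hedged sketch. Here connected components stand in for the community-detection step of our method, and the similarity function, the component-size spam rule, and the toy accounts are all illustrative assumptions:

```python
# Hypothetical sketch of the quality/loss trade-off when sweeping Delta.
import itertools
import networkx as nx

# Toy data: six accounts from one spam campaign ("c1") plus four users.
accounts = [(f"spam{i}", "c1") for i in range(6)] + \
           [(f"user{i}", None) for i in range(4)]

def similarity(a, b):
    """Illustrative pairwise similarity (stand-in for meta-data features)."""
    if a[1] is not None and a[1] == b[1]:
        return 0.9   # accounts of the same campaign are strongly correlated
    return 0.3       # unrelated accounts share only background similarity

def filter_collection(accounts, delta):
    """Link accounts with similarity >= delta, then drop large groups."""
    graph = nx.Graph()
    graph.add_nodes_from(accounts)
    for a, b in itertools.combinations(accounts, 2):
        if similarity(a, b) >= delta:
            graph.add_edge(a, b)
    kept = set(accounts)
    for component in nx.connected_components(graph):
        if len(component) >= 5:   # assumed rule: large correlated groups
            kept -= component
    return kept

# A moderate Delta removes the campaign while preserving normal users;
# a very low Delta links everyone, so legitimate content is lost as well.
print(sorted(name for name, _ in filter_collection(accounts, delta=0.5)))
# -> ['user0', 'user1', 'user2', 'user3']
print(sorted(name for name, _ in filter_collection(accounts, delta=0.2)))
# -> []   (maximal quality, but non-noisy information is lost too)
```

Sweeping ∆ in this fashion over a small sample makes the quality-versus-loss curve explicit, from which a value matching the requirements of the final collection can be picked.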
6 CONCLUSION AND FUTURE
DIRECTIONS
In this paper, we have designed an unsupervised approach for filtering out spam users (accounts) existing in large-scale collections of trending topics. Our method takes a collective perspective in detecting spam users, discovering the correlations among them. Our work brings two additional benefits to the information quality field: (i) it filters out spam users without requiring annotated data-sets; and (ii) it performs the filtering process quickly because it relies only on the available meta-data, without retrieving further information from Twitter's servers. As future work, we plan to study the impact of collaborating with other social networks to improve the current results. We also intend to design more robust collective-based features, such as the sentiment of tweets.