Information Quality in Social Networks: Predicting Spammy Naming Patterns for Retrieving Twitter Spam Accounts

Mahdi Washha, Aziz Qaroush, Manel Mezghani, Florence Sèdes

Abstract

The popularity of social networks depends mainly on the integrity and quality of user-generated content, as well as on the protection of users' privacy. More precisely, Twitter data (e.g. tweets) are valuable for a wide range of applications, such as search engines and recommendation systems, in which working on high-quality information is a compulsory step. However, the presence of ill-intentioned users on Twitter makes it difficult to maintain an acceptable level of data quality. Spammers are a concrete example of such users: they misuse every service Twitter provides to post spam content, which in turn causes serious problems such as polluted search results. In response, various detection methods have been designed that inspect individual tweets or accounts for spam. For large collections of Twitter users, however, applying these conventional methods is time-consuming, requiring months to filter out the spam accounts in such collections. Moreover, because the Twitter network is highly dynamic, the Twitter community cannot apply them, either randomly or sequentially, to every registered user. These limitations call for a more systematic and faster detection process. Complementary to the conventional detection methods, our proposal takes a collective perspective on users (or accounts) to provide searchable information for retrieving accounts that have a high potential of being spam. We design an unsupervised, automatic method that predicts the spammy naming patterns used to name spam accounts; these patterns serve as the searchable information. Our experimental evaluation demonstrates the effectiveness of predicted spammy naming patterns for retrieving spam accounts, measured by precision, recall, and normalized discounted cumulative gain (NDCG) at different ranks.
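The abstract evaluates retrieval with precision, recall, and NDCG at different ranks k. As a rough illustration of how such rank-based metrics are usually computed, the sketch below assumes binary relevance (an account is either spam or not), a ranked list of retrieved account identifiers, and the common log2 rank discount. The function names and the toy data are hypothetical; this is not the authors' evaluation code.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved accounts that are spam.

    Divides by k even if fewer than k accounts were retrieved (an assumption;
    some evaluations divide by the number actually retrieved instead).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for acc in ranked[:k] if acc in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all known spam accounts that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for acc in ranked[:k] if acc in relevant)
    return hits / len(relevant)


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: the gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """DCG of the actual ranking normalized by the DCG of an ideal ranking."""
    gains = [1.0 if acc in relevant else 0.0 for acc in ranked]
    # Ideal ranking: every spam account placed first, capped at k positions.
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: accounts returned by querying one naming pattern.
    ranked = ["a", "b", "c", "d", "e"]
    spam = {"a", "c", "f"}
    print(round(precision_at_k(ranked, spam, 3), 4))  # 0.6667
    print(round(recall_at_k(ranked, spam, 3), 4))     # 0.6667
    print(round(ndcg_at_k(ranked, spam, 3), 4))       # 0.7039
```

With binary relevance, a gain of 1 for spam and 0 otherwise is equivalent to the 2^rel - 1 gain used in some NDCG formulations. Graded relevance would require a different gain function.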



Paper Citation


in Harvard Style

Washha M., Qaroush A., Mezghani M. and Sèdes F. (2017). Information Quality in Social Networks: Predicting Spammy Naming Patterns for Retrieving Twitter Spam Accounts. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-248-6, pages 610-622. DOI: 10.5220/0006314006100622


in Bibtex Style

@conference{iceis17,
author={Mahdi Washha and Aziz Qaroush and Manel Mezghani and Florence Sèdes},
title={Information Quality in Social Networks: Predicting Spammy Naming Patterns for Retrieving Twitter Spam Accounts},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2017},
pages={610-622},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006314006100622},
isbn={978-989-758-248-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - Information Quality in Social Networks: Predicting Spammy Naming Patterns for Retrieving Twitter Spam Accounts
SN - 978-989-758-248-6
AU - Washha M.
AU - Qaroush A.
AU - Mezghani M.
AU - Sèdes F.
PY - 2017
SP - 610
EP - 622
DO - 10.5220/0006314006100622
ER - 