Adaptive Filtering of Comment Spam in Multi-user Forums and Blogs

Marco Ramilli, Marco Prandini

Abstract

The influence of web-based user-interaction platforms, such as forums, wikis and blogs, has extended into the business sphere, where comments about products and companies can affect corporate value. Guaranteeing the authenticity of the published data has therefore become very important, since these platforms have quickly become the target of attacks that aim to inject false comments. This phenomenon is worrisome only when carried out by automated tools, which can massively shift the average tenor of comments. The research illustrated in this paper aims to devise a method to detect automatically generated comments and filter them out. The proposed solution is completely server-based, for enhanced compatibility and user-friendliness. The core component leverages the flexibility of logic programming to build the knowledge base in a way that allows continuous, mostly unsupervised learning of the rules used to decide whether a comment is acceptable.



Paper Citation


in Harvard Style

Ramilli M. and Prandini M. (2008). Adaptive Filtering of Comment Spam in Multi-user Forums and Blogs. In Proceedings of the 6th International Workshop on Security in Information Systems - Volume 1: WOSIS, (ICEIS 2008), ISBN 978-989-8111-44-9, pages 122-131. DOI: 10.5220/0001740901220131


in Bibtex Style

@conference{wosis08,
author={Marco Ramilli and Marco Prandini},
title={Adaptive Filtering of Comment Spam in Multi-user Forums and Blogs},
booktitle={Proceedings of the 6th International Workshop on Security in Information Systems - Volume 1: WOSIS, (ICEIS 2008)},
year={2008},
pages={122-131},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001740901220131},
isbn={978-989-8111-44-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Workshop on Security in Information Systems - Volume 1: WOSIS, (ICEIS 2008)
TI - Adaptive Filtering of Comment Spam in Multi-user Forums and Blogs
SN - 978-989-8111-44-9
AU - Ramilli M.
AU - Prandini M.
PY - 2008
SP - 122
EP - 131
DO - 10.5220/0001740901220131