New Classification Models for Detecting Hate and Violence Web Content

Shuhua Liu, Thomas Forss

2015

Abstract

Today, harmful and inappropriate content on the web remains one of the primary concerns for web users. Early web classification models were limited by the methods and data available at the time. In our research we revisit the web classification problem by applying new methods and techniques for text content analysis. Our recent studies have indicated the promising potential of combining topic analysis and sentiment analysis in web content classification. In this paper we further explore ways to improve classification performance, especially to enhance precision and reduce false positives, through examination and handling of class imbalance issues and through incorporation of LDA topic models.
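
As a rough illustration of why class imbalance matters for precision and false positives, the sketch below compares the same hypothetical classifier on a balanced and an imbalanced page collection. All numbers, and the helper names `precision` and `flagged_counts`, are assumptions made for this example; they are not taken from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of flagged pages that are truly harmful."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def flagged_counts(n_pos: int, n_neg: int, recall: float, fpr: float) -> tuple[int, int]:
    """Return (true positives, false positives) for a classifier with the
    given recall and false-positive rate on n_pos harmful and n_neg benign pages."""
    tp = round(n_pos * recall)
    fp = round(n_neg * fpr)
    return tp, fp


# Hypothetical classifier: 90% recall, 5% false-positive rate.
RECALL, FPR = 0.90, 0.05

# Balanced collection: 1,000 harmful vs. 1,000 benign pages.
tp_bal, fp_bal = flagged_counts(1000, 1000, RECALL, FPR)

# Imbalanced collection: 1,000 harmful vs. 20,000 benign pages.
tp_imb, fp_imb = flagged_counts(1000, 20000, RECALL, FPR)

print(f"balanced:   precision = {precision(tp_bal, fp_bal):.3f}")
print(f"imbalanced: precision = {precision(tp_imb, fp_imb):.3f}")
```

With these assumed figures the classifier itself is unchanged, yet precision falls from roughly 0.95 to roughly 0.47 because the benign class contributes far more false positives. This is the kind of effect that handling class imbalance is meant to counter.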



Paper Citation


in Harvard Style

Liu S. and Forss T. (2015). New Classification Models for Detecting Hate and Violence Web Content. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 487-495. DOI: 10.5220/0005636704870495


in Bibtex Style

@conference{kdir15,
author={Shuhua Liu and Thomas Forss},
title={New Classification Models for Detecting Hate and Violence Web Content},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={487-495},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005636704870495},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - New Classification Models for Detecting Hate and Violence Web Content
SN - 978-989-758-158-8
AU - Liu S.
AU - Forss T.
PY - 2015
SP - 487
EP - 495
DO - 10.5220/0005636704870495