New Classification Models for Detecting Hate and Violence Web Content
Shuhua Liu, Thomas Forss
2015
Abstract
The presence of harmful and inappropriate content on the web remains one of the primary concerns for web users. Early web classification models were limited by the methods and data available at the time. In our research we revisit the web classification problem by applying new methods and techniques for text content analysis. Our recent studies have indicated the promising potential of combining topic analysis and sentiment analysis in web content classification. In this paper we further explore ways to improve classification performance, especially to enhance precision and reduce false positives, through examination and handling of class imbalance issues and through incorporation of LDA topic models.
Paper Citation
in Harvard Style
Liu S. and Forss T. (2015). New Classification Models for Detecting Hate and Violence Web Content. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015), ISBN 978-989-758-158-8, pages 487-495. DOI: 10.5220/0005636704870495
in Bibtex Style
@conference{kdir15,
author={Shuhua Liu and Thomas Forss},
title={New Classification Models for Detecting Hate and Violence Web Content},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={487-495},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005636704870495},
isbn={978-989-758-158-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - New Classification Models for Detecting Hate and Violence Web Content
SN - 978-989-758-158-8
AU - Liu S.
AU - Forss T.
PY - 2015
SP - 487
EP - 495
DO - 10.5220/0005636704870495