classifier ensemble, as well as the integration of
text-based and image-based classifiers.
ACKNOWLEDGEMENTS
This research is supported by the Tekes-funded
DIGILE D2I research program, the Arcada Research
Foundation, and our industry partners.