Authors:
Shuhua Liu and Thomas Forss
Affiliation:
Arcada University of Applied Sciences, Finland
Keyword(s):
Web Content Classification, Topic Extraction, Topic Similarity, Sentiment Analysis, Imbalanced Classes,
LDA Topic Models.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence;
Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems;
Machine Learning; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems; Web Mining
Abstract:
Harmful and inappropriate content on the web remains one of the primary concerns for web users. Early
web classification models were limited by the methods and data available at the time. In our research we
revisit the web classification problem, applying new methods and techniques for text content analysis.
Our recent studies have indicated the promise of combining topic analysis and sentiment analysis in web
content classification. In this paper we explore further ways to improve classification performance,
especially to enhance precision and reduce false positives, through examination and handling of class
imbalance and through incorporation of LDA topic models.
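
To make the link between class imbalance, false positives, and precision concrete, the following is a minimal sketch and not the authors' method or data. The class sizes, recall, and false positive rates are hypothetical values chosen for illustration, and the helper names `confusion_counts` and `precision` are assumptions introduced here.

```python
def confusion_counts(n_pos, n_neg, recall, fpr):
    """Return (tp, fp, tn, fn) for a classifier with the given recall and
    false positive rate on n_pos positive and n_neg negative pages."""
    tp = round(n_pos * recall)   # harmful pages correctly flagged
    fp = round(n_neg * fpr)      # benign pages wrongly flagged
    tn = n_neg - fp
    fn = n_pos - tp
    return tp, fp, tn, fn


def precision(tp, fp):
    """Fraction of flagged pages that are truly positive."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


if __name__ == "__main__":
    # Hypothetical imbalanced collection: 100 harmful pages, 9900 benign pages.
    n_pos, n_neg = 100, 9900
    for fpr in (0.05, 0.01):
        tp, fp, tn, fn = confusion_counts(n_pos, n_neg, recall=0.9, fpr=fpr)
        print(f"FPR={fpr:.2f}: tp={tp}, fp={fp}, precision={precision(tp, fp):.3f}")
```

Under these assumed numbers, the benign class is so much larger than the harmful class that even a small false positive rate yields more false alarms than true detections, which is why precision is sensitive to class imbalance.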