Authors:
Giacomo Domeniconi; Gianluca Moro; Andrea Pagliarani and Roberto Pasolini
Affiliation:
University of Bologna, Italy
Keyword(s):
Transfer Learning, Language Heterogeneity, Sentiment Analysis, Cross-Domain, Big Data.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems
Abstract:
Cross-domain sentiment classification consists in distinguishing positive and negative reviews of a target domain by using knowledge extracted and transferred from a heterogeneous source domain.
Cross-domain solutions aim at overcoming the costly pre-classification of each new training set by human experts.
Despite the potential business relevance of this research thread, the existing ad hoc solutions still do not scale to large real-world text sets.
Scalable Deep Learning techniques have been effectively applied to in-domain text classification, by training and categorising documents belonging to the same domain.
This work analyses the cross-domain efficacy of a well-known unsupervised Deep Learning approach for text mining, called Paragraph Vector, comparing its performance with a method based on Markov Chain developed ad hoc for cross-domain sentiment classification.
The experiments show that, once enough data is available for training, Paragraph Vector achieves accuracy equivalent to Markov Chain both in-domain and cross-domain, despite having no explicit transfer learning capability.
The outcome suggests that combining Deep Learning with transfer learning techniques could be a breakthrough over ad hoc cross-domain sentiment solutions in big data scenarios.
This view is supported by a simple multi-source experiment, which we ran to improve transfer learning and which increases the accuracy of cross-domain sentiment classification.