for pattern recognition² (version 4). The value of the C parameter in the SVM was set to one.
5 RESULTS
We compare the accuracy of the proposed approach with that of the methods described in Section 2 on the movie review sentiment polarity classification problem, using the dataset described in the previous section. The results shown in Table 1 reveal that the proposed approach with the SVM outperforms the previous state-of-the-art, achieving an average accuracy of 96.9%. Regarding the K-NN classifier, the best accuracy (95.55%) was obtained with K = 7, although the result for K = 1 (95.45%) is almost identical and also outperforms the previous state-of-the-art.
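As a rough illustration of how classification in a dissimilarity space can work, the sketch below represents each text by its vector of dissimilarities to a set of prototype texts and classifies with a plain K-NN rule. Note the hedges: the `ncd` function is a stand-in compression-based dissimilarity (the normalized compression distance, computed with zlib), not the Ziv-Merhav measure actually used in this paper, and all function names are illustrative.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance: a generic compression-based
    # dissimilarity used here only as a stand-in for the paper's
    # information-theoretic (Ziv-Merhav) measure.
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def to_dissimilarity_space(text: bytes, prototypes):
    # A text is represented by its vector of dissimilarities
    # to the chosen prototype texts.
    return [ncd(text, p) for p in prototypes]

def knn_predict(train_vecs, train_labels, query_vec, k=7):
    # Plain K-NN with squared Euclidean distance in the
    # dissimilarity space; majority vote among the k nearest.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(v, query_vec)), lbl)
        for v, lbl in zip(train_vecs, train_labels)
    )
    votes = [lbl for _, lbl in dists[:k]]
    return max(set(votes), key=votes.count)
```

An SVM with C = 1 would be trained on the same dissimilarity vectors in place of `knn_predict`; only the classifier on top of the representation changes.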
Finally, we also explored the random prototype selection method proposed by Duin and Paclík (Duin and Paclík, 2006); the results are shown in Figure 2. Notice that, using only 10% of the prototypes for training (180 prototypes), we still obtain an accuracy of 95.0%, better than the previous state-of-the-art.
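Random prototype selection amounts to keeping a random subset of the training texts as prototypes and building the dissimilarity representation against that reduced set only. A minimal sketch, with illustrative names, assuming the training set is given as a list of texts:

```python
import random

def select_random_prototypes(train_texts, fraction=0.10, seed=0):
    # Random prototype selection in the spirit of Duin and Paclik (2006):
    # draw a random subset of the training texts; dissimilarity vectors
    # are then computed against these prototypes only, which shrinks the
    # representation from len(train_texts) to k dimensions.
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = max(1, round(fraction * len(train_texts)))
    return rng.sample(train_texts, k)
```

With 1800 training texts, a fraction of 10% yields the 180 prototypes referred to above.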
Figure 2: Average accuracy in the dissimilarity space as a function of the number of randomly selected prototypes.
6 CONCLUSIONS
In this paper, we have presented a new approach
for text sentiment analysis, based on an information-
theoretic dissimilarity measure, which is used to build
dissimilarity representations on which SVM and K-
NN classifiers are applied. The aim of our proposal was mainly to show that this type of approach achieves state-of-the-art results in hard text classification problems, while involving virtually no human intervention and no text preprocessing. We have illustrated the approach on a benchmark dataset, where the task is movie review sentiment polarity categorization. Our methods outperform previous state-of-the-art results, while being drastically simpler and requiring much less human intervention.
² www.prtools.org/index.html
REFERENCES
Duin, R. P. W. and Paclík, P. (2006). Prototype selection for dissimilarity-based classifiers. Pattern Recognition, 39:189–208.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European Conference on Machine Learning (ECML '98). Springer-Verlag.
Matsumoto, S., Takamura, H., and Okumura, M. (2005).
Sentiment classification using word sub-sequences
and dependency sub-trees. In Proceedings of the 9th
Pacific-Asia conference on Advances in Knowledge
Discovery and Data Mining, PAKDD’05, pages 301–
311, Berlin, Heidelberg. Springer-Verlag.
Pang, B. and Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271–278.
Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of EMNLP, pages 79–86.
Pekalska, E. and Duin, R. P. W. (2002). Dissimilarity repre-
sentations allow for building good classifiers. Pattern
Recognition Letters, 23(8):943–956.
Pekalska, E., Paclík, P., and Duin, R. P. W. (2001). A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research, 2:175–211.
Pereira Coutinho, D. and Figueiredo, M. (2005). Information theoretic text classification using the Ziv-Merhav method. In Proceedings of the 2nd Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA 2005).
Salomon, D. and Motta, G. (2010). Handbook of Data Compression (5th ed.). Springer.
Vinodhini, G. and Chandrasekaran, R. (2012). Sentiment
analysis and opinion mining: A survey. International
Journal of Advanced Research in Computer Science
and Software Engineering.
Whitelaw, C., Garg, N., and Argamon, S. (2005). Using ap-
praisal taxonomies for sentiment analysis. In Proceed-
ings of CIKM-05, the ACM SIGIR Conference on In-
formation and Knowledge Management, Bremen, DE.
Yessenalina, A., Yue, Y., and Cardie, C. (2010). Multi-level structured models for document-level sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
Ziv, J. and Lempel, A. (1977). A universal algorithm for
sequential data compression. IEEE Transactions on
Information Theory, 23(3):337–343.
Ziv, J. and Lempel, A. (1978). Compression of individ-
ual sequences via variable-rate coding. IEEE Trans-
actions on Information Theory, 24(5):530–536.
Ziv, J. and Merhav, N. (1993). A measure of relative en-
tropy between individual sequences with application
to universal classification. IEEE Transactions on In-
formation Theory, 39:1270–1279.
ICPRAM 2013 - International Conference on Pattern Recognition Applications and Methods