word filtering and stemming, and the feature transformation based on term belonging to classes, were considered. k-NN and SVM-FML were used as classification algorithms.
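The feature transformation mentioned above can be illustrated with a minimal sketch. This passage does not spell the transformation out, so the sketch assumes that every term has already been assigned to exactly one class, and that the j-th feature of a document is the sum of the weights of its terms assigned to class j. The names `class_features`, `term_to_class` and `doc_term_weights` are hypothetical.

```python
def class_features(doc_term_weights, term_to_class, n_classes):
    """Map a bag of weighted terms to a vector with one feature per class.

    doc_term_weights: dict term -> weight of that term in the document.
    term_to_class:    dict term -> index of the class the term belongs to
                      (assumed to be computed beforehand on training data).
    n_classes:        number of classes, i.e. the resulting dimensionality.
    """
    features = [0.0] * n_classes
    for term, weight in doc_term_weights.items():
        cls = term_to_class.get(term)
        if cls is not None:  # terms with no assigned class are skipped
            features[cls] += weight
    return features


# Illustrative data only: two classes, four known terms.
term_to_class = {"invoice": 0, "refund": 0, "password": 1, "login": 1}
doc = {"refund": 0.8, "login": 0.3, "hello": 0.1}
print(class_features(doc, term_to_class, n_classes=2))
```

Whatever the vocabulary size, the output vector has as many components as there are classes.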
In this paper, the idea of voting over different term weighting methods was proposed. A majority vote over the seven term weighting methods considered provides a significant improvement in classification effectiveness. Weighted voting, with the weights optimized by a self-adjusting genetic algorithm, was then investigated. The numerical results showed that weighted voting provides an additional improvement in classification effectiveness. The improvement is especially significant with the feature transformation based on term belonging to classes, which reduces the dimensionality radically: the dimensionality equals the number of classes.
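As a minimal sketch of the two voting schemes, assume that each term weighting method yields one predicted class label per document. The function `weighted_vote`, the labels and the weight values below are hypothetical. The weights are fixed by hand for illustration, whereas in the paper they are found by a genetic algorithm, which is not reproduced here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine the labels predicted for one document by several methods.

    predictions: list of class labels, one per term weighting method.
    weights:     optional list of non-negative vote weights; when omitted,
                 every vote counts equally (plain majority vote).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per prediction is required")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Assumption: ties go to the label that appeared first in `predictions`.
    return max(scores, key=scores.get)


# Seven methods, one vote each (illustrative labels).
votes = ["billing", "support", "billing", "sales",
         "support", "billing", "billing"]
print(weighted_vote(votes))           # majority vote

# Hypothetical weights that trust the 2nd and 5th methods more.
weights = [0.5, 2.0, 0.5, 1.0, 2.0, 0.5, 0.5]
print(weighted_vote(votes, weights))  # weighted vote
```

With equal weights the function reduces to the majority vote, so the two schemes differ only in the weight vector, which is the quantity an optimizer would tune on held-out data.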
Weighted Voting of Different Term Weighting Methods for Natural Language Call Routing