Weighted Voting of Different Term Weighting Methods for Natural Language Call Routing

Roman Sergienko, Iuliia Kamshilova, Eugene Semenkin, Alexander Schmitt

2016

Abstract

This paper considers the text classification problem for natural language call routing. Seven term weighting methods were applied. Two dimensionality reduction methods were considered: a combination of stop-word filtering and stemming, and a feature transformation based on term belonging to classes. kNN and SVM-FML were used as classification algorithms. The paper proposes voting across different term weighting methods. A majority vote of the seven term weighting methods significantly improves classification effectiveness. Weighted voting, with weights optimized by a self-adjusting genetic algorithm, was then investigated; the numerical results show that it improves classification effectiveness further. The improvement is especially significant with the feature transformation based on term belonging to classes, which reduces the dimensionality radically: the dimensionality equals the number of classes. The approach can therefore be useful for real-time systems such as natural language call routing.
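
The voting idea described in the abstract can be illustrated with a minimal sketch. It assumes that each of the seven term weighting methods has already produced one predicted class label for an utterance. The function name `weighted_vote`, the class labels, and the weight values are hypothetical and chosen only for illustration; in the paper the weights are obtained with a self-adjusting genetic algorithm, which is not reproduced here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine per-method class predictions into one label.

    predictions: list of class labels, one per term weighting method.
    weights: optional list of non-negative floats, one per method.
             If omitted, every method gets weight 1.0 (plain majority vote).
    Ties are resolved in favour of the label that appears first.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")

    # Sum the weights of all methods that voted for each label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # max() returns the first key with the largest total.
    return max(totals, key=totals.get)


# Hypothetical labels predicted by seven term weighting methods.
predictions = ["billing", "support", "support", "billing",
               "support", "sales", "support"]

# Majority vote: "support" has 4 of 7 votes.
majority = weighted_vote(predictions)

# Hand-picked weights (an assumption, not optimized values):
# the two methods that voted "billing" are trusted much more.
weights = [3.0, 0.5, 0.5, 3.0, 0.5, 0.5, 0.5]
weighted = weighted_vote(predictions, weights)

print(majority, weighted)
```

With equal weights the function reduces to a majority vote; non-uniform weights let more reliable term weighting methods outvote the rest, which is the degree of freedom an optimizer would tune.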


Paper Citation


in Harvard Style

Sergienko R., Kamshilova I., Semenkin E. and Schmitt A. (2016). Weighted Voting of Different Term Weighting Methods for Natural Language Call Routing. In Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-758-198-4, pages 38-46. DOI: 10.5220/0005956600380046


in Bibtex Style

@conference{icinco16,
author={Roman Sergienko and Iuliia Kamshilova and Eugene Semenkin and Alexander Schmitt},
title={Weighted Voting of Different Term Weighting Methods for Natural Language Call Routing},
booktitle={Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2016},
pages={38-46},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005956600380046},
isbn={978-989-758-198-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - Weighted Voting of Different Term Weighting Methods for Natural Language Call Routing
SN - 978-989-758-198-4
AU - Sergienko R.
AU - Kamshilova I.
AU - Semenkin E.
AU - Schmitt A.
PY - 2016
SP - 38
EP - 46
DO - 10.5220/0005956600380046