Experiences with the LVQ Algorithm in Multilabel Text Categorization

A. Montejo-Ráez, M. T. Martín-Valdivia, L. A. Ureña-López

Abstract

Text categorization is an important information processing task. This paper presents a neural approach to text classification based on the Learning Vector Quantization (LVQ) algorithm, focusing on multilabel, multiclass text categorization. Experiments were carried out on the High Energy Physics (HEP) text collection, which is highly unbalanced. The results obtained are very promising and show that the LVQ-based approach behaves robustly across different parameter settings.
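The abstract does not spell out the training procedure. As a rough illustration only, the sketch below implements Kohonen's basic LVQ1 update rule: the prototype nearest to a training vector is pulled toward it when their labels agree and pushed away otherwise. Multiple labels per document are handled here with a one-against-all decomposition, training one binary LVQ model per category. That decomposition, the function names (`train_lvq1`, `predict_lvq`, `train_multilabel`, `predict_multilabel`), the linearly decaying learning rate and every parameter value are hypothetical assumptions, not the authors' configuration.

```python
import numpy as np

def train_lvq1(X, y, prototypes_per_class=2, alpha0=0.1, epochs=20, seed=0):
    """LVQ1 training (assumed setup). X: (n, d) float array, y: (n,) int labels."""
    rng = np.random.default_rng(seed)
    protos, proto_labels = [], []
    # Initialise each class's prototypes from randomly chosen samples of that class.
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        chosen = rng.choice(idx, size=min(prototypes_per_class, len(idx)), replace=False)
        protos.append(X[chosen])
        proto_labels.extend([c] * len(chosen))
    W = np.vstack(protos).astype(float)
    wl = np.array(proto_labels)
    n = len(X)
    total, t = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            alpha = alpha0 * (1 - t / total)  # hypothetical linear decay
            t += 1
            j = np.argmin(np.sum((W - X[i]) ** 2, axis=1))  # nearest prototype
            if wl[j] == y[i]:
                W[j] += alpha * (X[i] - W[j])  # same label: attract
            else:
                W[j] -= alpha * (X[i] - W[j])  # different label: repel
    return W, wl

def predict_lvq(W, wl, X):
    """Assign each row of X the label of its nearest prototype."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return wl[np.argmin(d, axis=1)]

def train_multilabel(X, Y, **kw):
    """One-against-all (assumption): one binary LVQ model per label column of Y."""
    return [train_lvq1(X, Y[:, k], **kw) for k in range(Y.shape[1])]

def predict_multilabel(models, X):
    """Stack the binary decisions into an (n, n_labels) 0/1 matrix."""
    return np.column_stack([predict_lvq(W, wl, X) for W, wl in models])
```

For example, `train_multilabel(X, Y)` with a document-term matrix `X` and a 0/1 label matrix `Y` yields one codebook per category; `predict_multilabel(models, X_new)` then returns a 0/1 matrix of assigned categories. A real text setup would also need term weighting and, on an unbalanced collection such as HEP, some care in choosing how many prototypes each class receives.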



Paper Citation


in Harvard Style

Montejo-Ráez A., Martín-Valdivia M. T. and Ureña-López L. A. (2007). Experiences with the LVQ Algorithm in Multilabel Text Categorization. In Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007) ISBN 978-972-8865-97-9, pages 213-221. DOI: 10.5220/0002419302130221


in Bibtex Style

@conference{nlpcs07,
author={A. Montejo-Ráez and M. T. Martín-Valdivia and L. A. Ureña-López},
title={Experiences with the LVQ Algorithm in Multilabel Text Categorization},
booktitle={Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)},
year={2007},
pages={213-221},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002419302130221},
isbn={978-972-8865-97-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)
TI - Experiences with the LVQ Algorithm in Multilabel Text Categorization
SN - 978-972-8865-97-9
AU - Montejo-Ráez A.
AU - Martín-Valdivia M. T.
AU - Ureña-López L. A.
PY - 2007
SP - 213
EP - 221
DO - 10.5220/0002419302130221