A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS

Yanshan Wang, In-Chan Choi

Abstract

Latent Dirichlet Allocation (LDA) is a generative model that outperforms other topic modelling algorithms at capturing the latent topics of text data. Indexing by LDA is a recent LDA-based method that gives a new definition of document probability vectors, which can be used as feature vectors. In this paper, we propose a joint text classification process that combines DBSCAN, LDA indexing and a Support Vector Machine (SVM). The DBSCAN algorithm is applied as a pre-processing step for LDA to determine the number of topics, and the LDA document-indexing features are then used as input to an SVM text classifier.
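The abstract does not say which vectors DBSCAN clusters or exactly how a topic count is read off the result. A minimal sketch, assuming each item is a numeric vector and that the number of non-noise DBSCAN clusters is taken as the number of LDA topics K. The function names `dbscan` and `estimate_num_topics`, the Euclidean distance, the `eps`/`min_pts` values and the toy 2-D points are all hypothetical illustrations, not the authors' settings.

```python
import math
from typing import List, Sequence

NOISE = -1       # label for points that belong to no cluster
UNVISITED = -2   # label for points not yet processed

Point = Sequence[float]


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN: returns a cluster id (0, 1, ...) per point, or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


def estimate_num_topics(points: List[Point], eps: float, min_pts: int) -> int:
    """Hypothetical rule (assumption): K = number of DBSCAN clusters, noise ignored."""
    return len({label for label in dbscan(points, eps, min_pts) if label != NOISE})


if __name__ == "__main__":
    # Toy 2-D vectors standing in for document representations (hypothetical data).
    docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
            (10.0, 0.0)]
    print(dbscan(docs, eps=0.5, min_pts=3))               # [0, 0, 0, 1, 1, 1, -1]
    print(estimate_num_topics(docs, eps=0.5, min_pts=3))  # 2
```

With two tight groups and one isolated point, DBSCAN labels the outlier as noise (-1) and the estimate is K = 2. In the pipeline described above, this K would then set the number of topics for LDA, and the resulting LDA indexing vectors would be passed to the SVM; those two stages are not sketched here.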



Paper Citation


in Harvard Style

Wang Y. and Choi I. (2012). A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS. In Proceedings of the 1st International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES, ISBN 978-989-8425-97-3, pages 212-214. DOI: 10.5220/0003740902120214


in Bibtex Style

@conference{icores12,
author={Yanshan Wang and In-Chan Choi},
title={A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS},
booktitle={Proceedings of the 1st International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES},
year={2012},
pages={212-214},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003740902120214},
isbn={978-989-8425-97-3},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 1st International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES
TI  - A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS
SN  - 978-989-8425-97-3
AU  - Wang Y.
AU  - Choi I.
PY  - 2012
SP  - 212
EP  - 214
DO  - 10.5220/0003740902120214
ER  - 