A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS
Yanshan Wang, In-Chan Choi
2012
Abstract
Latent Dirichlet Allocation (LDA) is a generative model that performs better than other topic modelling algorithms at capturing the latent topics of text data. Indexing by LDA is a recently proposed LDA-based method that gives a new definition of document probability vectors, which can be used as feature vectors. In this paper, we propose a text classification procedure that combines DBSCAN, LDA-based document indexing and a Support Vector Machine (SVM). The DBSCAN algorithm is applied as a pre-processing step to determine the number of topics for LDA, and the resulting LDA document-indexing features are then used as input to an SVM text classifier.
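The abstract outlines a three-stage pipeline but does not state how DBSCAN fixes the number of topics or which SVM formulation is used. The following is a minimal, dependency-free Python sketch under explicit assumptions, not the authors' implementation. Documents are clustered with DBSCAN (Ester et al., 1996) under cosine distance on raw term counts, and the number of clusters, excluding noise, is taken as the number of LDA topics. LDA is fitted by collapsed Gibbs sampling, and its per-document topic proportions θ are used as a stand-in for the LDA indexing vectors of Choi and Lee (2010), which are defined differently and are not reproduced here. The classifier is a binary linear SVM trained with a Pegasos-style stochastic subgradient method in place of a kernel SVM. All function names and parameter values (`eps=0.5`, `min_pts=2`, `alpha=0.1`, `beta=0.01`, `lam=0.01`) are hypothetical.

```python
import math
import random
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity of two bag-of-words Counters (1.0 if either is empty)."""
    dot = sum(v * b[w] for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN (Ester et al., 1996): one label per point, -1 marks noise."""
    def region(i):  # eps-neighbourhood of point i, including i itself
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
    labels, cluster = [None] * len(points), -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point, so keep expanding
                seeds.extend(neighbours)
    return labels


def lda_theta(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, sweeps=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions theta."""
    rng, topics = random.Random(seed), range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * vocab_size for _ in topics]  # topic-word counts
    nk = [0] * n_topics                       # tokens assigned to each topic
    def bump(d, w, k, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            bump(d, w, k, 1)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, z[d][i], -1)  # remove token i, then resample its topic
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                bump(d, w, z[d][i], 1)
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Binary linear SVM (labels +1/-1) fitted by Pegasos-style SGD on the hinge loss."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]  # constant 1.0 acts as bias
    w = [0.0] * len(data[0][0])
    for epoch in range(epochs):
        rng.shuffle(data)
        for i, (x, label) in enumerate(data):
            eta = 1.0 / (lam * (epoch * len(data) + i + 1))  # step size 1 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on (lam / 2) * ||w||^2
            if margin < 1.0:  # hinge loss is active for this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear decision function; the last weight is the bias."""
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1


def classify(train_texts, train_labels, test_texts, eps=0.5, min_pts=2):
    """DBSCAN -> number of topics, LDA theta -> features, linear SVM -> labels."""
    texts = train_texts + test_texts  # unsupervised steps see all documents
    bags = [Counter(text.split()) for text in texts]
    clusters = dbscan(bags, eps, min_pts, cosine_distance)
    n_topics = max(1, max(clusters) + 1)  # assumption: one topic per cluster, noise ignored
    vocab = {w: i for i, w in enumerate(sorted({w for bag in bags for w in bag}))}
    docs = [[vocab[w] for w in text.split()] for text in texts]
    theta = lda_theta(docs, n_topics, len(vocab))  # stand-in for LDA indexing vectors
    w = train_linear_svm(theta[:len(train_texts)], train_labels)
    return n_topics, [predict(w, x) for x in theta[len(train_texts):]]
```

For instance, `classify(train_texts, train_labels, test_texts)` returns the topic count chosen by DBSCAN and one +1/-1 label per test document. Because the test documents take part in the unsupervised DBSCAN and LDA steps, the sketch is transductive; classifying unseen documents would instead require inferring θ for each new document against the fitted topics. For more than two classes, one-vs-rest training of `train_linear_svm` would be one option.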
References
- Aizerman, M. A., Braverman, E. M., and Rozonoer, L. I., 1964. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25, pp.821-837.
- Blei, D. M., Ng, A. Y., and Jordan, M. I., 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, pp.993-1022.
- Choi, I. C., and Lee, J. S., 2010. Document indexing by latent Dirichlet allocation. Proceedings of the 2010 International Conference on Data Mining, pp.409-414.
- Cortes, C., and Vapnik, V., 1995. Support-vector networks. Machine Learning, 20(3), pp.273-297.
- Cover, T. M., and Hart, P. E., 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), pp.21-27.
- Ester, M., Kriegel, H. P., Sander, J., and Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp.226-231.
- Joachims, T., 1998. Text categorization with support vector machines: Learning with many relevant features. Proceedings of the 10th European Conference on Machine Learning, pp.137-142.
- Van Rijsbergen, C. J., 1979. Information Retrieval, 2nd edition. Butterworths, London.
- Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer, New York.
- Wiener, E., Pedersen, J. O., and Weigend, A. S., 1995. A neural network approach to topic spotting. SDAIR, pp.317-332.
Paper Citation
in Harvard Style
Wang Y. and Choi I. (2012). A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS. In Proceedings of the 1st International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES, ISBN 978-989-8425-97-3, pages 212-214. DOI: 10.5220/0003740902120214
in Bibtex Style
@conference{icores12,
author={Yanshan Wang and In-Chan Choi},
title={A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS},
booktitle={Proceedings of the 1st International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES},
year={2012},
pages={212-214},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003740902120214},
isbn={978-989-8425-97-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 1st International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES
TI - A TEXT CLASSIFICATION METHOD BASED ON LATENT TOPICS
SN - 978-989-8425-97-3
AU - Wang Y.
AU - Choi I.
PY - 2012
SP - 212
EP - 214
DO - 10.5220/0003740902120214