News Classifications with Labeled LDA

Yiqi Bai, Jie Wang

Abstract

Automatically categorizing news articles with high accuracy is an important task in an automated quick news system. We present two classifiers based on Labeled Latent Dirichlet Allocation, called LLDA-C and SLLDA-C, for classifying news articles. To verify classification accuracy, we compare the results produced by the classifiers with those produced by trained professionals. Through extensive experiments, we show that both LLDA-C and SLLDA-C outperform SVM (Support Vector Machine, our baseline classifier) in precision, particularly when only a small training dataset is available. SLLDA-C is also much more efficient than SVM. In terms of recall, LLDA-C is better than SVM. In terms of average Macro-F1 and Micro-F1 scores, the LLDA classifiers are superior to SVM. To further explore the classification of news articles, we introduce the notion of content complexity and study how it affects classification.
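
For readers unfamiliar with the two averaged scores named above, the following is a minimal sketch of how Macro-F1 and Micro-F1 are commonly computed for single-label classification. It is not the authors' evaluation code; the function names, the category labels, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when all three counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred):
    """Return (macro_f1, micro_f1), assuming one label per article."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for category g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true category but was missed
    # Macro-F1: average the per-category F1 scores (each category counts equally).
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro-F1: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy data: four articles, three categories.
gold = ["sports", "sports", "finance", "tech"]
pred = ["sports", "finance", "finance", "tech"]
macro, micro = macro_micro_f1(gold, pred)
print(f"Macro-F1 = {macro:.4f}, Micro-F1 = {micro:.4f}")
```

Macro-F1 weights every category equally, so rare categories influence it as much as frequent ones, whereas Micro-F1 is dominated by the categories with the most articles.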

Paper Citation


in Harvard Style

Bai Y. and Wang J. (2015). News Classifications with Labeled LDA. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 75-83. DOI: 10.5220/0005610600750083


in Bibtex Style

@conference{kdir15,
author={Yiqi Bai and Jie Wang},
title={News Classifications with Labeled LDA},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={75-83},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005610600750083},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - News Classifications with Labeled LDA
SN - 978-989-758-158-8
AU - Bai Y.
AU - Wang J.
PY - 2015
SP - 75
EP - 83
DO - 10.5220/0005610600750083