Infinite Topic Modelling for Trend Tracking - Hierarchical Dirichlet Process Approaches with Wikipedia Semantic based Method

Yishu Miao, Chunping Li, Hui Wang, Lu Zhang

Abstract

The current affairs that people follow closely vary from period to period, and the evolution of trends is reflected in media reports. This paper tracks trends by incorporating temporal information into non-parametric Bayesian approaches and presents two topic modelling methods. The first is an infinite temporal topic model, which obtains the topic distribution over time by placing a time prior when discovering topics dynamically. To better organize event trends, we present a second method, a progressive superposed topic model, which simulates the whole evolutionary process of topics, including the generation of new topics, the evolution of stable topics, and the vanishing of old topics, via a series of superposed topic distributions generated by a hierarchical Dirichlet process. Both approaches aim to solve this real-world task while avoiding the Markov assumption and removing the limit on the number of topics. In addition, we employ Wikipedia-based semantic background knowledge to improve the discovered topics and their readability. Experiments are carried out on a corpus of BBC news about American Forum. The results demonstrate better-organized topics, the evolutionary processes of topics over time, and the effectiveness of the models.


Paper Citation


in Harvard Style

Miao Y., Li C., Wang H. and Zhang L. (2012). Infinite Topic Modelling for Trend Tracking - Hierarchical Dirichlet Process Approaches with Wikipedia Semantic based Method. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 35-44. DOI: 10.5220/0004133300350044


in Bibtex Style

@conference{kdir12,
author={Yishu Miao and Chunping Li and Hui Wang and Lu Zhang},
title={Infinite Topic Modelling for Trend Tracking - Hierarchical Dirichlet Process Approaches with Wikipedia Semantic based Method},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={35-44},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004133300350044},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI - Infinite Topic Modelling for Trend Tracking - Hierarchical Dirichlet Process Approaches with Wikipedia Semantic based Method
SN - 978-989-8565-29-7
AU - Miao Y.
AU - Li C.
AU - Wang H.
AU - Zhang L.
PY - 2012
SP - 35
EP - 44
DO - 10.5220/0004133300350044