Twitter Topic Modeling for Breaking News Detection

Henning M. Wold, Linn Vikre, Jon Atle Gulla, Özlem Özgöbek, Xiaomeng Su


Social media platforms like Twitter have become increasingly popular for the dissemination and discussion of current events. Twitter makes it possible for people to share stories that they find interesting with their followers and to write updates on what is happening around them. In this paper we attempt to use topic models of tweets in real time to identify breaking news. Two methods, Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP), are tested in two configurations: with each tweet in the training corpus treated as a separate document, and with all the tweets of a unique user regarded as one document. The second configuration emulates Author-Topic modeling (AT-modeling). The evaluation relies on manual scoring of the accuracy of the modeling by volunteer participants. The experiments indicate that topic modeling on tweets in real time is not suitable for detecting breaking news by itself, but may be useful in analyzing and describing news tweets.
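
The two ways of forming documents described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization, and the sample tweets are all assumptions made for illustration. It only shows how the same tweets yield one document per tweet or one aggregated document per user before a topic model such as LDA or HDP is trained on them.

```python
from collections import defaultdict


def tweets_as_documents(tweets):
    """One document per tweet (hypothetical helper): tokenize each tweet on its own."""
    return [text.lower().split() for _, text in tweets]


def users_as_documents(tweets):
    """One document per user (hypothetical helper): pool the tokens of all
    tweets written by the same user, emulating an author-topic setup."""
    by_user = defaultdict(list)
    for user, text in tweets:
        by_user[user].extend(text.lower().split())
    return dict(by_user)


# Made-up sample data: (user, tweet text) pairs.
tweets = [
    ("alice", "Earthquake reported downtown"),
    ("bob", "Traffic jam on the bridge"),
    ("alice", "Aftershock felt downtown"),
]

per_tweet = tweets_as_documents(tweets)  # three short documents
per_user = users_as_documents(tweets)    # two longer documents, one per user
```

Either collection of token lists could then be passed to a topic modeling library of choice. The per-user variant produces fewer but longer documents, which is the property the AT-style configuration relies on.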


Paper Citation

in Harvard Style

Wold H., Vikre L., Gulla J., Özgöbek Ö. and Su X. (2016). Twitter Topic Modeling for Breaking News Detection. In Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-758-186-1, pages 211-218. DOI: 10.5220/0005801902110218

in Bibtex Style

author={Henning M. Wold and Linn Vikre and Jon Atle Gulla and Özlem Özgöbek and Xiaomeng Su},
title={Twitter Topic Modeling for Breaking News Detection},
booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},

in EndNote Style

JO - Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - Twitter Topic Modeling for Breaking News Detection
SN - 978-989-758-186-1
AU - Wold H.
AU - Vikre L.
AU - Gulla J.
AU - Özgöbek Ö.
AU - Su X.
PY - 2016
SP - 211
EP - 218
DO - 10.5220/0005801902110218