ing those topics to filter a new data set. The results
of this experiment show that the majority of tweets
fetched using this method are not news, with a
precision of only 0.103 in the best case.
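For reference, precision here is the fraction of fetched tweets that are actually news. The following is a minimal sketch, assuming each fetched tweet has been manually labeled as news or non-news; the function name and the counts are hypothetical illustrations, not the data from the experiment.

```python
def precision(is_news_flags):
    """Return the fraction of fetched tweets that are labeled as news."""
    flags = list(is_news_flags)
    if not flags:
        raise ValueError("no tweets were fetched")
    return sum(1 for flag in flags if flag) / len(flags)


# Hypothetical labels: 103 news tweets among 1000 fetched tweets.
fetched = [True] * 103 + [False] * 897
print(round(precision(fetched), 3))  # prints 0.103
```

With these illustrative counts, roughly nine out of ten fetched tweets would have to be discarded by a later filtering step.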
Topic modeling by itself is unlikely to be sufficient
for detecting breaking news from Twitter. Tweets
are too short and too ambiguous to yield statistical
models of the necessary precision. As a supplement
to other news detection techniques, however, topic
models may be useful, since they assume no knowledge
of location, time or author.
From a news aggregator perspective, topic modeling
is also interesting for clustering and summarizing
news content. Each tweet is associated with a number
of relevant topics or clusters, and each topic is in turn
described by a set of prominent words for that topic.
In the future we intend to further explore the clustering
abilities of topic modeling to improve the user
experience of our news aggregator. Topic modeling allows
us to structure news content along several dimensions and
to use short labels to summarize sets of news stories.
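The clustering and labeling idea can be sketched as follows. This is a minimal illustration, assuming a topic model has already produced per-topic word weights and per-tweet topic probabilities; the function names, the threshold value, and all data below are hypothetical and do not come from our system.

```python
from collections import defaultdict


def topic_label(word_weights, k=3):
    """Build a short label from the k most prominent words of a topic."""
    top = sorted(word_weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return " ".join(word for word, _ in top)


def cluster_by_topic(tweet_topics, threshold=0.3):
    """Assign each tweet to every topic whose probability meets the threshold."""
    clusters = defaultdict(list)
    for tweet_id, distribution in tweet_topics.items():
        for topic_id, probability in distribution.items():
            if probability >= threshold:
                clusters[topic_id].append(tweet_id)
    return dict(clusters)


# Hypothetical model output: word weights per topic, topic mix per tweet.
topics = {
    0: {"election": 0.12, "vote": 0.09, "poll": 0.05, "party": 0.03},
    1: {"match": 0.10, "goal": 0.08, "league": 0.06, "coach": 0.02},
}
tweet_topics = {
    "t1": {0: 0.8, 1: 0.2},
    "t2": {0: 0.4, 1: 0.6},
    "t3": {0: 0.1, 1: 0.9},
}

for topic_id, tweet_ids in cluster_by_topic(tweet_topics).items():
    print(topic_label(topics[topic_id]), "->", tweet_ids)
```

Because a tweet may exceed the threshold for more than one topic, the same tweet can appear under several labels, which is what allows content to be structured along several dimensions.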
WEBIST 2016 - 12th International Conference on Web Information Systems and Technologies