ing those topics to filter a new data set. The results
of this experiment show that the majority of tweets
fetched using this method are not news: precision
reaches only 0.103 in the best case.
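As a reminder of what the precision figure measures, the following minimal sketch computes precision as the share of fetched tweets that are actually news. The function name and the label lists are hypothetical and chosen for illustration only; they are not data or code from the experiment.

```python
def precision(predicted_news, actual_news):
    """Fraction of tweets flagged as news that really are news."""
    if len(predicted_news) != len(actual_news):
        raise ValueError("label lists must have the same length")
    # Keep the true label of every tweet the filter flagged as news.
    flagged = [actual for pred, actual in zip(predicted_news, actual_news) if pred]
    if not flagged:
        return 0.0  # nothing was fetched, so report zero by convention
    return sum(flagged) / len(flagged)


# Hypothetical labels for illustration only, not data from the experiment.
predicted = [True, True, True, True, False]
actual = [True, False, False, False, True]
print(precision(predicted, actual))
```

Under this definition, a precision of 0.103 means that roughly one in ten fetched tweets is a news tweet.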
Topic modeling by itself is unlikely to be sufficient
for detecting breaking news from Twitter. Tweets
are too short and too ambiguous to yield statistical
models of the necessary precision. As a supplement
to other techniques for news detection, however,
topic models may be useful, since they assume no
knowledge of location, time, or author.
From a news aggregator perspective, topic modeling
is also interesting for clustering and summarizing
news content. Each tweet is associated with a number
of relevant topics or clusters, and each topic is in turn
described by a set of prominent words for that topic.
In the future we intend to further explore the clustering
abilities of topic modeling to improve the user
experience of our news aggregator. Topic modeling
allows us to structure news content along several
dimensions and to use short labels to summarize sets
of news stories.
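The clustering and labeling idea described above can be sketched as follows. This is only an illustration under stated assumptions: the topic-word weights, the tweet-topic distributions, the 0.3 membership threshold, and the helper names `topic_label` and `cluster_by_topic` are all hypothetical, and in practice the distributions would come from a trained topic model.

```python
from collections import defaultdict


def topic_label(word_weights, n_words=3):
    """Build a short label from the n most prominent words of a topic."""
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(word for word, _ in ranked[:n_words])


def cluster_by_topic(tweet_topics, threshold=0.3):
    """Assign each tweet to every topic whose weight reaches the threshold."""
    clusters = defaultdict(list)
    for tweet_id, distribution in tweet_topics.items():
        for topic_id, weight in distribution.items():
            if weight >= threshold:
                clusters[topic_id].append(tweet_id)
    return dict(clusters)


# Hypothetical model output, for illustration only.
topics = {
    0: {"election": 0.30, "vote": 0.25, "party": 0.20, "poll": 0.05},
    1: {"match": 0.35, "goal": 0.30, "league": 0.15, "coach": 0.05},
}
tweet_topics = {
    "t1": {0: 0.8, 1: 0.2},
    "t2": {0: 0.1, 1: 0.9},
    "t3": {0: 0.5, 1: 0.5},
}

clusters = cluster_by_topic(tweet_topics)
for topic_id, tweet_ids in sorted(clusters.items()):
    print(topic_label(topics[topic_id]), "->", tweet_ids)
```

Because a tweet may exceed the threshold for more than one topic, it can appear in several clusters, which matches the idea of structuring content along several dimensions.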
