Table 8: Trend Prediction.
Model            Number of Articles in each Trend
ITTM   Articles    6  11  17  10  15  13  19  19  12 129   7   6   9   8   7
       Related     5   4   8   5   8   8  12   3   8  60   4   4   3   5   4
STM    Articles    7   9  16   6  85   8  15  96  17  13   6   6
       Related     3   5   6   3  48   5   6  42  10   9   4   3
iLDA   Articles   19   7   9   8   6   7  11 183  44  18
       Related     8   4   3   3   3   3   5  83  18  11
son on topic inference between these models.
We then grouped the articles clustered by each model
and evaluated them manually (based on article title
and news description), as shown in Table 8. From
these counts we computed the overall precision of
trend prediction for each model: 0.4896 for ITTM,
0.5070 for STM and 0.4519 for infinite LDA.
Interestingly, some trends contain many more articles
than the others. The reason is that in mid-January
2010 a powerful earthquake struck Haiti, triggering a
series of news reports on the disaster. Most articles
in these trends concern this event. Overall, both ITTM
and STM can successfully predict a real-world trend,
even for a booming event such as the
“Haiti earthquake”.
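The precision figures above can be rechecked from the counts in Table 8. The sketch below assumes that the reported precision is pooled over all trends, that is, the total number of related articles divided by the total number of clustered articles for each model. The text does not state this definition explicitly; the names TABLE_8, pooled_precision and PRECISION are illustrative only.

```python
# Per-trend counts copied from Table 8.
# "articles": articles clustered into each trend by the model.
# "related":  articles judged related to that trend in the manual evaluation.
TABLE_8 = {
    "ITTM": {
        "articles": [6, 11, 17, 10, 15, 13, 19, 19, 12, 129, 7, 6, 9, 8, 7],
        "related": [5, 4, 8, 5, 8, 8, 12, 3, 8, 60, 4, 4, 3, 5, 4],
    },
    "STM": {
        "articles": [7, 9, 16, 6, 85, 8, 15, 96, 17, 13, 6, 6],
        "related": [3, 5, 6, 3, 48, 5, 6, 42, 10, 9, 4, 3],
    },
    "iLDA": {
        "articles": [19, 7, 9, 8, 6, 7, 11, 183, 44, 18],
        "related": [8, 4, 3, 3, 3, 3, 5, 83, 18, 11],
    },
}


def pooled_precision(articles, related):
    """Total related articles divided by total clustered articles.

    Assumption: this pooled definition is inferred from the table,
    not stated in the text.
    """
    if len(articles) != len(related):
        raise ValueError("each trend needs both an article and a related count")
    return sum(related) / sum(articles)


# Precision per model, rounded to four decimals as in the text.
PRECISION = {
    model: round(pooled_precision(counts["articles"], counts["related"]), 4)
    for model, counts in TABLE_8.items()
}

if __name__ == "__main__":
    for model, value in PRECISION.items():
        print(f"{model}: {value:.4f}")
```

Under this assumption the counts give 141/288, 144/284 and 141/312, which match the reported 0.4896, 0.5070 and 0.4519. The large trends dominated by the Haiti earthquake carry most of the weight in each pooled value.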
5 CONCLUSIONS AND FUTURE WORK
In this paper, we presented two approaches that
incorporate HDP and temporal information for
real-world tasks without a Markov assumption. In
addition, a Wikipedia-based semantic approach was
exploited to improve the results of topic modelling.
The models keep complexity low and have succinct
graphical representations. The experimental results
indicate that they are capable of tracking trends in
news media. As a significant finding, the ITTM models
the peak of an event trend precisely but fails to
handle trends with multiple spikes, whereas the STM is
capable of tracking trends with fluctuations and of
discovering new, stable and vanished topics. Because
the models are flexible and place no limit on the
number of topics, they can be easily extended to other
scenarios. Our future work might focus on tracking
user interest by incorporating propagation algorithms
based on the proposed models. Combining infinite topic
modelling with a location factor is also under
consideration.
KDIR 2012 - International Conference on Knowledge Discovery and Information Retrieval