extraction APIs, and Wikipedia categories in order to
effectively mine users’ interests on Twitter. Through
the combination of machine learning, information re-
trieval, and knowledge bases, we were able to mit-
igate the obvious limitation of the small size of the
training data set and to extract not only keywords but
also keyphrases as possible interests.
Overall, the evaluation results showed that Wiki-
LDA clearly outperforms standard LDA in terms of
the meaningfulness and coherence of the extracted in-
terests as well as the accuracy of the classification of
the interests in related topics. Hence, this work pro-
vides a novel method for significantly improving in-
terest mining on Twitter data.
While our early results are encouraging for generating
the interest profile of a Twitter user, there are
still a number of areas we would like to improve. The
first and most important is building a larger training
corpus, which is crucial for a machine learning
task. So far, we have crawled tweets from only 3-4
Twitter user accounts per abstract topic as the training
set. A logical next step is hence to gather many
more tweets from more users and to extend the range
of abstract topics in order to classify more
latent words.
Moreover, the Wiki-LDA algorithm still has room
for improvement. One technical limitation of LDA
is the need to fix the number of topics K before
learning. To improve on this, one can consider
letting K be infinite in LDA and determining
the number of topics through a separate learning
process.
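The text above does not specify how K would be learned. One family of approaches that lets the number of topics grow with the data places a Dirichlet process prior over topics, as in hierarchical Dirichlet process topic models. The following minimal Python sketch illustrates only that prior, via the Chinese restaurant process; it is not the Wiki-LDA method, and the function name crp_assignments and the concentration parameter alpha are illustrative assumptions.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample topic assignments from a Chinese restaurant process.

    Hypothetical helper for illustration: each item joins an existing
    topic with probability proportional to that topic's current size,
    or opens a new topic with probability proportional to alpha (> 0).
    The number of topics is therefore not fixed in advance.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items assigned to topic k so far
    assignments = []   # assignments[i] = topic index of item i
    for _ in range(n_items):
        # Weights for existing topics, followed by the weight of a new topic.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new topic
        counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        _, counts = crp_assignments(500, alpha, seed=1)
        print(f"alpha={alpha}: {len(counts)} topics, sizes={counts}")
```

In expectation, a larger alpha yields more topics for the same number of items, so under such a prior the effective K is inferred from the data rather than chosen before learning.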
Another important area to improve is our evaluation.
We plan to perform a larger-scale experiment
in a real learning environment, which will allow us to
thoroughly evaluate our interest mining approach.
ACKNOWLEDGEMENTS
The first author acknowledges the sup-
port of the Swiss National Science Founda-
tion through the MODERN Sinergia Project
(www.idiap.ch/project/modern).
REFERENCES
Baker, R. (2010). Data mining for education. International
Encyclopedia of Education, 7:112–118.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent
Dirichlet allocation. Journal of Machine Learning
Research, 3:993–1022.
Brusilovsky, P. and Millan, E. (2007). User models for
adaptive hypermedia and adaptive educational sys-
tems. In Brusilovsky, P., Kobsa, A., and Nejdl, W., ed-
itors, The Adaptive Web, LNCS 4321, chapter 1, pages
3–53. Springer-Verlag Berlin Heidelberg.
Chatti, M. A. (2010). The LaaN theory. In Personalization
in Technology Enhanced Learning: A Social Software
Perspective, pages 19–42. Aachen, Germany: Shaker
Verlag.
Chatti, M. A., Dyckhoff, A. L., Schroeder, U., and Thüs,
H. (2012). A reference model for learning analytics.
International Journal of Technology Enhanced Learning,
4(5/6):318–331.
Chatti, M. A., Lukarov, V., Thüs, H., Muslim, A., Yousef,
A. M. F., Wahid, U., Greven, C., Chakrabarti, A., and
Schroeder, U. (2014). Learning analytics: Challenges
and future research directions. e-learning and
education journal (eleed), 10.
Kay, J. and Kummerfeld, B. (2011). Lifelong learner mod-
eling. In Durlach, P. J. and Lesgold, A. M., editors,
Adaptive Technologies for Training and Education,
pages 140–164. Cambridge University Press.
Mehrotra, R., Sanner, S., Buntine, W., and Xie, L. (2013).
Improving LDA topic models for microblogs via tweet
pooling and automatic labeling. In Proceedings of
the 36th international ACM SIGIR conference on
Research and development in information retrieval,
pages 889–892. ACM.
Michelson, M. and Macskassy, S. A. (2010). Discovering
users’ topics of interest on Twitter: a first look. In
Proceedings of the fourth workshop on Analytics for
noisy unstructured text data, pages 73–80. ACM.
Puniyani, K., Eisenstein, J., Cohen, S., and Xing, E. P.
(2010). Social links from latent topics in microblogs.
In Proceedings of the NAACL HLT 2010 Workshop on
Computational Linguistics in a World of Social Media,
pages 19–20. Association for Computational Linguis-
tics.
Quercia, D., Askham, H., and Crowcroft, J. (2012).
TweetLDA: supervised topic classification and link prediction
in Twitter. In Proceedings of the 3rd Annual ACM Web
Science Conference, pages 247–250. ACM.
Ramage, D., Dumais, S. T., and Liebling, D. J.
(2010). Characterizing microblogs with topic models.
ICWSM, 10:1–1.
Romero, C., Ventura, S., Pechenizkiy, M., and Baker, R. S.
(2010). Handbook of educational data mining. CRC
Press.
Zhao, W. X., Jiang, J., Weng, J., He, J., Lim, E.-P., Yan, H.,
and Li, X. (2011). Comparing Twitter and traditional
media using topic models. In Advances in Information
Retrieval, pages 338–349. Springer.
Wiki-LDA: A Mixed-Method Approach for Effective Interest Mining on Twitter Data