REFERENCES
Atefeh, F. and Khreich, W. (2015). A survey of techniques
for event detection in twitter. Computational Intelli-
gence, 31(1):132–164.
Baralis, E., Cagliero, L., Jabeen, S., Fiori, A., and Shah,
S. (2013). Multi-document summarization based on
the yago ontology. Expert Systems with Applications,
40(17):6976–6984.
Bergstrom, K. (2011). Dont feed the troll: Shutting down
debate about community expectations on reddit. com.
First Monday, 16(8).
Cai, D., He, X., and Han, J. (2006). Tensor space model for
document analysis. In Proceedings of the 29th annual
international ACM SIGIR conference on Research and
development in information retrieval, pages 625–626.
ACM.
Duggan, M. and Smith, A. (2013). 6% of online adults are
reddit users. Pew Internet & American Life Project,
3:1–10.
Fabian, M., Gjergji, K., Gerhard, W., et al. (2007). Yago: A
core of semantic knowledge unifying wordnet and wi-
kipedia. In 16th International World Wide Web Con-
ference, WWW, pages 697–706.
Gilbert, E. (2013). Widespread underprovision on reddit.
In Proceedings of the 2013 conference on Computer
supported cooperative work, pages 803–808. ACM.
Hammouda, K. M., Matute, D. N., and Kamel, M. S.
(2005). Corephrase: Keyphrase extraction for docu-
ment clustering. In MLDM, volume 2005, pages 265–
274. Springer.
Hu, X. and Liu, H. (2012). Text analytics in social media.
Mining text data, pages 385–414.
Hubert, L. and Arabie, P. (1985). Comparing partitions.
Journal of classification, 2(1):193–218.
Kleinberg, J. M. (2003). An impossibility theorem for clus-
tering. In Advances in neural information processing
systems, pages 463–470.
Levy, O. and Goldberg, Y. (2014). Neural word embedding
as implicit matrix factorization. In Advances in neural
information processing systems, pages 2177–2185.
Li, X., Ramachandran, R., Movva, S., Graves, S., Plale,
B., and Vijayakumar, N. (2008). Storm clustering
for data-driven weather forecasting. In 24th Confe-
rence on International Institute of Professional Stu-
dies (IIPS). University of Alabama in Huntsville.
Meil
˘
a, M. (2007). Comparing clusteringsan information
based distance. Journal of multivariate analysis,
98(5):873–895.
Milligan, G. W. (1996). Clustering validation: results and
implications for applied analyses. In Clustering and
classification, pages 341–375. World Scientific.
M
¨
uller, C. and Gurevych, I. (2009). A study on the semantic
relatedness of query and document terms in informa-
tion retrieval. In Proceedings of the 2009 Conference
on Empirical Methods in Natural Language Proces-
sing: Volume 3-Volume 3, pages 1338–1347. Associa-
tion for Computational Linguistics.
Nagarajan, R. and Aruna, P. (2016). Construction of key-
word extraction using statistical approaches and docu-
ment clustering by agglomerative method. Internati-
onal Journal of Engineering Research and Applicati-
ons, 6(1):73–78.
Potts, L. and Harrison, A. (2013). Interfaces as rhetorical
constructions: reddit and 4chan during the boston ma-
rathon bombings. In Proceedings of the 31st ACM in-
ternational conference on Design of communication,
pages 143–150. ACM.
Rand, W. M. (1971). Objective criteria for the evaluation of
clustering methods. Journal of the American Statisti-
cal association, 66(336):846–850.
Sakaki, T., Okazaki, M., and Matsuo, Y. (2010). Earthquake
shakes twitter users: real-time event detection by so-
cial sensors. In Proceedings of the 19th international
conference on World wide web, pages 851–860. ACM.
Santos, J. M. and Embrechts, M. (2009). On the use of the
adjusted rand index as a metric for evaluating super-
vised classification. In International Conference on
Artificial Neural Networks, pages 175–184. Springer.
Sarstedt, M. and Mooi, E. (2014). Factor Analysis, pages
235–272. Springer Berlin Heidelberg, Berlin, Heidel-
berg.
Shah, N. and Mahajan, S. (2012). Semantic based document
clustering: A detailed review. International Journal of
Computer Applications, 52(5).
Shahnaz, F., Berry, M. W., Pauca, V. P., and Plemmons,
R. J. (2006). Document clustering using nonnegative
matrix factorization. Information Processing & Ma-
nagement, 42(2):373–386.
Singer, P., Fl
¨
ock, F., Meinhart, C., Zeitfogel, E., and Stroh-
maier, M. (2014). Evolution of reddit: from the front
page of the internet to a self-referential community?
In Proceedings of the 23rd International Conference
on World Wide Web, pages 517–522. ACM.
Song, W., Li, C. H., and Park, S. C. (2009). Genetic algo-
rithm for text clustering using ontology and evaluating
the validity of various semantic similarity measures.
Expert Systems with Applications, 36(5):9095–9104.
Steinley, D. (2004). Properties of the hubert-arable adjusted
rand index. Psychological methods, 9(3):386.
Strapparava, C., Valitutti, A., et al. (2004). Wordnet affect:
an affective extension of wordnet. In LREC, volume 4,
pages 1083–1086.
Wallace, M. (2007). Jawbone Java WordNet API.
Weng, J. and Lee, B.-S. (2011). Event detection in twitter.
ICWSM, 11:401–408.
Weninger, T., Zhu, X. A., and Han, J. (2013). An explo-
ration of discussion threads in social news sites: A
case study of the reddit community. In Proceedings of
the 2013 IEEE/ACM International Conference on Ad-
vances in Social Networks Analysis and Mining, pages
579–583. ACM.
Zamir, O., Etzioni, O., Madani, O., and Karp, R. M. (1997).
Fast and intuitive clustering of web documents. In
KDD, volume 97, pages 287–290.
Zheng, H.-T., Kang, B.-Y., and Kim, H.-G. (2009). Ex-
ploiting noun phrases and semantic relationships for
text document clustering. Information Sciences,
179(13):2249–2262.
KDIR 2018 - 10th International Conference on Knowledge Discovery and Information Retrieval
202