A Hierarchical Book Representation of Word Embeddings for Effective Semantic Clustering and Search