with extremely low cosine similarities. By using autoencoders in an unsupervised approach, we obtain more compact representations with less noise, which automatically disambiguate the similarities between synonyms. The fine-tuning model also indicates the potential for representations to be learned by deep neural networks with carefully designed loss functions and knowledge graphs or lexical datasets. In this paper, we set the loss function in a softmax form over a dataset of WordNet synsets to calculate the posterior probability; this method increases the cosine similarities between synonym words and decreases the similarities of non-synonym ones. Unlike the idea behind stacked autoencoders, this model encodes word embeddings in a supervised way, extracting only the semantic features that are useful for synonyms and bringing them closer together.
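To make the fine-tuning objective concrete, the sketch below illustrates in plain NumPy how a softmax over cosine similarities yields a posterior for a synonym and a negative log-likelihood loss. It is an illustrative reading of the objective rather than the paper's actual training code: the vocabulary size, embedding dimension, and synonym_pairs list are hypothetical placeholders, and the real model optimizes this loss through a neural network rather than merely evaluating it.

# Illustrative sketch (not the paper's training code) of a softmax-style
# synonym loss over cosine similarities; sizes and pairs are placeholders.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 200
E = rng.normal(size=(vocab_size, dim))          # stand-in for word2vec embeddings
synonym_pairs = [(3, 17), (42, 7), (100, 256)]  # hypothetical WordNet synonym word-id pairs

def cosine_matrix(E):
    """Pairwise cosine similarities between all embeddings."""
    norm = E / np.linalg.norm(E, axis=1, keepdims=True)
    return norm @ norm.T

def synonym_softmax_loss(E, pairs):
    """Negative log posterior of the true synonym under a softmax over
    cosine similarities with every word in the vocabulary."""
    sims = cosine_matrix(E)
    loss = 0.0
    for w, s in pairs:
        logits = sims[w]                         # similarity of word w to all words
        log_posterior = logits[s] - np.log(np.exp(logits).sum())
        loss -= log_posterior
    return loss / len(pairs)

print("mean synonym loss:", synonym_softmax_loss(E, synonym_pairs))

Minimizing a loss of this form pushes the cosine similarity of each synonym pair up relative to all other words in the vocabulary, which is exactly the effect described above for synonym versus non-synonym pairs.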
Both models achieved significantly better performance than word2vec on measuring synonym relatedness, shedding light on how word embeddings can be exploited in supervised and unsupervised ways. The two models, however, stem from different ideas, and one observation still puzzles us: the deeper the stacked autoencoder, the larger the converged loss of each autoencoder in the network. We will keep studying this phenomenon in future work to probe the properties of autoencoders and word representations. For unsupervised learning, we plan to comprehensively evaluate the energy of autoencoders. We will explore how linguistic regularities change in the latent representations and uncover the patterns of semantic and syntactic properties in the embeddings. The autoencoder may prove a good toolkit for clarifying the meaning of opaque vectors. Our future work will also focus on disambiguating entity types by placing a classifier on the top layer of the network.
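As a point of reference for the reconstruction-loss behaviour we intend to study, the following is a minimal single-layer autoencoder sketch in NumPy that compresses word vectors into a lower-dimensional code, in the spirit of the unsupervised model discussed above. The dimensions, learning rate, and epoch count are hypothetical and do not reflect the settings used in our experiments.

# Minimal single-layer autoencoder sketch; hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n_words, in_dim, code_dim = 5000, 200, 100
X = rng.normal(size=(n_words, in_dim))        # stand-in for pre-trained word embeddings

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Encoder (W1, b1) and decoder (W2, b2) parameters, untied for simplicity.
W1 = rng.normal(scale=0.01, size=(in_dim, code_dim))
b1 = np.zeros(code_dim)
W2 = rng.normal(scale=0.01, size=(code_dim, in_dim))
b2 = np.zeros(in_dim)

lr, epochs = 0.1, 50
for epoch in range(epochs):
    H = sigmoid(X @ W1 + b1)                  # compact latent codes
    R = H @ W2 + b2                           # linear reconstruction
    diff = R - X
    loss = 0.5 * np.mean(np.sum(diff ** 2, axis=1))

    # Backpropagation of the mean squared reconstruction error.
    dR = diff / n_words
    gW2 = H.T @ dR
    gb2 = dR.sum(axis=0)
    dH = (dR @ W2.T) * H * (1.0 - H)
    gW1 = X.T @ dH
    gb1 = dH.sum(axis=0)

    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

codes = sigmoid(X @ W1 + b1)                  # compressed word representations
print("final reconstruction loss:", loss)

Monitoring the converged reconstruction loss of each such layer as more layers are stacked is the phenomenon we plan to investigate.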
ACKNOWLEDGMENT
This work was partially supported by NEDO (New Energy and Industrial Technology Development Organization).
REFERENCES
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155, 2003.
David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
Danushka Bollegala, Alsuhaibani Mohammed, Takanori
Maehara, and Ken-ichi Kawarabayashi. Joint word
representation learning using a corpus and a semantic
lexicon. arXiv preprint arXiv:1511.06438, 2015.
John A Bullinaria and Joseph P Levy. Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39(3):510–526, 2007.
Danqi Chen and Christopher D Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750, 2014.
Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160–167. ACM, 2008.
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537, 2011.
George E Dahl, Dong Yu, Li Deng, and Alex Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):30–42, 2012.
Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
Hongzhao Huang, Larry Heck, and Heng Ji. Leveraging
deep neural networks and knowledge graphs for entity
disambiguation. arXiv preprint arXiv:1504.07678,
2015.
Hanna Kamyshanska and Roland Memisevic. The potential energy of an autoencoder. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(6):1261–1273, 2015.
Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
Raúl Ernesto Menéndez-Mora and Ryutaro Ichise. Toward simulating the human way of comparing concepts. IEICE Transactions on Information and Systems, 94(7):1419–1429, 2011.
Tomas Mikolov and J Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013b.
Tomas Mikolov, Stefan Kombrink, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. Extensions of recurrent neural network language model. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5528–5531. IEEE, 2011.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean.
Efficient estimation of word representations in vector
space. arXiv preprint arXiv:1301.3781, 2013a.
George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.