with extremely low cosine similarities. By using
autoencoders in an unsupervised approach, we get more
compact representations with less noise, which
automatically disambiguate the similarities between
synonyms. The fine-tuning model also shows the
potential for deep neural networks to learn
representations from carefully designed loss functions
and knowledge graphs or lexical datasets. In this
paper, we set the loss function in a softmax form with
a dataset of WordNet synsets to calculate the
posterior probability. This method increases cosine
similarities between synonyms and decreases
similarities between non-synonyms. Unlike stacked
autoencoders, this model encodes word embeddings in a
supervised way, so it extracts only the semantic
features useful for synonyms and brings them closer.
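
The softmax-form loss described above can be illustrated with a minimal sketch. It assumes that the posterior of a candidate word given a target word is a softmax over cosine similarities, and that the loss is the negative log posterior of the true synonym. The exact formulation, the function names, and the toy vectors are illustrative assumptions and are not taken from the model itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def synonym_posterior(word, candidates):
    """Assumed form: softmax over cosine similarities, P(candidate | word)."""
    sims = [cosine(word, c) for c in candidates]
    shift = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def synonym_loss(word, candidates, synonym_index):
    """Negative log posterior of the candidate that shares a synset."""
    return -math.log(synonym_posterior(word, candidates)[synonym_index])


# Hypothetical 2-d embeddings: candidate 0 plays the role of the synonym,
# candidates 1 and 2 play the role of non-synonyms.
word = [1.0, 0.2]
candidates = [[0.9, 0.3], [-0.5, 1.0], [0.0, -1.0]]

probs = synonym_posterior(word, candidates)
loss = synonym_loss(word, candidates, synonym_index=0)
print(probs, loss)
```

Under these assumptions, minimizing the loss raises the posterior of the synonym, which can only happen if its cosine similarity grows relative to the similarities of the non-synonyms.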
Both models achieved significantly better
performance than word2vec on measuring synonym
relatedness, shedding light on how to exploit word
embeddings in a supervised or unsupervised way. But these
two models arise from different ideas, and one
observation still puzzles us: the deeper the stacked
autoencoders we use, the larger the loss at convergence
for each autoencoder in the network. We will keep
studying this phenomenon in the future to probe
the features of autoencoders and word
representations. For unsupervised learning, we plan to
comprehensively evaluate the energy of autoencoders. We
will explore the changes in linguistic regularities of
latent representations, and discover the patterns of
semantic and syntactic properties in embeddings.
Autoencoders may be a good tool for clarifying the
meaning of opaque vectors. Our future work will also
focus on disambiguating entity types by placing a
classifier on the top layer of the network.
This work was partially supported by NEDO (New
Energy and Industrial Technology Development
Organization).
