“spring” is FOUNTAIN, not SEASON.
This paper proposes a novel unsupervised WSD
method that extends Basile's method. While
Basile's method considers only the words in a context
for WSD, our method also takes collocations into account
to determine the sense of a given word. In addition
to ordinary collocations (adjacent words that
often appear together), we also define a dependency
collocation, which is a syntactic dependency relation
between a target word and another word in a sentence.
We also propose to change how the context
vector in the semantic space is constructed. In the original
research, the context embedding is computed as the
average of the word embeddings of all words in the context.
However, not all words are related to the sense of the
target word. Our method considers only words that are
highly related to the sense when the context embedding
is built.
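The contrast between the two ways of building a context embedding can be shown in a minimal sketch. The function names, the dictionary-based embedding lookup, and the `relatedness` scoring function with its threshold are illustrative assumptions, not details from the paper; the paper's actual relatedness criterion is defined later in Section 3.

```python
import numpy as np

def context_vector_avg(context_words, emb):
    """Baseline: average the embeddings of all context words."""
    vecs = [emb[w] for w in context_words if w in emb]
    return np.mean(vecs, axis=0)

def context_vector_filtered(context_words, emb, relatedness, threshold=0.3):
    """Sketch of the selective variant: average only words whose
    relatedness score (an assumed scoring function) exceeds a threshold."""
    vecs = [emb[w] for w in context_words
            if w in emb and relatedness(w) >= threshold]
    if not vecs:  # fall back to plain averaging if no word survives
        return context_vector_avg(context_words, emb)
    return np.mean(vecs, axis=0)
```

The fallback branch is a design choice of this sketch: if the filter removes every word, the plain average is still a usable context vector.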
The rest of the paper is organized as follows.
Section 2 provides a brief review of related
work. Section 3 describes the details of our proposed
method. Section 4 reports several experiments
to evaluate our method. Finally, Section 5 concludes
the paper.
2 RELATED WORK
There are three commonly used features in WSD. The
first is the words in the surroundings of the target
word. Part-of-speech (POS) tags of the neighboring
words are also widely used. Local collocations
are another standard feature, capturing
the ordered sequences of words that tend to appear
around the target word (Bazell, 1959).
Many unsupervised WSD methods are based on
calculating the similarity between a word sense and its
context using such features. One of the most traditional
methods for unsupervised WSD is the Lesk algorithm
(Lesk, 1986). It is based on the assumption
that words in a given section of text tend to share
the same topic. As already explained, it computes the
similarity between the sense definition of an ambiguous
word and the terms appearing in its neighborhood.
There are many measures to determine the similarity
between a sense and a context. Torres and Gelbukh
present a comparison of several similarity measures
applied to WSD by the Lesk algorithm (Torres and
Gelbukh, 2009). Since gloss sentences tend to be
short, several methods use external resources to obtain
additional information about the sense. Bhingardive et
al. use broad information from a lexical database
related to the sense, such as hypernyms, hyponyms,
synonyms, and even example sentences in the dictionary,
to construct a vector representation of the sense in
order to identify the most frequent sense (Bhingardive
et al., 2015).
The most important work related to this study is
(Basile et al., 2014). It utilizes a semantic space, a
geometric space in which vectors express the
concepts of words; proximity in this space
measures the semantic relatedness between words. Since
the gloss (definition) and the context are both composed of
several terms, the vector of each set of terms is built
by adding the vectors of all the words in the set.
Pre-trained word embeddings are used to construct the
gloss and context vectors. The cosine similarity between
the gloss and context vectors is used to choose the appropriate
sense of the target word.
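This gloss–context matching scheme can be sketched as follows. The code is a minimal illustration under assumptions: embeddings are a word-to-vector dictionary, glosses are given as token lists per sense, and the function names are invented here, not taken from Basile et al.

```python
import numpy as np

def text_vector(words, emb):
    """Represent a set of terms as the sum of their word embeddings."""
    vecs = [emb[w] for w in words if w in emb]
    return np.sum(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def choose_sense(context_words, sense_glosses, emb):
    """Pick the sense whose gloss vector has the highest cosine
    similarity with the context vector."""
    ctx = text_vector(context_words, emb)
    return max(sense_glosses,
               key=lambda s: cosine(text_vector(sense_glosses[s], emb), ctx))
```

For example, with glosses for the two senses of "spring" mentioned in Section 1, a context containing water-related words would select FOUNTAIN over SEASON.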
As already discussed in Section 1, this paper extends
Basile's method in two directions. One is
to incorporate a mechanism that determines a sense
from a collocation. Rules that determine a sense
from a collocation are automatically acquired
from a raw corpus and then integrated into
Basile's WSD model. The other is to propose a
better way to construct the context vector, since the
performance of WSD heavily relies on its quality.
3 PROPOSED METHOD
Figure 1 shows an overview of the proposed system.
It accepts a sentence including a target word
as input and outputs a sense for that word.
Our system consists of two modules: one is a rule-based
WSD system, and the other is a WSD system based
on Highly Related Word Embedding (hereafter, the
HRWE method for short). The first module uses a
database of collocation WSD rules, which determine
the sense from a collocation (word sequence). Briefly,
these rules have the form
collocation → sense. If a rule matches a collocation in
the given sentence, the sense is chosen by that rule; otherwise,
the next module is applied. The second module
is similar to (Basile et al., 2014). It measures the
similarity between the gloss sentences in a dictionary and
the context of the target word in the given sentence, then
chooses the sense whose gloss is the most similar to
the context of the target word. Since the rule-based
module is designed to achieve high precision at the cost
of low recall, it is applied first.
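The two-stage control flow can be sketched as below. The rule store as a dictionary from collocation tuples to senses, the matching by exact word-sequence scan, and all function names are assumptions made for illustration; the paper's rule acquisition and matching details follow in the next subsections.

```python
def disambiguate_two_stage(sentence_words, target, rules, fallback_wsd):
    """Stage 1: if any collocation rule (collocation -> sense) matches a
    word sequence in the sentence, return the rule's sense (high
    precision). Stage 2: otherwise fall back to the similarity-based
    WSD module."""
    words = list(sentence_words)
    for collocation, sense in rules.items():
        n = len(collocation)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == collocation:
                return sense
    return fallback_wsd(words, target)
```

Applying the rule module first reflects the design stated above: a matching rule is trusted outright, and the lower-precision similarity module is consulted only when no rule fires.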
In the following subsections, the HRWE method
is introduced first, since it is also used to construct
the set of collocation WSD rules. Then,
the rule-based WSD system is described, in particular
how the WSD rules are acquired automatically.
Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation