6 SPECIFIC DOMAIN WORD
SENSE DISAMBIGUATION
In this section the related and local contexts are integrated
to disambiguate the words in a test document. For each word w
to be disambiguated in the document, two graphs were built and
evaluated independently, one with the local context and the
other with the related context, by applying graph centrality
algorithms (Section 5). These evaluations yield two vectors,
$V_{rt} = \{x_1, x_2, \cdots, x_n\}$ and
$V_{lc} = \{x_1, x_2, \cdots, x_m\}$. The vectors are integrated
as shown in Eq. 9 to produce a final sorted vector of synsets.
$$V_{final} = \frac{V_{rt}[x] \ast V_{lc}[x]}{V_{rt}[x] + V_{lc}[x]} \qquad (9)$$
where x is an item in the vector, $V_{rt}[x] \ast V_{lc}[x]$ is
defined as $\{V_{rt}[x] \ast V_{lc}[x] \mid x \in V_{rt} \cap V_{lc}\}$,
and $V_{rt}[x] + V_{lc}[x]$ is defined as
$\{V_{rt}[x] + V_{lc}[x] \mid x \in V_{rt}, x \in V_{lc}\}$. The synset
with the highest value is selected as the correct sense for the
ambiguous word w.
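The integration step in Eq. 9 can be illustrated with a minimal Python sketch. It assumes each vector is stored as a mapping from a synset identifier to its centrality score; the function names and the synset identifiers are hypothetical and chosen only for illustration. The text does not say how a synset that appears in only one vector is handled, so the sketch simply leaves it out, which is an assumption.

```python
from typing import Dict, Optional


def combine_scores(v_rt: Dict[str, float], v_lc: Dict[str, float]) -> Dict[str, float]:
    """Combine two per-synset score vectors following the form of Eq. 9."""
    combined: Dict[str, float] = {}
    # The product term is defined only for synsets present in both vectors.
    # Assumption: synsets found in just one vector are left out.
    for synset in v_rt.keys() & v_lc.keys():
        total = v_rt[synset] + v_lc[synset]
        if total > 0:  # assumption: skip synsets whose denominator is zero
            combined[synset] = (v_rt[synset] * v_lc[synset]) / total
    return combined


def best_sense(v_rt: Dict[str, float], v_lc: Dict[str, float]) -> Optional[str]:
    """Return the synset with the highest combined score, or None if there is none."""
    combined = combine_scores(v_rt, v_lc)
    if not combined:
        return None
    return max(combined, key=combined.get)


# Illustrative scores only; these are not values from the experiments.
related = {"bank.n.01": 0.6, "bank.n.02": 0.3, "bank.n.03": 0.1}
local = {"bank.n.01": 0.2, "bank.n.02": 0.6}
print(best_sense(related, local))  # prints bank.n.02
```

In this toy case the combined score rewards the synset that both contexts support: "bank.n.01" is strong in only one vector and ends below "bank.n.02", which has moderate support in both.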
7 EXPERIMENTS AND RESULTS
This section presents the results obtained for WSD on a
specific domain.
7.1 Test Data
For the experiments, the gold standard dataset released for
SemEval 2010 (Agirre et al., 2010) was used. This dataset
contains 1,398 instances of ambiguous words: 366 verbs and
1,032 nouns. For efficiency reasons, in the experiments the
related context is formed by 5 semantic terms, and the local
context is built as described above (Subsection 3.2).
7.2 Analysis
Tables 1, 2, and 3 show the results obtained with the
algorithms used in the described approach. As the tables show,
the Indegree measure obtained the best results in all three
scenarios: using only the semantic similarity, using only the
context information, and combining both techniques. The best
performance was obtained in the third scenario, which motivates
us to improve the disambiguation process along this line in
future work.
The tested measures were selected after a comparison with the
PageRank algorithm (Brin and Page, 1998). For that measure, a
directed graph was tried first, but the results were poor. The
graph was then changed to an undirected representation, which
proved the better option, as the results in Tables 1, 2, and 3
show.
We think that Indegree performs better because it benefits
from a large number of semantic relations, particularly in a
densely connected graph. On the other hand, DH improved the
results because the context is not limited to a fixed window
size, which would determine the context range. Instead, a
parsing technique is used to describe the grammatical structure
of the sentences, and the context of each word encountered in
the corpus is then extracted (see Section 3). Each co-occurring
context word is weighted according to its frequency in the
corpus. We conclude that using the parser is better than using
a neighborhood around the ambiguous word.
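To make the point about densely connected graphs concrete, the following Python sketch computes an Indegree-style score on an undirected weighted graph. It uses the common formulation in which a vertex's score is the sum of the weights of its incident edges; the exact graph construction and weighting belong to Section 5 and are not reproduced here, so the edge list, the vertex names, and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, float]  # (vertex, vertex, similarity weight)


def indegree_scores(edges: Iterable[Edge]) -> Dict[str, float]:
    """Score each vertex by the summed weight of its incident edges."""
    scores: Dict[str, float] = defaultdict(float)
    for u, v, weight in edges:
        # In an undirected graph every edge counts for both endpoints,
        # so a vertex gains score with each relation it takes part in.
        scores[u] += weight
        scores[v] += weight
    return dict(scores)


# Hypothetical synset graph; the weights are illustrative only.
edges = [("s1", "s2", 0.5), ("s1", "s3", 0.2), ("s2", "s3", 0.4)]
ranking = sorted(indegree_scores(edges).items(), key=lambda kv: kv[1], reverse=True)
print(ranking[0][0])  # prints s2
```

Under this formulation a vertex with many weighted relations accumulates a higher score, which is consistent with the observation that the measure benefits from a large number of semantic relations.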
Table 4 shows a comparison with the systems presented in the
SemEval 2010 competition. The implemented system performed
better than other systems, by approximately 10% in precision
and recall. However, the obtained results are slightly lower
than those of the best system.
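For reference, precision and recall in WSD evaluation are usually computed as the share of correct answers among the attempted instances and among all gold instances, respectively. The Python sketch below shows that calculation under simplifying assumptions: one gold sense per instance and a prediction of None for instances the system does not attempt. The official SemEval scorer may differ in details (for example, multiple gold senses per instance), and all names and data here are hypothetical.

```python
from typing import Dict, Optional, Tuple


def precision_recall(gold: Dict[str, str],
                     predicted: Dict[str, Optional[str]]) -> Tuple[float, float]:
    """Return (precision, recall) for single-sense WSD predictions."""
    # Only instances that exist in the gold standard and received an answer
    # count as attempted.
    attempted = {i: s for i, s in predicted.items() if s is not None and i in gold}
    correct = sum(1 for i, s in attempted.items() if gold[i] == s)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: four gold instances, three attempted, two of them correct.
gold = {"d1.w1": "a", "d1.w2": "b", "d1.w3": "c", "d1.w4": "d"}
predicted = {"d1.w1": "a", "d1.w2": "x", "d1.w3": "c", "d1.w4": None}
p, r = precision_recall(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f}")
```

When a system answers every instance, the two figures coincide, which is why results are often reported as a single pair of identical values.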
8 CONCLUSIONS AND FURTHER
WORK
In this research, an approach for WSD on a specific domain was
presented. In this approach we have proposed a new method that
uses the local and related contexts to retrieve second-order
vectors from WordNet and disambiguates by combining both
sources of information. Compared with the SemEval 2010 systems,
the experiments showed promising results in precision and
recall. As further work, we think that better results could be
obtained by using techniques to extract key terms from an
additional corpus to increase the size of the original corpus,
as well as other semantic similarity measures to extract
related words for an ambiguous word.
REFERENCES
Agirre, E., Lopez de Lacalle, O., Fellbaum, C., Hsieh, S.,
Tesconi, M., Monachini, M., Vossen, P., and Segers,
R. (2010). SemEval-2010 task 17: All-words word
sense disambiguation on a specific domain. In
Proceedings of the 5th International Workshop on
Semantic Evaluations (SemEval-2010), Association for
Computational Linguistics.
Agirre, E. and Soroa, A. (2009). Personalizing PageRank
for word sense disambiguation. In Proc. of EACL,
pages 33–41.
Brin, S. and Page, L. (1998). The anatomy of a large-scale
hypertextual web search engine. In Computer
Networks 30, 107–117.