terpreted in a specialised piece of reality), which is able to frame the translation of short sentences within their correct context, hence providing the right sense for each word to be translated. Once all words in a phrase have been sense-tagged with ontology concepts, the domain of discourse can be extracted from the phrase in a straightforward way. As a consequence, a classification by topic of all the sources annotated with those short texts is obtained at no extra cost.
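The step from sense tags to a topic label can be illustrated with a minimal sketch. It assumes that every ontology concept is known to belong to exactly one domain ontology and that the domain of discourse is simply the domain contributing most concepts to the phrase (a majority vote). The concept names, the `CONCEPT_DOMAIN` table and the `domain_of_discourse` function are hypothetical and are not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical mapping from ontology concepts to the domain ontology that
# defines them; the concept names are illustrative only.
CONCEPT_DOMAIN: Dict[str, str] = {
    "Finance:Bank": "Finance",
    "Finance:Loan": "Finance",
    "Geography:RiverBank": "Geography",
    "Sport:Match": "Sport",
}


def domain_of_discourse(sense_tags: Dict[str, str]) -> Optional[str]:
    """Return the domain contributing most concepts to a sense-tagged phrase.

    `sense_tags` maps each word of the phrase to the concept it was tagged
    with; concepts missing from CONCEPT_DOMAIN are ignored. Returns None when
    no tag can be mapped to a domain.
    """
    counts = Counter(
        CONCEPT_DOMAIN[concept]
        for concept in sense_tags.values()
        if concept in CONCEPT_DOMAIN
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# The topic of the annotation doubles as the class of the annotated source.
annotation = {"bank": "Finance:Bank", "loan": "Finance:Loan", "match": "Sport:Match"}
print(domain_of_discourse(annotation))  # Finance
```

Ties are resolved by first occurrence, since `Counter.most_common` orders elements with equal counts in the order first encountered; a real system would likely need a more principled tie-breaking rule.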
Although our study is still in its infancy, we believe that what follows provides a worthwhile outline of our approach. The paper is organised as follows: Section 2 outlines the related work on Domain-Driven Word Sense Disambiguation and the main differences from our work, Section 3 presents the design and the workflow of our system, and Section 4 discusses the envisioned evolutions of our approach. Section 5 concludes.
2 RELATED WORK
Domain-Driven or Domain-Oriented Word Sense Disambiguation (Navigli, 2009) focuses on assigning the most appropriate sense label to a word used in domain-specific texts. According to Navigli, the peculiarity of this approach with respect to classical Word Sense Disambiguation lies in the paradigm “shift from linguistic understanding to a domain-oriented type-based vision of sense ambiguity”. This is especially true for cross-lingual Word Sense Disambiguation, where the domain information of a phrase may prove crucial for obtaining an accurate translation.
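To illustrate why domain information matters in the cross-lingual setting, the following toy sketch picks the Italian rendering of an ambiguous English word from a lexicon keyed by word and domain. The `LEXICON` entries and the `translate_word` helper are hypothetical and serve only as an example; they are not part of the cited work or of our system.

```python
from typing import Dict, Tuple

# Hypothetical bilingual lexicon keyed by (English word, domain); entries are
# illustrative and not taken from any specific resource.
LEXICON: Dict[Tuple[str, str], str] = {
    ("bank", "Finance"): "banca",
    ("bank", "Geography"): "riva",
    ("match", "Sport"): "partita",
}


def translate_word(word: str, domain: str) -> str:
    """Return the Italian rendering of `word` in `domain`, or the word itself if unknown."""
    return LEXICON.get((word.lower(), domain), word)


print(translate_word("bank", "Finance"))    # banca
print(translate_word("bank", "Geography"))  # riva
```

The same English surface form yields different Italian words, so an error in identifying the domain directly becomes a translation error.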
In recent years, a major source of domain information for word disambiguation has been the WordNet lexical database, as witnessed by several research studies (Gliozzo et al., 2004), (Cucchiarelli and Velardi, 1998), (Buitelaar and Sacaleanu, 2001). In these scenarios WordNet is used as a domain semantic model, especially in its version where synsets are tagged with domain labels³. Based on such models, scoring formulas are computed to determine the predominant sense of a word in a text. However, WordNet is not a proper domain ontology. Moreover, most of these techniques rely on a training corpus (Koeling and McCarthy, 2008) (e.g. SemCor⁴ and the like) as a knowledge source, instead of a domain ontology.
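A simplified sketch of domain-label scoring in the spirit of these approaches: every word of the text spreads one vote evenly over the domain labels of its candidate senses, and the predominant sense of a target word is the one whose domain collects the most votes. The sense inventory, labels and function names are hypothetical, and the formula is an illustrative assumption rather than the exact scoring used in any of the cited studies.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical sense inventory: each word maps its candidate senses to a
# WordNet Domains-style label (illustrative, not real synset identifiers).
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {"bank#1": "Economy", "bank#2": "Geography"},
    "interest": {"interest#1": "Economy", "interest#2": "Psychology"},
    "loan": {"loan#1": "Economy"},
}


def domain_scores(words: List[str]) -> Dict[str, float]:
    """Each known word spreads one vote evenly over the domains of its senses."""
    scores: Dict[str, float] = defaultdict(float)
    for word in words:
        senses = SENSES.get(word, {})
        for domain in senses.values():
            scores[domain] += 1.0 / len(senses)
    return scores


def predominant_sense(word: str, words: List[str]) -> str:
    """Choose the sense of `word` whose domain scores highest over the whole text."""
    scores = domain_scores(words)
    senses = SENSES[word]
    return max(senses, key=lambda sense: scores[senses[sense]])


text = ["bank", "interest", "loan"]
print(predominant_sense("bank", text))  # bank#1
```

Here the Economy label collects 2.0 votes against 0.5 for Geography, so the financial sense of "bank" wins.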
Notably, a recent study (Agirre et al., 2009) provides evidence in favour of knowledge-based methods (among which we include domain ontologies) for boosting the disambiguation task in domain-specific environments. The authors claim that, when tagging domain-specific corpora, knowledge-based Word Sense Disambiguation performs better than generic supervised Word Sense Disambiguation systems trained on general-domain corpora. The test was conducted on 41 domain-related and highly polysemous words in two domains, Sports and Finance. The algorithm used, called Personalised PageRank, was applied to the WordNet graph in order to rank word senses.

³ http://wndomains.fbk.eu/.
⁴ http://multisemcor.fbk.eu/semcor.php.
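Personalised PageRank can be sketched as a plain power iteration over a small graph: the random surfer teleports only to the nodes of the context words (the seeds), so senses that are well connected to the context accumulate more rank. The toy graph, node names and parameters (damping factor 0.85, 50 iterations) are assumptions for illustration; the graph is not WordNet and the code is not the implementation used by Agirre et al.

```python
from typing import Dict, Iterable, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Iterable[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration Personalised PageRank with teleportation restricted to `seeds`."""
    nodes = list(graph)
    seed_set = {s for s in seeds if s in graph}
    if not seed_set:
        raise ValueError("at least one seed must be a node of the graph")
    # Teleport distribution: uniform over the seed nodes, zero elsewhere.
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: redistribute its mass along the teleport vector.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


# Hypothetical toy graph (undirected, stored as symmetric adjacency lists):
# two senses of "bank" and a few related concepts.
GRAPH = {
    "bank#finance": ["money", "loan"],
    "bank#river": ["river", "slope"],
    "money": ["bank#finance", "loan"],
    "loan": ["bank#finance", "money"],
    "river": ["bank#river"],
    "slope": ["bank#river"],
}

# Context words of a sentence such as "the bank granted a loan of money".
ranks = personalized_pagerank(GRAPH, seeds={"money", "loan"})
best = max(["bank#finance", "bank#river"], key=lambda s: ranks[s])
print(best)  # bank#finance
```

Because the teleport vector concentrates all restarts on the context nodes, the total rank stays equal to 1 while the sense disconnected from the context receives none of it.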
These studies addressed Word Sense Disambiguation as a monolingual task. Moreover, very few attempts have been made to apply Domain-Driven Word Sense Disambiguation to real-world applications.
The Omega ontology (Philpot et al., 2010) was conceived as a synthesis of WordNet and Mikrokosmos (O’Hara et al., 1998), (Mahesh, 1996), a conceptual resource specifically designed to support translation. Besides its core concept base, Omega was designed to connect with a range of auxiliary knowledge sources, including domain ontologies, which are incorporated into its basic conceptual structure and representation.
In this paper we try to extend these lines of research by exploiting ontologies conceived by domain experts as our knowledge source, and short text annotations of domain-specific digital sources as the target of our disambiguation, translation and classification tasks.
3 SYSTEM ARCHITECTURE
In this Section we briefly outline the main steps of our approach and give more details on the disambiguation and classification algorithm.
3.1 System Workflow
Figure 1 shows the main components, outputs and supporting data sources of our system.
The purpose of our approach is to translate an English sentence into an Italian one and to classify it by topic, by means of a domain ontology-driven word sense disambiguation algorithm. The classification by topic of the target sentence is obtained from the ontology that, once the domain-driven disambiguation procedure has been executed, is recognised as representing the correct domain of both the source and the target sentences. The main steps of the algorithm are described below. For the sake of clarity, the sentence in English
Two Sides of a Coin - Translate while Classify Multilanguage Annotations with Domain Ontology-driven Word Sense Disambiguation