UNSUPERVISED ALGORITHM FOR THE CONCEPT
DISAMBIGUATION IN ONTOLOGIES
Semantic Rules and Voting System to Determine Suitable Senses
Isaac Lera, Carlos Juiz and Ramon Puigjaner
Departament Matemàtiques i Informàtica, Universitat de les Illes Balears, crt. Valldemossa 7.5km, Palma, Spain
Keywords:
Word sense disambiguation, Ontology, Semantic web, Ontology matching.
Abstract:
We present a new unsupervised algorithm that uses external resources and requires no training to
determine the most suitable sense of an ontology concept. We look for lexical coincidences among the terms
of both resources: the ontology and WordNet. Through a voting system, we weight each sense according to
measurable parameters and logic rules, as a function of the semantic role of each element of the correspondence.
1 INTRODUCTION
Word Sense Disambiguation (WSD) is the task that aims to “determine which of the senses of an ambiguous word is invoked in a particular use of the word”. These techniques are used across several applications, such as machine translation, data search, knowledge discovery and data integration, where one of the end goals is the effective processing of overwhelming amounts of data.
We have decided to discover the sense of ontology concepts represented in the Web Ontology Language (OWL), a W3C standard. We focus on this representation for the following reasons. First, it is implicit in one goal of the Semantic Web (SW): to relate meaning to data in an unequivocal way. Second, OWL is a formal language with a logic structure in which ontology elements (concepts, axioms, instances, properties and individuals) are highly interrelated, and it provides new mechanisms to relate elements. Finally, numerous projects, applications and resources based on the SW arise every day.
Ontology Matching (OM) is a vital subtask of Ontology Engineering, and numerous tasks depend on it: editing, query processing, ontology and data repository management, and visualization. OM is a set of techniques to establish relationships among ontology elements that share some common meaning. Therefore, when we map two elements, it is necessary to find out their meaning by applying WSD techniques.
2 RELATED WORK
We mention some works that help to introduce ours. The approach of Liu et al. (Liu et al., 2005) overlaps with our approach in some respects. They exploit WordNet links such as synonyms, hyponyms, hyperonyms and synonyms’ definitions to determine the sense of the main query words. Their system checks 4x4 coincidences across query words, whereas our algorithm considers 8x8 matches together with extra semantic rules and the semantic mapping with ontologies. When they find coincidences in the appearance of words with the same name, they assign the corresponding senses to these words. We, instead, have devised a voting system in which the elements and the semantics of the correspondence determine the weight of each sense.
Banek et al. (Banek et al., 2008) present a “CSD” algorithm that increases the efficiency of the Ontology Matching process. They calculate a probability function by comparing the taxonomy of the ontology with a portion of the WordNet taxonomy, establishing a correspondence between the name of an ontology class and a WordNet noun. The neighbourhood of a class within each portion of the taxonomy is based on the following links: direct subclasses and superclasses, ranges of its own properties, and ranges in which the property concept is part of the range. In our case, we use most of the ontology relationships regarding the concept (e.g. equivalent classes, transitive properties and individuals) and all POS categories of WordNet, among others.
Khelif et al. (Khelif et al., 2008) developed an algorithm for CSD based on a distance that they defined on ontology data. This distance is computed by weighting the edges of the path between both classes. This path is a combination of hierarchical links and the domains and ranges of object properties. They tested their approach on the extraction of annotations from the Unified Medical Language System (UMLS), obtaining a high average precision. In our case, we can apply our algorithm in any open discourse with generic ontologies. Castano et al. (Castano et al., 2003) presented an ontology matching algorithm in which the semantic affinity between two concepts is evaluated as a function of their relationships in the thesaurus and in their contexts.

Lera I., Juiz C. and Puigjaner R.
UNSUPERVISED ALGORITHM FOR THE CONCEPT DISAMBIGUATION IN ONTOLOGIES - Semantic Rules and Voting System to Determine Suitable Senses.
DOI: 10.5220/0003094403880391
In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2010), pages 388-391
ISBN: 978-989-8425-29-4
Copyright © 2010 SCITEPRESS (Science and Technology Publications, Lda.)
Most WSD approaches are evaluated on a standard benchmark data set called Senseval (Kilgarriff, 1998). The second version of Senseval contains a set of corpora, lemmas and instances with which authors can test the robustness and precision of their algorithms. In our case, we cannot apply this benchmark, since our data source has to be represented in OWL.
3 DISAMBIGUATION CONCEPT
PROCESS
The idea behind this algorithm is based on the capacity to find correspondences between elements of both structures and then to weight each correspondence according to a voting system that combines the following information into an assessment: the lexical function of the elements, the semantic function of the correspondence, and the ambiguity of both elements. Thus, we have defined the categories of elements that we can find in WordNet and in the ontology, and also the voting system applied to each correspondence. Both tasks are complemented by establishing a context, which reduces the search space and improves accuracy and computational cost-effectiveness.
3.1 Elements
In an OWL ontology, we disambiguate only concepts, but the rest of the elements (properties, axioms and instances) have an influence on the voting system. An OWL concept can either be simple, like dog and hotdog, or be composed of two terms, like WarmTemperature and PizzaTopping. For that reason, a concept has one or more possible equivalences in WordNet (if and only if it is spelled correctly). A tokenization process based on capitalisation (e.g. PizzaTopping and hotAir, but not hotDog) and on special characters (e.g. hot dog, hot#Temperature) splits the words of a concept. If the concept has two words, they are managed as two different words in the algorithm. All user-defined names of ontology elements go through a tokenization process (concepts, individuals and properties) and a stemming process (properties); both tasks increase the flexibility of our algorithm.
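As an illustration, the tokenization step could be sketched as follows in Python. This is only a minimal sketch under stated assumptions: the function name `tokenize_concept` and the `lexicon` parameter are hypothetical, and the reason why hotDog is not split is not detailed here, so the sketch assumes that a name is kept whole when its joined lower-case form is itself a known word.

```python
import re

def tokenize_concept(name, lexicon=frozenset()):
    """Split an ontology concept name into lower-case simple words.

    `lexicon` is a hypothetical set of known words (e.g. WordNet lemmas).
    Assumption: if the whole name, joined and lower-cased, is a known word,
    it is kept as a single token (e.g. hotDog -> hotdog).
    """
    joined = re.sub(r"[^A-Za-z0-9]", "", name).lower()
    if joined in lexicon:
        return [joined]
    tokens = []
    # First split on special characters (spaces, '#', '_', ...).
    for part in re.split(r"[^A-Za-z0-9]+", name):
        # Then split on capitalisation: acronyms, capitalised or
        # lower-case words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [token.lower() for token in tokens]

print(tokenize_concept("PizzaTopping"))
print(tokenize_concept("hot#Temperature"))
print(tokenize_concept("hotDog", lexicon={"hotdog"}))
```

Each resulting token would then be looked up in WordNet as an independent simple word.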
In any case, a simple or composed concept is transformed into a word with meaning in WordNet. Each word in WordNet has a set of senses. Each sense has a set of synonyms/antonyms, hyperonyms/hyponyms and meronyms/holonyms, and one definition (gloss), which adds more terms. The terms of the glosses are filtered and stemmed beforehand, to avoid unnecessary analysis (e.g. of articles and prepositions) and the loss of coincidences between nouns (e.g. due to plurals or verb forms), respectively.
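The filtering and stemming of gloss terms could look roughly like the sketch below. It rests on two loud assumptions: the stop-word list is a tiny illustrative sample, and `naive_stem` is a crude suffix stripper used only as a stand-in, not the stemmer used in this work.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real one would be larger.
STOP_WORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "for", "to", "or", "and", "with", "by"}
)

def naive_stem(word):
    """Strip a few common English suffixes (crude stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def gloss_terms(gloss):
    """Return the filtered and stemmed terms of a gloss."""
    words = re.findall(r"[a-z]+", gloss.lower())
    return [naive_stem(word) for word in words if word not in STOP_WORDS]

print(gloss_terms("the dogs of a hunter"))
```

After this step, plural and singular forms such as dogs and dog map to the same term, so a lexical coincidence between them is not lost.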
3.2 Nomenclature
To ease reading, a concept from the ontology is denoted by $C$. Each concept has a set of simple words in WordNet: $C : sW_i$. Each simple word has a set of senses: $sW_i : S_i$. Hereafter, the $i$-indices are independent among elements. Each sense has a group of gloss terms: $sW_i : S_i : t_i$, which is equivalent to $t_i : S_i : sW_i$. Also, each sense has a set of semantic WordNet constructors: $S_i : Hpon_i, Hper_i, Syn_i, Ant_i, Mer_i, Hol_i$. Thus, we could say that “:” means “has”, and the $i$-index means “element of the set”. Each $Hpon_i$, $Hper_i$, etc. can itself be considered as an $sW_k$, which makes the representation recursive.
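One possible way to hold this nomenclature in memory is sketched below. The class and field names (`Concept`, `SimpleWord`, `Sense`, `relations`) are hypothetical, and the example data are toy values, not actual WordNet entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Sense:
    """One sense S_i of a simple word."""
    gloss: List[str] = field(default_factory=list)  # gloss terms t_i
    # Semantic constructors, keyed by "Hpon", "Hper", "Syn", "Ant", "Mer", "Hol".
    # Each listed word can itself be looked up as a simple word (recursion).
    relations: Dict[str, List[str]] = field(default_factory=dict)

@dataclass
class SimpleWord:
    """A simple word sW_i with its set of senses."""
    lemma: str
    senses: List[Sense] = field(default_factory=list)

    @property
    def ambiguous(self):
        """A simple word is ambiguous when it has more than one sense."""
        return len(self.senses) > 1

@dataclass
class Concept:
    """An ontology concept C with its simple words."""
    name: str
    simple_words: List[SimpleWord] = field(default_factory=list)

# Toy example: the reading "C : sW_i : S_i : Hper_i" becomes a chain of attributes.
dog = SimpleWord("dog", [Sense(["domestic", "canine"], {"Hper": ["canine"]}),
                         Sense(["unpleasant", "person"])])
concept = Concept("Dog", [dog])
print(concept.simple_words[0].senses[0].relations["Hper"])
```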
3.3 Correspondences
An element can generate a correspondence when its lexical form coincides with that of another noun. We save this correspondence: both elements and their semantic functions. The function of each term is given by its semantic function in WordNet (synonym, hyperonym, etc.) and in the ontology (superclass, subclass, equivalent and disjoint classes, and properties).
There is a correspondence only when we can obtain at least one sense from both concepts. For example, a match between simple words, $sW_i = sW_k$, is not useful for our purpose. In contrast, in a term relation $sW_i : S_i : t_i = t_j : S_j : sW_j$ both senses are present. The case $sW_i : S_i : Syn_i = sW_j$ is also valid. Therefore, there are 8x8 possible combinations for each pair of simple words.
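The search for correspondences between two simple words could be sketched as follows. The dictionary layout and the function names are hypothetical, the data are toy values, and reading the eight element kinds as the simple word itself, its gloss terms and the six WordNet constructors is an assumption of this sketch.

```python
# Sense-level element kinds: gloss terms plus the six WordNet constructors.
ROLES = ("t", "Hpon", "Hper", "Syn", "Ant", "Mer", "Hol")

def elements(word):
    """Yield (role, sense_index, term) for every element of a simple word."""
    yield ("sW", None, word["lemma"])  # the simple word itself carries no sense
    for index, sense in enumerate(word["senses"]):
        for role in ROLES:
            for term in sense.get(role, []):
                yield (role, index, term)

def correspondences(word_a, word_b):
    """Return lexical coincidences that identify at least one sense."""
    found = []
    for role_a, sense_a, term_a in elements(word_a):
        for role_b, sense_b, term_b in elements(word_b):
            if term_a != term_b:
                continue
            if sense_a is None and sense_b is None:
                continue  # sW = sW: no sense can be obtained
            found.append((role_a, sense_a, role_b, sense_b, term_a))
    return found

# Toy data, not actual WordNet entries.
pizza = {"lemma": "pizza",
         "senses": [{"t": ["italian", "dish", "dough"], "Hper": ["dish"]}]}
topping = {"lemma": "topping",
           "senses": [{"t": ["garnish", "food"], "Syn": ["garnish"]},
                      {"t": ["flavorful", "addition", "dish"]}]}
for match in correspondences(pizza, topping):
    print(match)
```

Each returned tuple keeps both elements and their semantic functions, which is the information the voting system needs afterwards.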
3.4 Voting System
Each correspondence is a vote, and each vote has a different weight according to the semantics of the elements involved and the ambiguity of their senses. For example, suppose that a concept $C_i$ (with simple word $sW_i$) has a direct or indirect superclass $C_j$ in the ontology. We can search for this superclass concept (through its equivalence in WordNet, $sW_j$) along a route through all the senses and their hyperonyms across the whole WordNet tree: $sW_i : \{S_i : Hper_i\}_{\text{recursive search}} \overset{?}{=} (Syn_j \mid sW_j) : C_j$. The ambiguity of simple words plays an important role in this voting system: when a concept has a relationship (e.g. equivalentClasses, disjointClasses) with another concept that has only one sense, this sense either receives more weight or modifies all the senses of the first concept.
Some types of cases simplify the determination of senses. Owing to space limits, we describe them only generically:
Hierarchical Cases. When a concept has a parent or a child, we can search for the possible parents/children of this concept in the WordNet tree. When we find such a coincidence, we have identified the pair of senses that satisfies this condition.
Equivalent & Disjoint Cases. When a concept has an equivalent class and we can determine that one of the two has only one sense, we assign this sense to the other word.
Restrictions, Object & DataType Properties Cases. We distinguish two kinds of data: the property signature (property name, domain and range) and the semantic function. The property signature works like a frequency vote. In object properties, the semantic function permits the definition of specific rules when there are other correspondences among these concepts. Only transitive properties have been considered, since they can relate other concepts with some information. Other restrictions or constructors are unionOf and complementOf; their votes work like the previous case of the transitive property.
Individuals Cases. Sometimes the ontology designer introduces words or entities as individuals of a generic class. Our approach handles this type of case, in which the correspondences receive more weight, since individuals are specific elements of something particular.
Normal Cases. Finally, we treat the rest of the correspondences as frequencies, with different weights depending on the ambiguity of each simple word. For example, a term correspondence $sW_i : S_i : t_i = t_j : S_j : sW_j$ increases the weights of both simple words by a constant value. If either simple word is not ambiguous, this weight is larger than the initial value, because we consider that relationships with non-ambiguous words define strong semantic links. Other cases of interest are the synonym correspondence $sW_i : S_i : Syn_i = Syn_j : S_j : sW_j$, in which both words share the same sense and we assume that both concepts are equivalent; and $sW_i : S_i : Ant_i = Syn_j : S_j : sW_j$, in which both concepts are disjoint and the correspondence is not processed.
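The normal cases above can be illustrated with a small vote tally. This is a sketch under assumptions, not the weighting actually used: the numeric values `BASE_WEIGHT` and `UNAMBIGUOUS_BONUS`, the tuple layout and the function names are all hypothetical, since no concrete weights are given here.

```python
from collections import defaultdict

BASE_WEIGHT = 1.0        # assumption: constant value added per correspondence
UNAMBIGUOUS_BONUS = 2.0  # assumption: multiplier when a word has a single sense

def tally_votes(correspondences, sense_counts):
    """Accumulate weighted votes per (word, sense).

    correspondences: iterable of
        (word_a, sense_a, role_a, word_b, sense_b, role_b)
    sense_counts: dict mapping each word to its number of senses
    """
    votes = defaultdict(float)
    for word_a, sense_a, role_a, word_b, sense_b, role_b in correspondences:
        if "Ant" in (role_a, role_b):
            continue  # antonym correspondences are not processed
        weight = BASE_WEIGHT
        if sense_counts[word_a] == 1 or sense_counts[word_b] == 1:
            weight *= UNAMBIGUOUS_BONUS  # link with a non-ambiguous word
        for word, sense in ((word_a, sense_a), (word_b, sense_b)):
            if sense is not None:
                votes[(word, sense)] += weight
    return votes

def best_senses(votes):
    """Pick, for each word, the sense with the highest accumulated weight."""
    best = {}
    for (word, sense), weight in votes.items():
        if word not in best or weight > best[word][1]:
            best[word] = (sense, weight)
    return {word: sense for word, (sense, _) in best.items()}

# Toy correspondences between the simple words of PizzaTopping.
found = [("pizza", 0, "t", "topping", 1, "t"),
         ("pizza", 0, "Hper", "topping", 1, "t"),
         ("pizza", 0, "t", "topping", 0, "Ant")]
print(best_senses(tally_votes(found, {"pizza": 1, "topping": 2})))
```

The special rules for hierarchical, equivalent, property and individual cases would adjust these weights further; they are left out of the sketch.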
3.5 Search Space
Some ontology elements are more closely related than others. These relationships make it possible to discover new correspondences and senses quickly. Thus, we have opted to split the process into two parts: the first part discovers the senses of the candidate concepts (the most related concepts); in the second part, the remaining senses are discovered by the algorithm, which already knows the senses of the previous elements. This selection of concepts decreases the search space and reduces the consumption of computing resources. Our selection of predominant concepts is based on our previous work (Lera et al., 2008), whose idea we have modified by applying clustering algorithms instead of simple formulas. The clustering parameters are the relative depth (the highest depth and its depth according to its children), the number of direct subclasses, the number of relationships whose range is the concept itself, and the number of individuals.
4 DEVELOPMENT
To limit the consumption of resources, we have developed a procedure in which all concepts are analysed according to their importance within the context. Furthermore, during the analysis we save data for possible future concepts, avoiding unnecessary computations. We use a hash index that clusters coincident elements. When a new element appears, we add it to this structure and, where appropriate, we either create the correspondence or modify the senses according to the rules, avoiding unnecessary searches in other branches.
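A hash index of this kind could be sketched as below. The class name `CoincidenceIndex` and the element tuples are hypothetical; the sketch only shows how adding a new element immediately reveals the elements that share its term, without any further search.

```python
from collections import defaultdict

class CoincidenceIndex:
    """Hash index that groups elements sharing the same (stemmed) term."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, term, element):
        """Register `element` under `term`.

        Returns the elements already stored under the same term, i.e. the
        candidates for a new correspondence.
        """
        previous = list(self._index[term])
        self._index[term].append(element)
        return previous

index = CoincidenceIndex()
index.add("dish", ("pizza", 0, "t"))            # nothing stored yet
print(index.add("dish", ("topping", 1, "t")))   # coincides with the pizza element
```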
When we look for coincidences in the multiple branches of the hierarchies, we apply a breadth-first search (BFS) algorithm with a depth threshold. For stemming, we have used the Lancaster algorithm¹.
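A breadth-first search with a depth threshold over hyperonym links could be sketched as follows. The function name, the adjacency-dictionary representation and the toy hierarchy are assumptions made for illustration; they are not WordNet data.

```python
from collections import deque

def find_within_depth(hypernyms, start, target, max_depth):
    """Breadth-first search from `start` towards `target`.

    hypernyms: dict mapping a word to the list of its direct hyperonyms
    Returns the depth at which `target` is reached, or None if it is not
    found within `max_depth` levels.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        if depth == max_depth:
            continue  # threshold reached: do not expand this branch further
        for parent in hypernyms.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None

# Toy hierarchy (hypothetical).
toy = {"dog": ["canine", "domestic_animal"],
       "canine": ["carnivore"],
       "carnivore": ["mammal"],
       "domestic_animal": ["animal"],
       "mammal": ["animal"]}
print(find_within_depth(toy, "dog", "carnivore", 3))
```

The threshold bounds the cost of each search: branches deeper than `max_depth` are never expanded.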
5 EVALUATION
To evaluate our algorithm, reference ontologies are needed, that is to say, ontologies handmade specifically for disambiguation benchmarking. In their absence, we use datasets from the Ontology Alignment Evaluation Initiative (OAEI), such as anatomy, conference and directory. We have used the conference dataset², a collection of ontologies describing the domain of organising conferences. We have analysed all of them and measured the response time of each part of our algorithm. Accuracy was checked manually, and the results are practically identical in both situations.
In contrast, the mean response time changes depending on whether we compute the candidate concepts for the first time (Figure 1, row A) or whether this set of candidates is already available (row B). Memory consumption is also lower in the latter case than in the others. Finally, row C is a “normal” execution that deals with all concepts in the same way.

Figure 1: Algorithm’s response time.
[Bar chart: response time in ms for rows A, B and C; legend: Load Model, Selection of Candidates, Candidate Concepts, Rest Concepts.]

¹ www.comp.lancs.ac.uk/computing/research/stemming/
6 CONCLUSIONS
We have presented an algorithm that is able to determine the sense of ontology concepts. To do so, all OWL constructors are considered and mapped onto the WordNet structure, to obtain a formal way of comparing elements. All correspondences are handled according to the semantic function of the elements involved. Some rules are triggered in special cases, for example when we have equivalent concepts or transitive properties. In comparison with other approaches, we have incorporated more resources with semantic knowledge to increase the efficiency of the process in terms of success and computational cost. We are not able to compare our results with other works, since our input data are ontologies instead of documents or other types of unstructured data. We have noticed that when the WordNet and ontology structures and their names are similar, the number of correspondences and triggered rules decreases the execution time of the algorithm considerably. The results encourage us to exploit some specific OWL constructors, such as complex equivalence axioms of Description Logic, and to improve the individual rules of each case.

² http://nb.vse.cz/~svabo/oaei2010/
ACKNOWLEDGEMENTS
This work is partially supported by the project TIN2007-60440 of the Spanish Ministry of Science and Innovation.
REFERENCES
Banek, M., Vrdoljak, B., and Tjoa, A. M. (2008). Word Sense Disambiguation as the Primary Step of Ontology Integration. In Proceedings of the 19th international conference on Database and Expert Systems Applications, pages 65–72. Springer-Verlag.
Castano, S., Ferrara, A., and Montanelli, S. (2003). H-match: an algorithm for dynamically matching ontologies in peer-based systems. In Proc. of the 1st Int. Workshop on Semantic Web and Databases (SWDB) at VLDB 2003, pages 231–250.
Khelif, K., Gandon, F., Corby, O., and Dieng-Kuntz, R. (2008). Using the Intension of Classes and Properties Definition in Ontologies for Word Sense Disambiguation. In Proceedings of the 16th international conference on Knowledge Engineering, pages 188–197, Berlin, Heidelberg. Springer-Verlag.
Kilgarriff, A. (1998). SENSEVAL: An Exercise in Evaluating Word Sense Disambiguation Programs. In Proceedings of the First International Conference on Language Resources and Evaluation, pages 581–588.
Lera, I., Juiz, C., and Puigjaner, R. (2008). Quick Ontology Mapping Algorithm for distributed environments. In Semantic Web and Web Services, volume 1, pages 107–113. CSREA Press.
Liu, S., Yu, C., and Meng, W. (2005). Word Sense Disambiguation in Queries. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 525–532. ACM.