UNSUPERVISED ALGORITHM FOR THE CONCEPT DISAMBIGUATION IN ONTOLOGIES

Semantic Rules and Voting System to Determine Suitable Senses

Isaac Lera, Carlos Juiz and Ramon Puigjaner

Departament Matemàtiques i Informàtica, Universitat de les Illes Balears, crt. Valldemossa 7.5km, Palma, Spain

Keywords:

Word sense disambiguation, Ontology, Semantic web, Ontology matching.

Abstract:

We present a new unsupervised algorithm that uses external resources and requires no training to determine the most suitable sense of an ontology concept. We look for lexical coincidences among the terms of both resources, the ontology and WordNet. Through a voting system, we weight each sense according to measurable parameters and logic rules, as a function of the semantic role of each element of the correspondence.

1 INTRODUCTION

Word Sense Disambiguation (WSD) is the task to “determine which of the senses of an ambiguous word is invoked in a particular use of the word”. WSD techniques are used in several applications, such as machine translation, data search, knowledge discovery and data integration, where one of the end goals is the effective processing of overwhelming amounts of data.

We have decided to discover the sense of ontology concepts represented in the Web Ontology Language (OWL), a W3C standard. We focus on this representation for two reasons. The first is implicit in one goal of the Semantic Web (SW): to relate meaning to data in an unequivocal way. OWL is a formal language with a logic structure that closely relates ontology elements (concepts, axioms, instances, properties, and individuals) and provides new mechanisms to relate them. The second reason is the large number of projects, applications and resources based on the SW that arise every day. Ontology Matching (OM) is a vital subtask of Ontology Engineering, and numerous tasks depend on it: editing, query processing, ontology and data repository management, and visualization. OM is a set of techniques to establish relationships among ontology elements that share some common meaning. Therefore, when we map two elements, it is necessary to find out their meaning by applying WSD techniques.

2 RELATED WORK

We mention some works that help to introduce ours. The approach of Liu et al. (Liu et al., 2005) has some overlap with our approach. They exploit WordNet links such as synonyms, hyponyms, hyperonyms and the definitions of synonyms to determine the sense of the main query words. Their system checks 4x4 coincidences across query words, whereas our algorithm considers 8x8 matches with extra semantic rules and semantic mapping with ontologies. When they find coincidences in the occurrence of words with the same name, they assign the corresponding senses to these words. We have devised a voting system where the elements and the semantics of the correspondence determine the weight of each sense.

Banek et al. (Banek et al., 2008) present a “CSD” algorithm that increases the efficiency of the Ontology Matching process. They calculate a probability function by comparing the taxonomy of the ontology with a portion of the WordNet taxonomy, establishing a correspondence between the name of an ontology class and a WordNet noun. The neighbourhood of a class within each portion of the taxonomy is based on the following links: direct subclasses and superclasses, ranges of its own properties, and ranges where the property concept is part of the range. In our case, we use most of the ontology relationships regarding the concept, e.g. equivalent classes, transitive properties, individuals, and all POS categories of WordNet, among others.

Khelif et al. (Khelif et al., 2008) developed an algorithm for CSD based on a distance that they defined on ontology data. This distance is computed by weighting the edges of the path between both classes. The path is a combination of hierarchical links and the domains and ranges of object properties. They tested their approach on the extraction of annotations from the Unified Medical Language System (UMLS), obtaining a high average precision. In our case, we can apply our algorithm to any open discourse with generic ontologies. Castano et al. (Castano et al., 2003) presented an ontology matching algorithm where the semantic affinity between two concepts is evaluated as a function of their relationships in the thesaurus and in their contexts.

Lera I., Juiz C. and Puigjaner R. UNSUPERVISED ALGORITHM FOR THE CONCEPT DISAMBIGUATION IN ONTOLOGIES - Semantic Rules and Voting System to Determine Suitable Senses. DOI: 10.5220/0003094403880391. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2010), pages 388-391. ISBN: 978-989-8425-29-4. © 2010 SCITEPRESS (Science and Technology Publications, Lda.)

Most WSD approaches are evaluated on a standard benchmark data set called Senseval (Kilgarriff, 1998). The second version of Senseval contains a set of corpora, lemmas and instances with which authors can test the robustness and precision of their algorithms. In our case, we cannot apply this benchmark, since our data source has to be represented in OWL.

3 DISAMBIGUATION CONCEPT PROCESS

The idea behind this algorithm is to find correspondences between elements of both structures and then to weight each correspondence according to a voting system that combines the following information into an assessment: the lexical function of the elements, the semantic function of the correspondence, and the ambiguity of both elements. Thus, we have defined a categorization of the elements that we can find in WordNet and in the ontology, as well as the voting system applied to each correspondence. Both tasks are complemented by establishing a context that reduces the search space and improves accuracy and computational cost.

3.1 Elements

In an OWL ontology we disambiguate only concepts, but the remaining elements (properties, axioms, and instances) influence the voting system. An OWL concept can be simple, like dog and hotdog, or composed of two terms, like WarmTemperature and PizzaTopping. For that reason, a concept has one or more possible equivalences in WordNet (if and only if it is spelled correctly). A tokenization process based on capital letters (e.g. PizzaTopping, hotAir, but not hotDog) and special characters (e.g. hot dog, hot#Temperature) splits the words of a concept. If the concept has two words, they are managed as two different words in the algorithm. All user definitions of ontology elements go through tokenization (concepts, individuals and properties) and stemming (properties); both tasks increase the flexibility of our algorithm.
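The tokenization step can be illustrated with a minimal Python sketch. It assumes that a concept name is split at special characters and at lower-to-upper case boundaries, and that a case-based split is undone when the joined form is itself a known word (one possible reading of why hotDog is not split). The function name `tokenize_concept` and the `lexicon` parameter, a stand-in for a WordNet lookup, are hypothetical and not taken from the original implementation.

```python
import re

# Matches acronyms, capitalised or lower-case words, and digit runs.
_CASE_PIECES = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokenize_concept(name, lexicon=frozenset()):
    """Split an ontology concept name into lower-case simple words.

    `lexicon` is a hypothetical set of known single words (a stand-in for
    a WordNet lookup); it keeps names such as hotDog in one piece.
    """
    tokens = []
    # Special characters (space, '#', '_', ...) always separate words.
    for part in re.split(r"[^A-Za-z0-9]+", name):
        if not part:
            continue
        pieces = _CASE_PIECES.findall(part)
        if len(pieces) > 1 and part.lower() in lexicon:
            # The joined form is a known word, so the case split is undone.
            tokens.append(part.lower())
        else:
            tokens.extend(piece.lower() for piece in pieces)
    return tokens


print(tokenize_concept("PizzaTopping"))                 # ['pizza', 'topping']
print(tokenize_concept("hotDog", lexicon={"hotdog"}))   # ['hotdog']
```

Each resulting token would then be looked up in WordNet as a separate simple word.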

In any case, a simple or composed concept is transformed into a word with a meaning in WordNet. Each word in WordNet has a set of senses. Each sense has a set of synonyms/antonyms, hyperonyms/hyponyms, meronyms/holonyms, and one definition (gloss), which adds more terms. The terms of the glosses are filtered and stemmed beforehand, to avoid unnecessary analysis (e.g. of articles and prepositions) and missed coincidences between nouns (e.g. due to plurals or verb forms), respectively.
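As an illustration of this preprocessing, the following Python sketch filters and stems the terms of a gloss. It is only a sketch under stated assumptions: the stop-word list is a small hypothetical sample, and `toy_stem` is a deliberately naive suffix stripper used in place of a real stemmer, so its output will differ from that of the stemmer used in the actual system.

```python
import re

# Hypothetical, deliberately short stop-word list (articles, prepositions, ...).
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "to", "for",
    "with", "and", "or", "that", "is", "are",
})

# Toy suffix rules standing in for a real stemming algorithm.
SUFFIXES = ("ing", "es", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_gloss(gloss):
    """Lower-case a gloss, drop stop words, and stem the remaining terms."""
    words = re.findall(r"[a-z]+", gloss.lower())
    return [toy_stem(word) for word in words if word not in STOP_WORDS]


print(preprocess_gloss("a domesticated animal that barks"))
# ['domesticat', 'animal', 'bark']
```

With this normalisation, plural and singular forms such as dogs and dog reduce to the same term and can coincide lexically.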

3.2 Nomenclature

To ease the reading, a concept from the ontology is denoted by $C$. Each concept has a set of simple words in WordNet: $C \equiv sW_i$. Each simple word has a set of senses: $sW_i : S_i$. Hereafter, the $i$-indices are independent among elements. Each sense has a group of gloss terms: $sW_i : S_i : t_i$, which is equivalent to $t_i : S_i$. Also, each sense has a set of semantic WordNet constructors: $S_i : Hpon_i, Hper_i, Syn_i, Ant_i, Mer_i, Hol_i$. Thus, “:” means “has”, and the $i$-index means “element of set”. Each $Hpon_i$, $Hper_i$, etc. can itself be considered an $sW_i$, which makes the representation recursive.

3.3 Correspondences

Elements generate a correspondence when the lexical form of one coincides with that of another. We save this correspondence: both elements and their semantic functions. The function of each term is given by its semantic function in WordNet (synonym, hyperonym, etc.) and in the ontology (superclass, subclass, equivalent and disjoint classes, and properties).

There is a correspondence only when we can obtain at least one sense from both concepts. For example, a match between simple words, $sW_i = sW_k$, is not useful in our case. Instead, in a term relation $sW_i : S_i : t_i : S_k : sW_k$ both senses are present. The case $sW_i : S_i : Syn_i = sW_k$ is also valid. Therefore, there are 8x8 possible combinations for each pair of simple words.
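The search for coincidences between two simple words can be sketched as follows. This is a minimal Python illustration, not the original implementation: the class names, the `ROLES` labels and the function `find_correspondences` are hypothetical, and the sketch covers only coincidences in which a sense is available on both sides (it does not handle the case where an element coincides with the other simple word itself).

```python
from dataclasses import dataclass, field
from itertools import product

# Element categories attached to a sense. Using gloss terms plus the six
# WordNet constructors is an assumption of this sketch.
ROLES = ("gloss", "hypo", "hyper", "syn", "ant", "mer", "hol")


@dataclass
class Sense:
    sense_id: str
    elements: dict = field(default_factory=dict)  # role -> set of terms


@dataclass
class SimpleWord:
    lemma: str
    senses: list = field(default_factory=list)


def find_correspondences(left, right):
    """List (left sense, left role, right sense, right role, term) tuples
    for every term shared by a sense of `left` and a sense of `right`."""
    found = []
    for s_left, s_right in product(left.senses, right.senses):
        for role_l, role_r in product(ROLES, ROLES):
            shared = (s_left.elements.get(role_l, set())
                      & s_right.elements.get(role_r, set()))
            for term in sorted(shared):
                found.append(
                    (s_left.sense_id, role_l, s_right.sense_id, role_r, term)
                )
    return found
```

Each returned tuple keeps both elements and their semantic functions, which is the information the voting step needs to weight the correspondence.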

3.4 Voting System

Each correspondence is a vote, and each vote has a different weight according to the semantics of the elements involved and the ambiguity of their senses. For example, suppose a concept ($C_i \equiv sW_i$) has a direct or indirect superclass ($C_i \subset C_k$) in the ontology. We can search for this superclass concept (its equivalent in WordNet, $sW_k$) through a route among all senses and their hyperonyms across the whole WordNet tree: $sW_i : \{S_i : Hper_i\}^{\text{recursive search}} \stackrel{?}{=} (Syn_k \mid sW_k) : C_k$. The ambiguity of simple words plays an important role in this voting system: when a concept has a relationship (e.g. equivalentClasses, disjointClasses) with another concept that has only one sense, this sense either receives more weight or modifies all senses of that concept.

There are some types of cases that can simplify the determination of senses. We describe them only generically due to space limits:

• Hierarchical Cases. When a concept has a parent or a child, we can search for the possible parents/children of this concept in the WordNet tree. When we find this coincidence, we have identified the pair of senses that satisfies the condition.

• Equivalent & Disjoint Cases. When a concept has an equivalent class and we can determine that one of the two has only one sense, we assign this sense to the other word.

• Restrictions, Object & DataType Properties Cases. We distinguish two kinds of data: the property signature (property name, domain, and range) and the semantic function. The property signature works like a frequency vote. In object properties, the semantic function allows specific rules to be defined when there are other correspondences among these concepts. Only transitive properties have been considered, since they can relate other concepts with some information. Other restrictions or constructors are unionOf and complementOf. Both votes work like the previous case of transitive properties.

• Individuals Cases. Sometimes the ontology designer introduces words or entities as individuals of a generic class. Our approach handles this type of case by giving these correspondences more weight, since individuals are specific elements of something particular.

• Normal Cases. Finally, we treat the rest of the correspondences as frequencies with different weights as a function of the ambiguity of each simple word. For example, a term correspondence $sW_i : S_i : t_i : S_k : sW_k$ increases the weights of both simple words by a constant value. If either simple word is not ambiguous, this weight is bigger than the initial value. We consider that relationships with non-ambiguous words define strong semantic links. Other cases of interest are: the synonym correspondence $sW_i : S_i : Syn_i = Syn_k : S_k : sW_k$, where both words share the same sense and we assume that both concepts are equivalent; and $sW_i : S_i : Ant_i = Syn_k : S_k : sW_k$, where both concepts are disjoint and the correspondence is not processed.
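The frequency-style vote of the normal cases can be sketched in Python as follows. The paper does not give concrete weights, so `BASE_WEIGHT` and `UNAMBIGUOUS_BONUS` are hypothetical values, and the function names are illustrative only. The special rules for hierarchical, equivalent, property and individual cases are not modelled here.

```python
from collections import defaultdict

BASE_WEIGHT = 1.0        # hypothetical constant added by each vote
UNAMBIGUOUS_BONUS = 2.0  # hypothetical factor when a word has a single sense


def tally_votes(correspondences, sense_counts):
    """Accumulate a score per (word, sense).

    correspondences: iterable of ((word, sense), (word, sense)) pairs.
    sense_counts: mapping from word to its number of WordNet senses.
    """
    scores = defaultdict(float)
    for left, right in correspondences:
        weight = BASE_WEIGHT
        # A link with a non-ambiguous word counts as a stronger vote.
        if sense_counts[left[0]] == 1 or sense_counts[right[0]] == 1:
            weight *= UNAMBIGUOUS_BONUS
        scores[left] += weight
        scores[right] += weight
    return dict(scores)


def best_senses(scores):
    """Pick the highest-scoring sense for each word."""
    best = {}
    for (word, sense), score in scores.items():
        if word not in best or score > best[word][1]:
            best[word] = (sense, score)
    return {word: sense for word, (sense, _) in best.items()}


votes = [
    (("dog", "dog#1"), ("animal", "animal#1")),
    (("dog", "dog#2"), ("food", "food#1")),
]
counts = {"dog": 2, "animal": 1, "food": 2}
print(best_senses(tally_votes(votes, counts)))
# {'dog': 'dog#1', 'animal': 'animal#1', 'food': 'food#1'}
```

In this toy run, the vote involving the single-sense word animal is doubled, so dog#1 outscores dog#2.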

3.5 Search Space

Some ontology elements are more closely related than others. These relationships make it possible to discover new correspondences and senses quickly. Thus, we have opted to split the process into two parts: the first part discovers the senses of the candidate concepts (the most related concepts), and in the second part the remaining senses are discovered by the algorithm using the already known senses of the previous elements. This selection of concepts decreases the search space and reduces the consumption of computing resources. Our selection of predominant concepts is based on our previous work (Lera et al., 2008); we have modified the idea presented in that paper by applying clustering algorithms instead of simple formulas. The clustering parameters are the relative depth (the highest depth and its depth according to its children), the number of direct subclasses, the number of relationships whose range is the concept itself, and the number of individuals.

4 DEVELOPMENT

To limit the consumption of resources, we have developed a procedure in which all concepts are analysed according to their importance within the context. Furthermore, during the analysis we save data for possible future concepts, avoiding unnecessary computations. We use a hash index that clusters coincident elements. When a new element appears, we add it to this structure and, where appropriate, create the correspondence or modify the senses according to the rules, avoiding unnecessary searches in other branches.
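A hash index of this kind can be sketched in Python with a dictionary keyed by the lexical form of each element. The class name `CoincidenceIndex` and the shape of the `owner` value are hypothetical; the sketch only shows how adding an element immediately reveals its coincidences without a new search.

```python
from collections import defaultdict


class CoincidenceIndex:
    """Groups elements that share the same lexical form (hypothetical sketch)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, term, owner):
        """Register `owner` under `term`.

        `owner` identifies where the term was seen, e.g. a
        (word, sense, role) tuple. Returns the pairs formed with owners
        that were already registered under the same term.
        """
        earlier = list(self._buckets[term])
        self._buckets[term].append(owner)
        return [(previous, owner) for previous in earlier if previous != owner]


index = CoincidenceIndex()
index.add("animal", ("dog", "dog#1", "gloss"))
pairs = index.add("animal", ("pet", "pet#1", "gloss"))
print(pairs)
# [(('dog', 'dog#1', 'gloss'), ('pet', 'pet#1', 'gloss'))]
```

Each returned pair is a candidate correspondence to which the voting rules can then be applied.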

When we look for coincidences in the multiple branches of the hierarchies, we apply a breadth-first search (BFS) algorithm with a depth threshold. For stemming, we have used the Lancaster algorithm.
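The depth-limited breadth-first search can be sketched in Python as follows. The graph is represented as a plain dictionary from a node to its parents (for instance, hyperonyms); this representation, the function name `bfs_find` and the example data are assumptions made for illustration.

```python
from collections import deque


def bfs_find(graph, start, target, max_depth):
    """Return the depth at which `target` is first reached from `start`,
    or None if it is not found within `max_depth` links."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        if depth == max_depth:
            continue  # threshold reached: do not expand this branch further
        for parent in graph.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None


# Hypothetical hyperonym links, for illustration only.
hyperonyms = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "carnivore": ["mammal"],
    "domestic_animal": ["animal"],
    "mammal": ["animal"],
}
print(bfs_find(hyperonyms, "dog", "animal", max_depth=3))  # 2
print(bfs_find(hyperonyms, "dog", "mammal", max_depth=2))  # None
```

The threshold bounds the cost of each search, at the price of missing coincidences that lie deeper in the hierarchy.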

Lancaster stemming algorithm: www.comp.lancs.ac.uk/computing/research/stemming/

5 EVALUATION

To evaluate our algorithm, reference ontologies are needed, that is, ontologies handmade specifically for disambiguation benchmarking. In their absence, we use datasets from the Ontology Alignment Evaluation Initiative (OAEI), such as anatomy, conference and directory. We have used the conference dataset, a collection of ontologies describing the domain of organising conferences. We have analysed all of them and measured the response time of each part of our algorithm. Accuracy was checked manually, and the results are practically identical in both situations.

[Figure 1: Algorithm's response time. Row labels: First Time, Candidate Cache, No Candidates. Stage labels: Load Model, Selection of Candidates, Candidate Concepts, Rest Concepts. Values shown (assignment to bars not recoverable): 213.88, 1.02, 114.52, 60.16, 112.28, 61.08, 97.66.]

By contrast, the mean response time changes depending on whether we compute the candidate concepts for the first time (Fig. 1, row A) or already have this set of candidates available (row B). Memory consumption is also lower in the latter case than in the others. Finally, row C is a “normal” execution that deals with all concepts in the same way.

6 CONCLUSIONS

We have presented an algorithm that is able to determine the sense of ontology concepts. To do so, all OWL constructors are considered and mapped to the WordNet structure to obtain a formal way to compare elements. All correspondences are handled according to the semantic function of the elements involved. Some rules are triggered in special cases, for example when we have equivalent concepts or transitive properties. In comparison with other approaches, we have incorporated more resources with semantic knowledge to increase the efficiency of the process in terms of success and computational cost. We are not able to compare our results with other works, since our input data are ontologies rather than documents or other types of unstructured data. We have noticed that when the structures of WordNet and the ontology, and their names, are similar, the number of correspondences and rules triggered considerably decreases the execution time of the algorithm. The results encourage us to exploit some specific OWL constructors, such as equivalent complex axioms of Description Logic, and to improve the individual rules of each case.

OAEI conference dataset: http://nb.vse.cz/~svabo/oaei2010/

ACKNOWLEDGEMENTS

This work is partially supported by project TIN2007-60440 of the Spanish Ministry of Science and Innovation.

REFERENCES

Banek, M., Vrdoljak, B., and Tjoa, A. M. (2008). Word Sense Disambiguation as the Primary Step of Ontology Integration. In Proceedings of the 19th international conference on Database and Expert Systems Applications, pages 65–72. Springer-Verlag.

Castano, S., Ferrara, A., and Montanelli, S. (2003). H-match: an algorithm for dynamically matching ontologies in peer-based systems. In Proc. of the 1st Int. Workshop on Semantic Web and Databases (SWDB) at VLDB 2003, pages 231–250.

Khelif, K., Gandon, F., Corby, O., and Dieng-Kuntz, R. (2008). Using the Intension of Classes and Properties Definition in Ontologies for Word Sense Disambiguation. In Proceedings of the 16th international conference on Knowledge Engineering, pages 188–197, Berlin, Heidelberg. Springer-Verlag.

Kilgarriff, A. (1998). SENSEVAL: An Exercise in Evaluating Word Sense Disambiguation Programs. In Proceedings of the First International Conference on Language Resources and Evaluation, pages 581–588.

Lera, I., Juiz, C., and Puigjaner, R. (2008). Quick Ontology Mapping Algorithm for distributed environments. In Semantic Web and Web Services, volume 1, pages 107–113. CSREA Press.

Liu, S., Yu, C., and Meng, W. (2005). Word Sense Disambiguation in Queries. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 525–532. ACM.
