A CORRESPONDENCE REPAIR ALGORITHM BASED ON WORD
SENSE DISAMBIGUATION AND UPPER ONTOLOGIES
Angela Locoro
DIBE, Biophysical and Electronic Engineering Department, University of Genoa, Via Opera Pia 11/A, Genova, Italy
Viviana Mascardi
DISI, Computer Science Department, University of Genoa, Via Dodecaneso 35, Genova, Italy
Keywords:
Ontology matching, Upper ontologies, Correspondences repair, Word sense disambiguation.
Abstract:
In an ideal world, an ontology matching algorithm should return all the correct correspondences (it should be
complete) and should return no wrong correspondences (it should be correct). In the real world, no imple-
mented ontology matching algorithm is both correct and complete. For this reason, repairing wrong corre-
spondences in an ontology alignment is a very pressing need to obtain more accurate alignments. This paper
discusses an automatic correspondence repair method that exploits both upper ontologies, to provide an informative context to concepts c ∈ o and c′ ∈ o′ belonging to an alignment a, and a context-based word sense disambiguation algorithm to assign c and c′ their correct meaning. This meaning is used to decide whether c and c′ are related, and to either keep or discard the correspondence ⟨c, c′⟩ ∈ a, namely, to repair a. The experiments we carried out are presented and the obtained results are discussed. The advantages of the approach we propose are confirmed by a total average gain of 11.5% in precision for the repaired alignments, against a 2% total average error.
1 INTRODUCTION
Ontology matching is the process that, given two ontologies o and o′ together with a set of parameters and optional inputs, returns a set of correspondences (an "alignment") between entities of o and entities of o′.
A "correspondence repair algorithm" takes an alignment a as input (with additional inputs and parameters, if needed) and returns a repaired version of a, namely a′, that has fewer wrong correspondences than a, hence achieving a higher precision¹.
Deciding that a correspondence ⟨c, c′⟩ is wrong is a hard task. To this aim, the correspondence repair algorithm may exploit the structure of the two ontologies o and o′ from which c and c′ are drawn and a originates, or the constraints over c ∈ o and c′ ∈ o′ (such as disjointness, equivalence, etc.), or the semantics of c and c′ seen as meaningful pieces of information.
¹ Precision is the number of correctly found correspondences with respect to the reference alignment (true positives), divided by the total number of found correspondences (true positives and false positives) (Do et al., 2002).
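In symbols (notation ours), with A the set of returned correspondences and R the reference alignment:

    Precision = |A ∩ R| / |A|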
The correspondence repair algorithm that we discuss in this paper adopts the latter approach. In particular, the way we give a semantics to concepts c and c′, in order to decide if they have the same meaning, is based on techniques from computational linguistics.
According to (Agirre and Edmonds, 2007), “in the
field of computational linguistics, word sense disam-
biguation (WSD) is defined as the problem of com-
putationally determining which “sense” of a word is
activated by the use of the word in a particular con-
text”. The idea underlying our proposal is that concepts in an ontology can be given a sense in almost the same way as words in a text. The problem in making this idea work is finding a context for the concepts whose sense must be identified.
In our previous work, we implemented different algorithms that use upper ontologies² for boosting the ontology matching process (Mascardi et al., 2009).
² Quoting Wikipedia (accessed on June 25th, 2009), an upper ontology (also named top-level ontology, or foundation ontology) is "an attempt to create an ontology which describes very general concepts that are the same across all domains".
We ran experiments with SUMO-OWL, a restricted version of SUMO translated into OWL (Niles and Pease, 2001), with OpenCyc, the open version of the commercial Cyc ontology (Lenat and Guha, 1990), and with DOLCE (Gangemi et al., 2002), and we demonstrated that upper ontologies can be profitably used for enhancing either the precision or the recall of the obtained alignment, depending on the algorithm and the upper ontology used.
The results discussed in (Mascardi et al., 2009)
show that the goal of upper ontologies, namely sup-
porting semantic interoperability among a large num-
ber of ontologies accessible “under” them, has been
achieved: upper ontologies may provide an infor-
mative context for domain-dependent ontologies, and
two domain-dependent ontologies may be matched
with better results if they share the same context.
Starting from the above considerations, in this paper we exploit upper ontologies as a means to enrich the semantic context of concepts c ∈ o and c′ ∈ o′ belonging to an alignment a, and we use a context-based WSD algorithm to assign c and c′ their correct meaning. This meaning is used to decide whether c and c′ are related, and to either keep or discard the correspondence ⟨c, c′⟩ ∈ a, namely, to repair a.
The paper is organised as follows: Section 2 in-
troduces the main definitions and surveys the state of
the art in correspondence repair algorithms. Section
3 describes our algorithm for correspondence repair
based on the Adapted Lesk approach and on upper on-
tologies, whereas Section 4 presents experiments and
results. Section 5 concludes.
2 BACKGROUND
In this section we provide a short background on the topics our research builds upon: WordNet, the Lesk algorithm for word sense disambiguation, ontology matching, and alignment repair.
2.1 WordNet
WordNet (Fellbaum, 1998) is a large lexical database
of English, developed under the direction of George
A. Miller. Nouns, verbs, adjectives and adverbs are
grouped into sets of cognitive synonyms (synsets),
each expressing a distinct concept. A synset or syn-
onym set is defined as a set of one or more syn-
onyms that are interchangeable in some context with-
out changing the truth value of the proposition in
which they are embedded.
Most synsets are connected to other synsets via a
number of semantic relations that include:
Semantic Relations between Nouns. Hypernym: Y is a hypernym of X if every X is a (kind of) Y; hyponym: Y is a hyponym of X if every Y is a (kind of) X; coordinate term: Y is a coordinate term of X if X and Y share a hypernym; holonym: Y is a holonym of X if X is a part of Y; meronym: Y is a meronym of X if Y is a part of X.
Semantic Relations between Verbs. Hypernym:
the verb Y is a hypernym of the verb X if the activity X
is a (kind of) Y ; troponym: the verb Y is a troponym of
the verb X if the activity Y is doing X in some manner;
entailment: the verb Y is entailed by X if by doing X
you must be doing Y ; coordinate terms: those verbs
sharing a common hypernym.
While semantic relations apply to all members of a synset, because those members share a meaning and are mutually synonymous, individual words can also be connected to other words through lexical relations, such as antonymy (words that are opposites of each other) and derivational relatedness.
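Although our implementation is written in Java (see Section 4), these relations are easy to explore programmatically. The following minimal Python sketch, assuming NLTK and its WordNet 3.0 corpus are installed, prints the relations listed above for the noun senses of a word:

    from nltk.corpus import wordnet as wn

    # Each synset groups synonyms that are interchangeable in some context.
    for s in wn.synsets('culture', pos=wn.NOUN):
        print(s.name(), '-', s.definition())                 # the gloss of this sense
        print('  hypernyms:', s.hypernyms())                 # every culture is a (kind of) Y
        print('  hyponyms :', s.hyponyms())                  # every X is a (kind of) culture
        print('  holonyms :', s.part_holonyms() + s.member_holonyms())  # wholes it belongs to
        print('  meronyms :', s.part_meronyms() + s.member_meronyms())  # its parts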
2.2 Lesk Algorithm
The Lesk algorithm is an algorithm for word sense
disambiguation introduced by Michael E. Lesk in
1986 (Lesk, 1986).
It is based on glosses, i.e. the dictionary definitions of each sense of a word: a pair of words in a sentence is disambiguated by looking for the pair of senses whose glosses have the highest overlap (number of common words).
In 2002, Satanjeev Banerjee and Ted Pedersen (Banerjee and Pedersen, 2002) adapted the Lesk algorithm by using WordNet as the source of glosses.
Hereafter we give a very general description of the main steps of the Adapted Lesk approach:
1. take a sentence s, a target word tw to be disambiguated in s, and a context window cw of 2n + 1 words, with n words before and n words after tw;
2. for each word w in cw, retrieve the set WS of WordNet synonyms, hypernyms, hyponyms, holonyms, meronyms, troponyms, attributes, and all the glosses associated with the words in WS, be them g_1(w), g_2(w), ..., g_n(w);
3. for each pair of words wa, wb in cw and for each pair of glosses g_i(wa) and g_j(wb), define an overlap as the longest sequence of common consecutive words: this overlap results in a score stating how much g_i(wa) is the right definition for wa, and g_j(wb) is the right definition for wb;
4. to disambiguate the target word tw, select the gloss in g_1(tw), ..., g_m(tw) with the highest score.
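A minimal Python sketch of steps 2–4, again relying on NLTK's WordNet interface (the helper names related_glosses, overlap and lesk_score are ours; the actual system is implemented in Java, see Section 4):

    from nltk.corpus import wordnet as wn

    def related_glosses(synset):
        # Step 2: glosses of a sense and of its hypernyms, hyponyms, holonyms, meronyms.
        related = ([synset] + synset.hypernyms() + synset.hyponyms()
                   + synset.part_holonyms() + synset.member_holonyms()
                   + synset.part_meronyms() + synset.member_meronyms())
        return [s.definition().lower().split() for s in related]

    def overlap(g1, g2):
        # Step 3: length of the longest consecutive sequence of words common to two glosses.
        best = 0
        for i in range(len(g1)):
            for j in range(len(g2)):
                k = 0
                while i + k < len(g1) and j + k < len(g2) and g1[i + k] == g2[j + k]:
                    k += 1
                best = max(best, k)
        return best

    def lesk_score(target_synset, context_word):
        # Steps 3-4: best overlap between one sense of the target and any sense of a context word.
        a_glosses = related_glosses(target_synset)
        return max((overlap(ga, gb)
                    for cs in wn.synsets(context_word)
                    for gb in related_glosses(cs)
                    for ga in a_glosses), default=0)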
2.3 Ontology Matching
This section is based on (Euzenat and Shvaiko, 2007)
and uses the same terminology adopted there.
Definition 1. Matching Process. A matching process can be seen as a function f which takes two ontologies o and o′, a set of parameters p and a set of oracles and resources r, and returns an alignment A between o and o′.
Definition 2. Correspondence. A correspondence (or mapping) between an entity e belonging to ontology o and an entity e′ belonging to ontology o′ is a 5-tuple ⟨id, e, e′, R, conf⟩ where:
- id is a unique identifier of the correspondence;
- e and e′ are the entities (e.g. properties, classes, individuals) of o and o′ respectively;
- R is a relation such as "equivalence", "more general", "disjointness", "overlapping", holding between the entities e and e′;
- conf is a confidence measure (typically in the [0, 1] range) holding for the correspondence between the entities e and e′.
Definition 3. Alignment. An alignment of ontologies o and o′ is a set of correspondences between entities of o and o′.
In our approach we only consider classes (c) as entities and equivalence as the relation, thus we can drop R from the 5-tuple, as it is always equivalence. Our repair algorithm exploits neither confidence values nor unique identifiers, so we drop them too for the sake of readability. For the purposes of this paper, a correspondence is then a pair ⟨c, c′⟩.
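For illustration only (our implementation relies on the Alignment API, see Section 4), Definition 2 and its reduction can be sketched as follows; every name here is ours:

    from dataclasses import dataclass

    @dataclass
    class Correspondence:
        id: str          # unique identifier (dropped by our repair algorithm)
        e: str           # entity of o; in this paper always a class c
        e_prime: str     # entity of o'; in this paper always a class c'
        R: str           # relation; in this paper always "equivalence"
        conf: float      # confidence in [0, 1] (dropped by our repair algorithm)

    # For the purposes of this paper a correspondence reduces to a pair:
    correspondence = ('Culture', 'Sculpture')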
2.4 Alignment Repair
To the best of our knowledge, very few attempts to
solve the problem of repairing correspondences be-
tween ontologies exist (Haeri et al., 2006; Meilicke
et al., 2007; Meilicke et al., 2008a; Meilicke et al.,
2008b).
The intuition behind (Haeri et al., 2006) is that an ontology alignment between o and o′ gives a measure of similarity across the two ontologies, which can be seen as an estimate of the distance between each pair of points in two directed acyclic graphs g and g′ embedded in a metric space. A set of techniques for "alignment refinement" is based on the definition of a partial order on the possible isomorphisms between g and g′ and on a ranking of the possible alignments between o and o′.
The approach followed in (Meilicke et al., 2007)
and (Meilicke et al., 2008a) is to interpret the prob-
lem of identifying wrong correspondences in an on-
tology alignment as a diagnosis task. The authors
formalise correspondences as “bridge rules” in dis-
tributed description logics and analyse the impact of
each correspondence on the ontologies it connects. A
correspondence that correctly states the semantic re-
lations between ontologies should not cause inconsis-
tencies in any of the ontologies. If it does, the method
computes sets of correspondences that jointly cause
a symptom and repairs each symptom by removing
correspondences from these sets.
In (Meilicke et al., 2008b) a supervised machine learning technique is presented as a means to learn disjointness axioms in lightweight ontologies, where a stronger axiomatization is needed in order to apply complex reasoning techniques to the debugging of mappings.
To the best of our knowledge, WSD methodologies have not yet been exploited for boosting the mapping repair process and, among WSD techniques, the Adapted Lesk algorithm that characterizes our approach has been considered neither for ontology matching nor for mapping repair. The recent survey on the exploitation of WordNet in ontology matching (Lin and Sandkuhl, 2008) supports our claim: it shows that the use of WordNet has become more and more widespread in ontology matching methodologies, but it never mentions the Adapted Lesk approach, even though the latter is based on WordNet too.
3 ADAPTED LESK- AND UPPER
ONTOLOGY-BASED
CORRESPONDENCE REPAIR
Our approach to correspondence repair is based on the Adapted Lesk algorithm briefly discussed in Section 2.2. It also exploits one upper ontology uo and the WordNet thesaurus.
Suppose that we are given an alignment a between o and o′, and that we want to repair the correspondences related to concept c1 ∈ o. We first generate two partial alignments, o-uo and o′-uo, that will be used for providing a context to c1 and to the concepts related to c1 in the alignment a.
Based on that, three weighted context windows
CW1, CW2 and CW3 are created. They contain:
CW1
1. all the concepts derived from all the correspondences ⟨c1, c2i⟩ that belong to the alignment a to be repaired (note that there may be more than one c2i ∈ o′ such that ⟨c1, c2i⟩ ∈ a);
2. all the concepts derived from all the correspondences ⟨c1, uocj⟩ that belong to the alignment o-uo (again, there may be more than one uocj ∈ uo such that ⟨c1, uocj⟩ ∈ o-uo).
CW1 has weight 1: it contains the concepts that are closest to c1, because there is a direct correspondence between c1 and them either in a or in the o-uo alignment.
CW2
1. all the concepts in uo belonging to correspondences between concepts c2i in CW1 and uo;
2. all the concepts in uo directly related, by means of the "subClassOf" relation or other kinds of user-defined relations, to one uocj ∈ uo belonging to CW1.
CW2 has weight 0.5: a step further has been moved from c1.
CW3
all the concepts in uo directly related to one uocj ∈ uo belonging to CW2.
CW3 has weight 0.25: it contains even more distant concepts.
Figure 1 graphically shows the concepts in the three context windows, their distance from c1, and their weights.
The concept c1 is added to CW1, CW2 and CW3.
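As a sketch (our own representation, anticipating the example of Section 3.1; the actual Java data structures may differ), the windows can be kept as sets of context words paired with their weights:

    # Hypothetical context windows for c1 = "culture" (contents from Section 3.1).
    # c1 itself is also added to each window; the scoring below skips it.
    CW1 = {'sculpture', 'vulture'}                  # weight 1
    CW2 = {'sculpture', 'art', 'work', 'bird'}      # weight 0.5
    CW3 = {'art', 'work'}                           # weight 0.25
    windows = [(CW1, 1.0), (CW2, 0.5), (CW3, 0.25)]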
The following steps, based on Adapted Lesk, are then taken for each CW:
1. retrieve all the WordNet synsets for c1;
2. for each synset syn(c1) of c1:
(a) retrieve all its hypernyms (hype), hyponyms (hypo), holonyms (holo) and meronyms (mero);
(b) build the list A of all these glosses, namely allGlosses(syn(c1), hype, hypo, holo, mero);
3. for each concept c in CW such that c ≠ c1:
(a) retrieve all its synsets and all the hype, hypo, holo and mero of all its synsets;
(b) build the list B of all these glosses, namely allGlosses(allsyn(c), hype, hypo, holo, mero);
4. for each element of the list A:
(a) for each element of the list B:
i. compare each pair of glosses and compute the longest consecutive sequence of common words;
ii. save the obtained score in a |A| × |B| matrix;
5. for each synset of c1, compute the maximum score for each c, score(g_max(c)), by extracting the maximum value of each |A| × |B| matrix;
6. save the maximum scores in the CW matrix, whose number of rows equals the total number of synsets of c1, and whose number of columns equals the total number of context words in CW.
Finally, for each synset of c1, sum by row the scores obtained from the matrixes associated with CW1, CW2, and CW3, applying the weights corresponding to each context window. The following formula shows the computation for synset s_i of c1, with cw1_ij, cw2_ij and cw3_ij as cells of the three CW matrixes:

    s_i = Σ_{j=1..N} cw1_ij · 1 + Σ_{j=1..M} cw2_ij · 0.5 + Σ_{j=1..K} cw3_ij · 0.25    (1)
The "winner" gloss for c1, namely the gloss that is most likely to correctly define c1, is the one associated with the maximum s_i.
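Continuing the Python sketch (reusing lesk_score from Section 2.2 and the windows list above; winner_gloss is our name, not part of the actual Java implementation), the winner gloss can be obtained as:

    def winner_gloss(c1, windows):
        # One row per synset of c1; each row sum s_i follows Equation (1).
        best_synset, best_score = None, -1.0
        for syn in wn.synsets(c1):
            s_i = sum(weight * lesk_score(syn, w)
                      for cw, weight in windows
                      for w in cw if w != c1)
            if s_i > best_score:
                best_synset, best_score = syn, s_i
        return best_synset

    print(winner_gloss('culture', windows).definition())  # the "winner" gloss of Culture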
The same process is repeated for each c2i ∈ o′ that belongs to the set of correspondences ⟨c1, c2i⟩ ∈ a, and the "winner" gloss for c2i is found.
In the end, for each correspondence, the "winner" glosses are compared and the comparison is classified into one of the following cases:
- the glosses are equal;
- one of the glosses is equal to that of a synset of either c1 or c2;
- one of the glosses is equal to that of a hype or hypo or holo or mero of the winner glosses of either c1 or c2;
- none of the glosses satisfies the above conditions.
The mapping ⟨c1, c2⟩ is discarded if and only if it falls under the last case; otherwise it is kept.
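Our reading of these four cases, in the same sketch style (gloss comparison via string equality of definitions; family is a hypothetical helper of ours):

    def family(s):
        # Glosses of the hype, hypo, holo and mero of a synset.
        return {r.definition() for r in (s.hypernyms() + s.hyponyms()
                + s.part_holonyms() + s.member_holonyms()
                + s.part_meronyms() + s.member_meronyms())}

    def keep_correspondence(c1, c2, win1, win2):
        # win1, win2: winner synsets of c1 and c2; discard only if no case applies.
        if win1.definition() == win2.definition():
            return True                                   # case 1: equal glosses
        if (win1.definition() in {s.definition() for s in wn.synsets(c2)}
                or win2.definition() in {s.definition() for s in wn.synsets(c1)}):
            return True                                   # case 2: gloss of a synset of c1/c2
        if win1.definition() in family(win2) or win2.definition() in family(win1):
            return True                                   # case 3: gloss of a hype/hypo/holo/mero
        return False                                      # case 4: discard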
3.1 A Practical Example
This section shows an example taken from one of
our tests, to repair all the correspondences for the
c
1
concept Culture. The final alignment contains
only one correspondence for the Culture concept:
< Culture, Scul pture >. Looking at the common
radix of the two concepts (-ulture) there is a strong
evidence that every string-based ontology matching
method would have matched them, although they are
not equivalent concepts. The creation of the three fol-
lowing context windows (CWs) for Culture results in
the following:
Figure 1: Weighted context windows.
CW1 (culture, sculpture, vulture): the concepts of the correspondence plus the concepts of uo from c1-uo mappings (namely vulture);
CW2 (culture, sculpture, art, work, bird): c1 plus the uo concepts from c2i-uo mappings (namely sculpture, art, work), plus the uo concepts related to c1-uo in uo (namely bird);
CW3 (culture, art, work): c1 plus the uo concepts related to c2i-uo in uo (namely art, work).
The total number of synsets (Ss) for Culture is 7, with synset offsets (the unique identifier of a WordNet synset given its syntactic category, POS) 8287844, 5751794, 5984936, 920510, 14459824, 6194409, 917759.
The number of hype, hypo, holo, mero for each of them is (1,7,0,1), (1,3,0,0), (1,1,0,0), (1,1,0,0), (1,0,0,0), (1,3,0,0), (1,4,0,0) respectively.
The number of Ss and related terms for the other
words in the three CWs are:
for CW1: sculpture (2 S, 4 hype and 10 hypo),
vulture (2 S, 2 hype and 4 hypo);
for CW2: sculpture (same as above), art (4 S,
4 hype and 43 hypo and 1 holo), work (7 S, 7
hype and 87 hypo), bird (5 S, 5 hype, 37 hypo
and 20 mero);
for CW3: art (same as above), work (same as
above).
If we apply Equation (1) to the matrixes associated with CW1, CW2 and CW3, whose values are depicted in Table 1, the winner gloss for Culture turns out to be the one associated with S offset 14459824, "a highly developed state of perfection; having a flawless or impeccable quality".
The disambiguation of the word Sculpture is obtained by considering all the correspondences in which Sculpture is involved, in our example ⟨Art, Sculpture⟩ and ⟨Culture, Sculpture⟩. The three CWs are:
CW1 (sculpture, culture, art): all the concepts from all the correspondences;
CW2 (sculpture, artery, art, work, vulture): c2 plus the uo concepts from c1i-uo mappings (namely artery, art, work, vulture), plus the uo concepts related to c2-uo in uo (namely sculpture, which is redundant and has been deleted);
CW3 (sculpture, blood, vessel, artifact, art, work, bird): c2 plus the uo concepts related to c1i-uo in uo (namely blood, vessel, artifact, art, work, bird).
The number of Ss for Sculpture is 2, with S offsets 4157320 (2 hype and 9 hypo) and 937656 (2 hype and 1 hypo). For the other CW words the numbers are:
for CW1: culture (7 S, 7 hype, 19 hypo and 1 mero), art (same as above);
for CW2: artery (2 S, 2 hype, 76 hypo), art (same as above), work (same as above), vulture (same as above);
for CW3: blood (5 S, 5 hype, 12 hypo), vessel (3 S, 3 hype, 50 hypo, 10 mero and 1 holo),
Table 1: Score matrixes for word Culture.
CW1 CW2 CW3
S offset sculpture vulture sculpture art work bird art work
8287844 1 0 1 1 2 1 1 2
5751794 1 0 1 2 1 1 2 1
5984936 1 0 1 1 1 0 1 1
920510 1 0 1 1 2 1 1 2
14459824 3 1 3 3 2 1 3 2
6194409 1 0 1 2 2 1 2 2
917759 1 1 1 2 3 1 2 3
Table 2: Score matrixes for word Sculpture.
CW1 CW2 CW3
S offset culture art artery art work vulture blood vessel artifact art work bird
4157320 1 5 1 5 2 1 1 1 1 5 2 2
937656 2 25 1 25 2 0 1 1 1 25 2 0
artifact (1 S, 1 hype and 45 hypo), art (same
as above), work (same as above), bird (same as
above).
The final CW1, CW2 and CW3 matrixes are de-
picted in Table 2. According to them, the win-
ner gloss for Sculpture is the one associated with
S offset 937656, “creating figures or designs in
three dimensions”.
Once the winner glosses for Culture and Sculpture have been found, the repair procedure compares them and discards the correspondence as a wrong one.
4 EXPERIMENTS AND RESULTS
In order to conduct our experiments we have re-
used a subset of the alignments obtained while run-
ning the tests described in (Mascardi et al., 2009).
Here, we only consider alignments obtained by ex-
ploiting SUMO-OWL and OpenCyc as upper ontolo-
gies, since DOLCE turned out to be too “upper” and
gave poor results. The alignments via uo have been obtained using two different matching algorithms: NS (No Structure), which matches an ontology o with the uo by comparing them directly, and WS (With Structure), which also considers sub- and super-concepts of the uo concept involved in the matching. To build the context windows described in the previous section we also re-used, for each test, the partial alignments o-uo and o′-uo.
Our repairing algorithm is written in Java and uses the Alignment API³ to manage the alignments as well as the OWL version of the uos. The Java library used to manage WordNet 3.0 is also embedded in the API.
³ http://alignapi.gforge.inria.fr/
The whole mapping repair procedure, including the disambiguation step based on the Adapted Lesk algorithm, has been implemented from scratch, as no Java version of the Lesk algorithm seems to exist at the time of writing.
Our complete mapping repair procedure is implemented as follows:
1. the SplitAlignment function is called first: it receives an alignment as input and splits it into all the subsets of the alignment corresponding to the mappings of a concept c1 ∈ o with every concept c2i from o′, as well as the subsets of the inverse alignment (all the mappings of a concept c2 ∈ o′ with every concept c1i from o);
2. the MakeContextWindows function is run on every alignment subset in order to generate the three context windows for each concept, which contain all the context words listed in Section 3. The algorithm pre-processes each context window by deleting duplicate concepts and by removing those concepts that do not belong to WordNet. Composite concepts (concepts with more than one term, e.g. ComputerHardware) are tokenized into simple terms, and for each term the three context windows are built, each containing every other term of the composite concept as a context word (referring to the above example, the context windows for the term Computer will also contain the word Hardware, and vice versa for the second term); a tokenization sketch is given after this list. If the composite concept contains stop words⁴ we remove them;
3. the AdaptedLesk function performs the disambiguation task described in Section 3. Before the maximum consecutive overlap is computed, each gloss is treated with a stop-word cleaning and a word stemming procedure (the algorithm we used to obtain a word stem from a term, namely a process for removing the commoner morphological and inflexional endings from words, is the Porter Stemmer, whose Java version is freely available at Porter's web page⁵). In this first prototype the score computation is limited to the longest overlap between two glosses: in case there is more than one overlap, only the best one is considered and the others are not taken into account in the final score computation. The output winner gloss takes the form of the WordNet offset of the synset containing the gloss, together with the label of the concept and the score of the gloss;
4. the RepairMappings function takes the original alignment as input and, for each mapping, checks the semantic compatibility between the winner glosses of the respective concepts, as stated in Section 3. If one or both concepts are composite, a cross-check is executed on the set of all the winner glosses of each term. The output of this final step is a repaired alignment, pruned of all those mappings whose concept glosses turned out to be incompatible.
⁴ The list is available at http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words.
⁵ http://tartarus.org/martin/PorterStemmer/
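The flow of the four steps, with the tokenization of composite concepts mentioned in step 2, can be sketched as follows (function bodies are ours and heavily simplified; the real Java functions also handle inverse alignments, duplicate removal and non-WordNet concepts):

    import re

    def tokenize(concept):
        # Step 2: split a composite concept such as "ComputerHardware" into simple terms;
        # stop words would then be removed from the resulting list.
        return [t.lower() for t in re.findall(r'[A-Z]?[a-z]+|[A-Z]+(?![a-z])', concept)]

    def repair(alignment, make_context_windows):
        # alignment: list of (c1, c2) pairs; make_context_windows(c, alignment) must return
        # the weighted windows of Section 3 (left abstract here). Reuses winner_gloss and
        # keep_correspondence from the earlier sketches.
        repaired = []
        for c1, c2 in alignment:
            win1 = winner_gloss(c1, make_context_windows(c1, alignment))
            win2 = winner_gloss(c2, make_context_windows(c2, alignment))
            if keep_correspondence(c1, c2, win1, win2):
                repaired.append((c1, c2))
        return repaired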
In the evaluation phase of our mapping repair algorithm we have calculated the precision of the repaired alignment against a reference alignment. The results of the evaluation are shown in Table 3. The precision has been computed using the precEval function provided by the Alignment API. The P column shows the precision of the tests without any repairing procedure, whereas the PR column reports the precision obtained from the repaired alignment after applying our approach. The names of the ontologies used are shown in each test's column header, omitting the extension, which is always .owl.
As shown by the results, most of the tests exhibit a higher precision after the application of the Adapted Lesk repairing procedure. In particular, for both the SUMO and the OpenCyc WS methods, the comparison with the old precision is always in favour of the repaired alignment. For the SUMO NS method only one test gives a worse precision after the pruning process, while for OpenCyc NS two tests out of ten give no gain and two tests have a lower precision. The average gain in precision after our repair process amounts to 20% using OpenCyc WS, 3% using OpenCyc NS, 15% for SUMO WS, and 8% for SUMO NS. The total average gain is 11.5%. The total average error, i.e. the rate at which the algorithm discards correct correspondences (false negatives), amounts to 2%.
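The total is simply the mean of the four per-configuration gains:

    (20% + 3% + 15% + 8%) / 4 = 46% / 4 = 11.5%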
Table 3: Comparing Precision in Tests Results.
P PR
Test1 - Ka, Bibtex
SUMO, NS 0.67 0.83
SUMO, WS 0.38 0.46
OpenCyc, NS 0.71 0.83
OpenCyc, WS 0.56 0.83
Test2 - Biosphere, Top-bio
SUMO, NS 0.00 0.00
SUMO, WS 0.16 0.57
OpenCyc, NS 0.00 0.00
OpenCyc, WS 0.03 0.50
Test3 - Space, Geofile
SUMO, NS 0.45 0.64
SUMO, WS 0.14 0.34
OpenCyc, NS 0.55 0.62
OpenCyc, WS 0.21 0.48
Test4 - Restaurant, Food
SUMO, NS 0.34 0.40
SUMO, WS 0.25 0.31
OpenCyc, NS 0.42 0.36
OpenCyc, WS 0.29 0.34
Test5 - MPEG7Genres, Subject
SUMO, NS 0.42 0.54
SUMO, WS 0.28 0.50
OpenCyc, NS 0.47 0.48
OpenCyc, WS 0.20 0.44
Test6 - Travel, Vacation
SUMO, NS 0.31 0.30
SUMO, WS 0.28 0.29
OpenCyc, NS 0.15 0.22
OpenCyc, WS 0.13 0.24
Test7 - Resume, Agent
SUMO, NS 0.44 0.51
SUMO, WS 0.25 0.41
OpenCyc, NS 0.42 0.44
OpenCyc, WS 0.22 0.41
Test8 - Resume, HL7 RBAC
SUMO, NS 0.57 0.62
SUMO, WS 0.28 0.49
OpenCyc, NS 0.71 0.70
OpenCyc, WS 0.28 0.54
Test9 - Ecology, Top-bio
SUMO, NS 0.12 0.15
SUMO, WS 0.09 0.16
OpenCyc, NS 0.14 0.14
OpenCyc, WS 0.11 0.14
Test10 - Vertebrate, Top-bio
SUMO, NS 0.29 0.40
SUMO, WS 0.07 0.15
OpenCyc, NS 0.67 0.67
OpenCyc, WS 0.16 0.28
A comparative evaluation with the other approaches described in Section 2.4 would be difficult: the methodologies are disparate, not every system is fully automatic as ours is, and the experiments were conducted on test sets that are not comparable⁶. Moreover, in (Haeri et al., 2006) no experiments are presented. Nevertheless, it is worth mentioning that in (Meilicke et al., 2007; Meilicke et al., 2008a) the average gain in precision is 16% but the error rate is omitted, while in (Meilicke et al., 2008b) the average gain in precision is 16% (with average error rate 1%) and 18% (with average error rate 5%), using reference disjointness and learned disjointness respectively.
⁶ Our test ontologies consist of 112 concepts on average. This dimension is greater than that of the benchmark ontologies used in the above studies, based on the OntoFarm Dataset, with 39 concepts on average.
5 CONCLUSIONS AND FUTURE
WORK
This paper has presented a new automatic approach for repairing ontology alignments, based on WSD techniques, in particular on the Adapted Lesk algorithm, and on upper ontologies, which are used to enrich the context needed to disambiguate the meaning of concepts and thus to carry out the repair task.
Our future work will focus on extending the context window of each concept of a mapping by exploiting the local context available in the input ontologies, such as, for example, the sub- and super-concept structure, as well as other related concepts. Free-text comments attached to a concept can also contribute to the construction of its context.
Another extension we wish to implement, in order to lower the error rate, is the refinement of the score computation: considering all the possible overlaps between two glosses in the final score composition, as well as introducing a threshold to retain only the most relevant winner glosses.
A comparative evaluation, on the same test set, between our approach and the other existing fully automatic methodologies for mapping repair is another activity we plan to carry out.
We also wish to exploit other WSD techniques for both mapping repair and ontology matching tasks. The design and implementation of an ontology matching procedure based on Adapted Lesk, exploiting local context as well as upper ontologies, is under way.
ACKNOWLEDGEMENTS
The work of the first author was partly supported
by the KPLab project funded by EU 6th FP. The
work of the second author was partly supported by
the Italian research project Iniziativa Software CINI-
FinMeccanica.
REFERENCES
Agirre, E. and Edmonds, P. (2007). Word Sense Disam-
biguation - Algorithms and Applications. Springer.
Banerjee, S. and Pedersen, T. (2002). An adapted Lesk al-
gorithm for word sense disambiguation using Word-
Net. In Gelbukh, A. F., editor, CICLing 2002, volume
2276 of LNCS, pages 136–145. Springer.
Do, H. H., Melnik, S., and Rahm, E. (2002). Compar-
ison of schema matching evaluations. In Chaudhri,
A. B., Jeckle, M., Rahm, E., and Unland, R., editors,
NODe 2002, volume 2593 of LNCS, pages 221–237.
Springer.
Euzenat, J. and Shvaiko, P. (2007). Ontology Matching.
Springer.
Fellbaum, C., editor (1998). WordNet – An Electronic Lexi-
cal Database. The MIT Press.
Gangemi, A., Guarino, N., Masolo, C., Oltramari, A., and
Schneider, L. (2002). Sweetening ontologies with
DOLCE. In G
´
omez-P
´
erez, A. and Benjamins, V. R.,
editors, EKAW 2002, volume 2473 of LNCS, pages
166–181. Springer.
Haeri, S. H., Hariri, B. B., and Abolhassani, H. (2006).
Coincidence-based refinement of ontology matching.
In SCIS+ISIS 2006.
Lenat, D. and Guha, R. (1990). Building large knowledge-
based systems. Addison Wesley.
Lesk, M. (1986). Automatic sense disambiguation using
machine readable dictionaries: how to tell a pine cone
from an ice cream cone. In SIGDOC ’86, pages 24–
26. ACM.
Lin, F. and Sandkuhl, K. (2008). A survey of exploiting
wordnet in ontology matching. In Bramer, M., ed-
itor, IFIP AI, volume 276 of IFIP, pages 341–350.
Springer.
Mascardi, V., Locoro, A., and Rosso, P. (2009). Automatic
ontology matching via upper ontologies: A systematic
evaluation. IEEE Trans. Knowl. Data Eng., to appear.
Meilicke, C., Stuckenschmidt, H., and Tamilin, A. (2007).
Repairing ontology mappings. In AAAI 2007, pages
1408–1413. AAAI Press.
Meilicke, C., Stuckenschmidt, H., and Tamilin, A. (2008a).
Reasoning support for mapping revision. J. Logic and
Computation.
Meilicke, C., Völker, J., and Stuckenschmidt, H. (2008b). Learning disjointness for debugging mappings between lightweight ontologies. In Gangemi, A. and Euzenat, J., editors, EKAW, volume 5268 of LNCS, pages 93–108. Springer.
Niles, I. and Pease, A. (2001). Towards a standard upper
ontology. In Welty, C. and Smith, B., editors, FOIS
2001, pages 2–9. ACM.