align++

A Heuristic-based Method for Approximating the Mismatch-at-Risk in Schema-based Ontology Alignment

Alexandra Mazak, Bernhard Schandl and Monika Lanzenberger

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria

European Research Council, ERCEA, Covent Garden 21/23, Place Rogier, B-1049 Brussels, Belgium

Department of Distributed and Multimedia Systems, University of Vienna, Vienna, Austria

Keywords:

Ontology alignment, Application context, Modeling focus, Heterogeneity coefficient, Mismatch-at-risk metric.

Abstract:

Frequently, ontologies based on the same domain are similar but also have many differences, which are known as heterogeneity. The alignment of entities which are not meant to be used in the same context, or which follow different modeling conventions, may cause mismatch in ontology alignment. End-users would benefit from knowing the risk level of mismatch between ontologies prior to starting a time- and cost-intensive procedure. With our heuristic-based method align++ we propose to consider the general application context of a modeled domain (the modeling context) in order to enhance the user support in schema-based alignment. In the method’s first part, ontology concepts are enriched with weighting meta-information, resulting from two indicators: the importance weighting indicator and the importance outdegree indicator. These indicators contain model- and graph-based information and can be observed and measured at the schema level of an ontology. The output of the first part is a set of ranking lists of importance indicators for each ontology concept in the role of a domain class. In the second part, the candidate sample for our mismatch-risk model is based on external user input: the user manually identifies corresponding concepts between the lists of the source ontologies. The heterogeneity risk among the concepts’ importance indicator values is measured as the standard deviation over the candidate sample. Afterwards these measured values are aggregated, and a heterogeneity coefficient is calculated. On the basis of this risk factor the mismatch-at-risk (MaR) between ontologies can be approximated as a threshold for schema-based ontology alignment.

1 INTRODUCTION

An ontology is an artefact representing a scope of a real world domain for a specific purpose. In a collaborative modeling process multiple perspectives of a matter are condensed into a shared conceptualization. System analysts, in collaboration with domain experts, represent their view of the real world by using an abstract model, an ontology. Naturally, such models are marked by their authors’ intentions and perspectives, and therefore cannot claim to represent objective reality. When a group of engineers starts to conceptualize a certain domain they should agree on some shared representation forms, e.g., an expressive ontology language like OWL (Dean and Schreiber, 2004), and on a specific purpose for modeling this domain. This purpose (e.g., a certain business goal) restricts the modelers’ views, and therefore the perspective on a domain. Ontology creators use entities to represent the domain of interest in a specific context, which results mainly from the purpose-specific usage of the domain. We call this specific context the modeling context. According to (Janiesch, 2010), “when regarding modeling methods as social and contextualized complexes, it becomes necessary to include some stance of context in the meta model. [...], models or parts thereof can be equipped with context”.

Frequently, ontologies that describe the same domain of interest are similar but also expose many differences. These are known as heterogeneity and are rooted in diversity in ontology modeling. One reason for conceptual heterogeneity, which is also called semantic heterogeneity (Euzenat, 2001), is the difference in perspective when modeling two ontologies (Euzenat and Shvaiko, 2007). Their example of maps addresses the problem of difference in perspective from a spatio-temporal point of view. In (Benerecetti et al., 2001) the authors describe three kinds of perspectives: spatio-temporal, logical, and cognitive. Heterogeneity resulting from the first two kinds can be solved by DL-based techniques like SAT solvers (Giunchiglia and Shvaiko, 2003). The pragmatic heterogeneity (Bouquet et al., 2004), which is called semiotic heterogeneity by (Euzenat and Shvaiko, 2007), results from differences in interpreting entities with regard to a specific context: “The intended use of entities has a great impact on their interpretation, therefore, matching entities which are not meant to be used in the same context is often error-prone” (Euzenat and Shvaiko, 2007).

Mazak A., Schandl B. and Lanzenberger M.
align++ - A Heuristic-based Method for Approximating the Mismatch-at-Risk in Schema-based Ontology Alignment.
DOI: 10.5220/0003063600170026
In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2010), pages 17-26
ISBN: 978-989-8425-29-4
© 2010 SCITEPRESS (Science and Technology Publications, Lda.)

In our approach we focus on semantics from a cognitive perspective which leads to pragmatic heterogeneity problems in ontology alignment. Therefore, we prefer the notion model-pragmatic instead of model-theoretic semantics. The cognitive perspective includes the specific purpose of a modeled domain, and therefore it is related to the (intensional) context layer (Ehrig, 2007) of an ontology. Additionally, a possible mismatch risk can occur at the ontology layer which is called explication mismatch (Klein, 2001). This mismatch results from differences in modeling conventions (Chalupsky, 2000), which means dissimilarities in describing concepts. More detailed descriptions of heterogeneity and mismatch types have been given by (Visser et al., 1997), (Chalupsky, 2000), (Klein, 2001), and (Euzenat and Shvaiko, 2007).

Another problem in ontology alignment is to give end-users a quick and efficient overview of the source ontologies. Additionally, they should be supported in gaining insight into the modeling process of those ontologies. A method which makes such an outline feasible can give users an idea about the application (modeling) context in which the entities are used for a specific purpose.

This paper is structured as follows: first, we describe the need for efficient aids for user support in schema-based ontology alignment. Then we introduce our heuristic-based method align++ and present details about its two parts. We describe the idea of encoding context- and structure-based heterogeneity as possible risk factors in numerical values to approximate a mismatch-at-risk between ontologies. Finally, we underpin our research assumptions of align++ Part A with an evaluation survey.

2 APPROACH

In previous works we have proposed that, in addition to the two factors entity labels and relationships among entities, the modeling focus on entities should be considered (Mazak et al., 2010). Analogous to the demand described by (Janiesch, 2010), “[...] we attempt to systematize the current perceptions of context as relevant parameters for the adaption of conceptual modeling methods”; and relating to (Ehrig et al., 2004), “[...] similar entities are used in similar context”. In our approach the entities we focus on are the concepts of ontologies (or their classes, which are concrete representations of concepts, respectively). Our approach considers domain knowledge as meta-information in the form of two indicators, an importance weighting indicator and an importance outdegree indicator for classes. We denote with domain knowledge the modeling focus, which results from the context in which a certain domain has to be modeled.

Let us assume, for instance, that there are two ontologies (O1 and O2) that describe the same domain of interest, a software tool for conference organization support (OAEI, 2009). We assume two different usage scenarios for these ontologies. In the first scenario, the purpose of creating both ontologies is to describe authors and their papers (Scenario 1). Therefore, the modeling focus of the ontology engineers is mainly on the concepts Author, Contribution, and Article, as well as these concepts’ relations to other concepts. In the second scenario, the specific purpose of ontology O1 is to describe the events and organizations of the conference (Scenario 2), while the purpose of ontology O2 remains the same as in Scenario 1. Therefore, the modeling focus of ontology O1 in Scenario 2 is on the concepts Working Event, Administrative Event, and Organization. The context represents the environment in which the entities of an ontology have a certain level (importance level) of meaning. Thus, the introduced modeling context is equatable to the notion of application context (Ehrig et al., 2004). The differences due to the modeling focus cause semantic, pragmatic, and also terminological heterogeneity problems. Therefore, mismatch between ontologies may occur in the alignment process.

We have designed a heuristic-based method called align++, which follows the objective to support the end-user in ontology alignment by making heterogeneity between source ontologies visible before starting a schema-based alignment technique. The method provides a metric that quantifies the possible mismatch between ontologies. It helps users to gain a better understanding of ontologies, and relieves them of complex, time-, and cost-intensive tasks. The name align++ results from the two steps in which this method is divided, an ex ante and an ex post step. Firstly, using the techniques of the ex ante step of Part A, information that results from the context and ontology layer of an ontology can be observed and measured. Secondly, each domain concept is annotated with these measurements in the form of meta-information by weighted values. The concepts with their labels and computed values are recorded as ordered ranking lists. Possible heterogeneity factors resulting from the individual process of meta-modeling ontologies at the schema level are mapped to their concepts. Thus, we enrich the element level with meta-information of the structure level.

[Figure 1 content: domain concept Event with the properties parallel_with, follows, has_Topic, organized_by, takes_place_in, and has_Social_Event; range concepts Event, Topic, Chair, Location, and Social_Event; iweighting levels 0.05 (Lowest), 0.25 (Low), 0.50 (Middle), 0.75 (High), 0.95 (Highest).]

Figure 1: Example recording the importance-weighted owl:Class Event at the schema level (TBox) of an ontology.

The ex post step of Part B starts with a user selection of similar concepts out of the ranking lists of two or more ontologies as input for our mismatch-risk model. This strategy of a manually conducted concept selection minimizes a possible structural falsification induced by other methods, e.g., lexical matching techniques. After this user selection we evaluate the heterogeneity risk measured as standard deviation among the concepts’ importance indicator values, aggregate the measured values, and calculate a heterogeneity coefficient. On the basis of this risk factor the mismatch-at-risk (MaR) between the source ontologies can be approximated as a threshold value for the schema-based alignment process.

2.1 Part A: Evaluating Risk-determining Indicators

We use OWL DL (Dean and Schreiber, 2004) as vocabulary to describe domains of interest. There, an ontology is a set of logical axioms that are asserted in the TBox at the schema level. With our method we focus on the more general context of these logical axioms or statements, rather than on situational details of the ABox at the instance level. We agree with (Janiesch, 2010) in that the use of situational context is too detailed to allow for a meaningful reuse in ontology alignment. Therefore, our method considers only schema level information but no instance data. We further assume that the modeling context is mainly hidden in the relational structure of an ontology (cf. Section 3). According to (Euzenat and Shvaiko, 2007), “matching ontologies from their relational (or external) structure is very powerful [...]”, and “it is worth considering what are the important relations before using such techniques”, meaning techniques which consider the relational structure of an ontology.

The modeling focus is not directly observable and measurable, hence we need indicators that quantify the level of meaning encoded in these schemas for further computation. For this purpose we introduce two indicators: the importance weighting indicator (IwI) and the importance outdegree indicator (IoI) of ontology concepts (c). As introduced in our previous works (Mazak et al., 2010), IwI ∈ [0; 1] results from the importance-weighted (model-pragmatic) semantics of binary relations (owl:ObjectProperties) depending on their particular domain/range combinations. According to (Euzenat and Shvaiko, 2007), “the semantics of ontologies can be constrained by additional axioms”, which are in our case the rdfs:domain and rdfs:range assertions that constrain an owl:ObjectProperty (Horridge, 2004). This means that the local semantics (meaning) of a statement is constrained based on its purpose-specific usage. This information is mainly encoded in the relations between concepts (owl:ObjectProperties) and not only in their taxonomic relations.

The first step of the weighting procedure is manually conducted by the ontology engineers during the ontology design and development process, since “semantics is usually specified explicitly at design time” (Shvaiko and Euzenat, 2004). As an aid for setting importance weighting levels in this procedure the ontology engineers could be guided by the competency questions in (Grüninger and Fox, 1995) and (Noy and McGuinness, 2001). The importance weighting procedure is practicable for the ontology developers also in large ontologies, since an importance weight is annotated with a simple point-and-click interaction (Mazak et al., 2010) when the object property with its domain/range constraints is generated at design time. IwI values encode the usage of entities in a certain application context. This context layer meta-information is annotated on the relation signature σ: R → C × C (Ehrig et al., 2004) at the schema level. Therefore, the level of context-based heterogeneity, being a possible risk factor in the alignment process, is encoded in the value of a concept’s IwI.

Table 1: Importance Weighting Indicator (IwI) values for ontologies O1 and O2.

(a) Scenario 1: equal modeling focus.

IwI Level   confOf (O1)            crs_dr (O2)
Highest     Contribution           article
            Author                 author
High        -                      abstract
Middle      -                      reviewer
            -                      review
Low         -                      -
Lowest      Administrative event   conference
            Working event          program
            Organization           chair
            Person                 participant
            Member PC              -
            Scholar                -

(b) Scenario 2: different modeling focus.

IwI Level   confOf (O1)            crs_dr (O2)
Highest     Administrative event   article
            Working event          author
            Organization
High        -                      abstract
Middle      -                      reviewer
            -                      review
Low         Scholar                -
Lowest      Contribution           conference
            Person                 program
            Author                 chair
            Member PC              participant
            -                      -

In a second step we identify the importance outdegree indicator values (IoI, with IoI ∈ [0; 1]), which result from a weighting based on the outdegree of a concept c in proportion to the concept with the highest outdegree within the ontology. Therefore, this second indicator considers a possible heterogeneity risk resulting from differences in describing concepts. More precisely, in the values of IoI the heterogeneity risk of a concept based on differences in modeling conventions (Chalupsky, 2000) is encoded. Additionally, IoI indicates the importance of concepts for structure-based alignment techniques (e.g., graph-based methods). Such information is important for users to detect efficient initial points for starting alignment or mapping methods like Anchor-PROMPT (Noy and Musen, 2001).
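The outdegree-based weighting described above can be sketched in a few lines of Python. This is a minimal illustration and not the align++ implementation: it assumes that the outdegree counts every outgoing object property of a concept, that an ontology is given as a list of (domain, property, range) triples, and that the function name `outdegree_indicator` is our own. The relations of Event follow the property and range names listed for Figure 1; the relation `chairs` is hypothetical.

```python
from collections import defaultdict


def outdegree_indicator(relations):
    """Return IoI per concept: its outdegree divided by the highest outdegree."""
    outdegree = defaultdict(int)
    for domain, _prop, range_ in relations:
        outdegree[domain] += 1
        outdegree.setdefault(range_, 0)  # concepts without outgoing relations
    highest = max(outdegree.values(), default=0)
    if highest == 0:
        return {concept: 0.0 for concept in outdegree}
    return {concept: n / highest for concept, n in outdegree.items()}


relations = [
    ("Event", "parallel_with", "Event"),
    ("Event", "follows", "Event"),
    ("Event", "has_Topic", "Topic"),
    ("Event", "organized_by", "Chair"),
    ("Event", "takes_place_in", "Location"),
    ("Event", "has_Social_Event", "Social_Event"),
    ("Chair", "chairs", "Event"),  # hypothetical relation, for illustration only
]

ioi = outdegree_indicator(relations)
for concept, value in sorted(ioi.items(), key=lambda item: item[1], reverse=True):
    print(f"{concept:13s} {value:.2f}")
```

Under these assumptions the concept with the most outgoing relations always receives the value 1, and all other concepts are scaled relative to it, which keeps IoI within [0; 1].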

Figure 1 shows an excerpt of an ontology that has been enriched with IwI indicators. For instance, the relation “Event follows Event” has been weighted with highest importance, while the relation “Event takes place in Location” is only of low importance. As can be seen from this figure, the indicator-based values of concept relations can be stored in multistage hash maps. In the current version of align++, which is implemented using the Eclipse environment (Gronback, 2009), concepts with their weighted values and labels are recorded in the form of ordered lists, which can additionally be used as ranking lists (cf. Table 1 and Table 2).
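To illustrate the idea of a multistage hash map and a derived ranking list, the following Python sketch nests dictionaries as domain concept, property, range concept, and weight. It is an assumption-laden illustration rather than the authors' data structure: the aggregation of a concept's IwI as the mean of the weights of all relations in which it is the domain class is our own simplification, and all weights other than the two named in the text (follows: highest, takes_place_in: low) are hypothetical, as is the relation `organizes`.

```python
from statistics import mean

# Importance weighting levels as listed for Figure 1.
LEVELS = {"lowest": 0.05, "low": 0.25, "middle": 0.50, "high": 0.75, "highest": 0.95}

# Multistage hash map: domain concept -> property -> range concept -> weight.
weights = {
    "Event": {
        "follows": {"Event": LEVELS["highest"]},        # weight stated in the text
        "takes_place_in": {"Location": LEVELS["low"]},  # weight stated in the text
        "has_Topic": {"Topic": LEVELS["high"]},         # hypothetical weight
    },
    "Chair": {
        "organizes": {"Event": LEVELS["middle"]},       # hypothetical relation and weight
    },
}


def concept_iwi(weights):
    """Aggregate relation weights per domain concept (assumed: arithmetic mean)."""
    result = {}
    for domain, properties in weights.items():
        values = [w for ranges in properties.values() for w in ranges.values()]
        result[domain] = mean(values)
    return result


def ranking(indicators):
    """Order concepts by descending indicator value."""
    return sorted(indicators.items(), key=lambda item: item[1], reverse=True)


iwi = concept_iwi(weights)
ranked = ranking(iwi)
for concept, value in ranked:
    print(f"{concept:6s} {value:.2f}")
```

Other aggregation rules (for example, taking the maximum weight) would fit the same structure; only `concept_iwi` would change.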

2.2 Part B: Exploiting Risk Factors during Ontology Alignment

The second part of align++, the ex post step, is initiated at the beginning of an alignment between two ontologies. To describe this part in detail we start with an example on the basis of the ontologies confOf (O1) and crs_dr (O2), and the two scenarios we have described in Section 2. Further, we assume that all logical statements of these two ontologies have already been importance-weighted (cf. Section 2.1) based on the respective scenario, and that for each domain concept the IwI- and IoI-based values have been computed.

Before they start an alignment process, end-users should be supported so that they can get a quick and context-based overview of the source ontologies. They should be able to easily detect the core concepts of these ontologies, their importance in a certain application context, and whether concepts are efficient candidates to be selected as initial points for graph-based alignment methods or propositional techniques (e.g., SAT solvers).

Table 1 shows the lists in which the concepts of O1 and O2 are ranked by their indicator-based values. The ranking is based on an average of the values resulting from the weighting procedures, which have been manually conducted by the survey respondents (cf. Section 3). On the one hand, end-users can easily detect the core concepts Author and author, which are also syntactically similar, and Contribution and article (Table 1a). As we can see, these lists help users to take care of terminological heterogeneity, which occurs due to variations in names referring to the same concepts, as in the case of Contribution and article. On the other hand, the list depicted in Table 1b shows differences in the ranking of concepts. These differences are identifiable due to the differences of their IwI-based values. It is evident that both ontologies describe the same domain of interest, but with a different modeling focus on it. Moreover, the user can detect that the intended usage of the concepts may differ. Thus, they can derive that the application context is probably not the same (i.e., pragmatic heterogeneity). We assume that this kind of heterogeneity mainly results from the interpretation of entities by humans due to a certain application context: “this kind of heterogeneity is difficult for the computer to detect and even more difficult to solve, because it is out of its reach” (Euzenat and Shvaiko, 2007).

Table 2: Importance Outdegree Indicator (IoI).

IoI Level   confOf (O1)            crs_dr (O2)
High        Person                 article
Middle      Contribution           author
            Administrative event   program
            Working event          chair
            Organization
            Member PC
Low         Author                 abstract
            Scholar                reviewer
                                   review
                                   conference
                                   participant

Figure 2 presents the differences in the structures of our two example ontologies, which are reflected in their respective IoI values. O1 consists of a large number of classes, which are arranged in three hierarchy levels. O2 has significantly fewer classes with only two levels of hierarchy. This kind of structural heterogeneity results from differences in describing concepts: “[...] a distinction between two classes can be modeled using a qualifying attribute or by introducing a separate class” (Klein, 2001). (Chalupsky, 2000) denotes this kind of heterogeneity as differences in modeling conventions. Table 2 presents this possible heterogeneity factor in the form of ranked concepts based on their IoI-based values. These values encode graph-based information; in particular, the outdegree of a concept in proportion to the concept with the highest outdegree in each of the source ontologies. In our example, the concepts Person of O1 and article of O2 have the most outgoing relations to other concepts.

All these kinds of differences or heterogeneity cause mismatch between ontologies in their alignment. It would be a benefit for end-users to know the risk level of mismatch (or mismatch-at-risk) MaR before starting a time- and cost-intensive schema-based alignment process: “in real-world applications, schemas/ontologies usually have both well defined and obscure labels (terms), and contexts they occur, therefore, solutions from both problems would be mutually beneficial” (Shvaiko and Euzenat, 2004). Therefore, we introduce a statistical method in this second part of align++ which we call mismatch-risk model. Using this method, a possible mismatch-at-risk (MaR) between source ontologies, which results from heterogeneity factors at the context and ontology layer, can be approximated.

(Shvaiko and Euzenat, 2004) point out that semantics is usually given in a structure and not at the element level. The first input of our risk model exploits (local) model-based semantic and graph-based syntactic meta-information from the structure level of ontologies annotated on their concepts at the element level. This internal input results from the semi-automated computations of the IwI and IoI in Part A of our method (cf. Section 2.1). According to (Euzenat and Valtchev, 2004), “to provide the most complete basis for comparison, one may wish to bring knowledge encoded in relation types to the object level”. Therefore, align++ considers structure level meta-information encoded at the element level to approximate the MaR as an efficient benchmark. Additionally, according to the process dimension described by (Shvaiko and Euzenat, 2004) the input is interpreted by an external resource in the form of human input. Therefore, the input for the mismatch-risk model is both internal and external.

The risk model needs external user input for manually identifying matching candidates from the ranking lists of each source ontology. This strategy of a manually conducted concept selection minimizes the risk of information loss, resulting from possibly poor quality produced by automated methods. For instance, in Scenario 1 (Table 1a) a user can easily detect the correspondence between the concepts Contribution of O1 and article of O2, whereas lexical matching methods cannot accomplish this.

Figure 2: Two differently structured ontologies describing the same domain of interest, visualized in Protégé.

According to (Euzenat and Shvaiko, 2007) the technique of manually determining the candidate sample can be classified under repository of structures. One approach of this technique has been introduced by (Rahm et al., 2004). This fragment-oriented approach decomposes a large matching problem into smaller sub-problems on schema fragments, based on a divide-and-conquer strategy. Therefore, schema elements become special schema fragments. Various types of schema information will be exploited by this approach, as well as background knowledge. The purpose of this fragment-oriented approach for our method is to determine an efficient candidate sample for the mismatch-risk model as input.

It can be assumed that, while experienced ontology engineers will expect a certain level of heterogeneity between ontologies, they have no means to validate their expectations before they actually start the alignment process, leading to uncertainty or the risk of mismatch between source ontologies.

To address this, we adopt the value at risk (VaR) metric, which is a widely used risk measure in financial mathematics (Franke et al., 2004), for Part B of our align++ method. We analyze the variation of the indicator-based values among the concepts of the candidate sample to approximate the mismatch-at-risk (MaR) between the source ontologies. As the variation among these values increases, the probability for MaR grows. Therefore, the risk factor in our method is the margin of deviation among the indicator-based values. This margin of deviation (the heterogeneity risk) is similar to the volatility risk in financial markets. This heterogeneity risk can be denoted as a (continuous) random variable (X). A widely used risk measure that provides a quantified estimate of uncertainty is the standard deviation σ(X), defined as

\[ \sigma(X) = \sqrt{E\left[(X - \mu)^2\right]} \qquad (1) \]

More formally, let X be a random variable with mean value E(X) = µ. The operator E denotes the expected value of X. The standard deviation (σ) is the square root of the expected value of (X − µ)^2 (Stahel, 2000). In a further step we aggregate the measured values to compute the median absolute deviation as a robust estimator of variation. We call this median, which is a reliable measure of uncertainty, the heterogeneity coefficient of the sample. On the basis of this coefficient as a cumulated risk factor we can approximate the MaR.
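The two aggregation steps can be sketched in Python as follows. The sketch assumes that the standard deviation is computed per candidate pair with the sample (n − 1) estimator and that the heterogeneity coefficient is the median of these per-pair deviations; the text does not name the estimator, but this choice appears to reproduce the rounded values reported in Table 3. The function name is our own, and the input values are the IwI means from Table 3.

```python
from statistics import median, stdev


def heterogeneity_coefficient(candidate_pairs):
    """Return (coefficient, per-pair deviations) for a candidate sample.

    Each pair holds the indicator values of two concepts that the user
    identified as corresponding. Assumed: sample standard deviation per
    pair, median over all pairs.
    """
    deviations = [stdev(pair) for pair in candidate_pairs]
    return median(deviations), deviations


# IwI-based values (O1 concept, O2 concept) as listed in Table 3.
scenario_1 = [(0.95, 0.92), (0.92, 0.83)]  # Author/author, Contribution/article
scenario_2 = [(0.13, 0.92), (0.11, 0.83)]

coef_1, devs_1 = heterogeneity_coefficient(scenario_1)
coef_2, devs_2 = heterogeneity_coefficient(scenario_2)

print([round(d, 2) for d in devs_1], round(coef_1, 2))
print([round(d, 2) for d in devs_2], round(coef_2, 2))
```

With only two candidate pairs the median coincides with the arithmetic mean of the two deviations; the median only acts as a robust estimator for larger candidate samples.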

In financial mathematics, the VaR risk metric summarizes the distribution of possible losses by a quantile, i.e., a point with a specified probability of higher losses (Franke et al., 2004). To adopt this approach we assume that our candidate sample underlies a normal distribution. Thus, we convert the random variable X with its parameters µ and σ to a random variable Z with expectation E(Z) = µ = 0 and σ = 1, using a transformation to standardize X (Meintrup and Schäffler, 2005):

\[ Z = \frac{X - \mu}{\sigma} \qquad (2) \]

The risk positions in the mismatch-risk model are the IwI-based values of the candidates (concepts) in the sample determined by the user, or the IoI-based concept values, respectively. The margins of deviation, or variation, among these values are the realizations of Z. The output of the mismatch-risk model is the approximated MaR between all (i.e., two or more) ontologies. This is a threshold value such that the probability that the variation becomes “unfavorable”, i.e., exceeds this threshold, equals a given probability. The MaR between source ontologies can be calculated by quantiles, similar to the value-at-risk metric in financial statistics (Eller et al., 2002). The normalization (cf. Equation 2) helps us to calculate these quantiles, since it is easier to determine the scaling factor for a certain confidence level (e.g., 95% or 99%) by the quantile of the standard normally distributed random variable Z.

In our example, let us assume the user detects the correspondences between the concepts Author/author and Contribution/article and selects these concepts as the candidate sample for the mismatch-risk model. On the basis of this external user input we calculate a heterogeneity coefficient as the median based on the variation among the IwI-based values of these candidates. Table 3a presents this measure of uncertainty for Scenario 1, where the modeling focus is equal for both ontologies. The heterogeneity coefficient is 0.04, which indicates a very low risk factor. In contrast, the coefficient of Scenario 2 (cf. Table 3b), where the modeling focus is different, indicates a high risk factor. On the basis of these heterogeneity coefficients we calculate the MaR for both scenarios for a 95% confidence level. The 95% quantile of a standard normally distributed random variable Z lies in a defined range between 1.64 and 1.65. Therefore, the scaling factor for the calculation of the MaR in both tables is 1.64. The MaR for Scenario 1 (Table 3a) approximates a very low threshold value, while in Scenario 2 (Table 3b) the mismatch-at-risk is highest with 88%. Thus, it would be a better choice for the user to align the ontologies in Scenario 1 than in Scenario 2.
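The quantile-based scaling can be sketched in Python with the standard library. The sketch rests on two assumptions that the text implies but does not spell out: that the MaR is obtained by multiplying the heterogeneity coefficient by the quantile of the standard normal distribution, and that the unrounded coefficients (about 0.0424 and 0.5339, recomputed from the Table 3 indicator values with a sample standard deviation) are used. The function names are our own, and the exact quantile is used in place of the truncated factor 1.64.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Equation 2: map a realization of X to the standard normal variable Z."""
    return (x - mu) / sigma


def mismatch_at_risk(coefficient, confidence=0.95):
    """Scale a heterogeneity coefficient by the standard normal quantile.

    Assumed reading of the text: MaR = quantile(confidence) * coefficient.
    """
    scaling_factor = NormalDist().inv_cdf(confidence)
    return scaling_factor * coefficient


z95 = NormalDist().inv_cdf(0.95)     # lies between 1.64 and 1.65
mar_1 = mismatch_at_risk(0.0424)     # Scenario 1, unrounded coefficient (recomputed)
mar_2 = mismatch_at_risk(0.5339)     # Scenario 2, unrounded coefficient (recomputed)

print(f"scaling factor: {z95:.3f}")
print(f"MaR Scenario 1: {mar_1:.0%}")
print(f"MaR Scenario 2: {mar_2:.0%}")
```

Passing `confidence=0.99` yields the larger scaling factor for the 99% level mentioned above, and therefore a more conservative threshold for the same coefficient.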

In analogy to the calculation of the mismatch-at-risk based on the heterogeneity at the context layer between these ontologies, the heterogeneity risk at the ontology layer resulting from the variation of the IoI-based concept values can be computed.

Table 3: Calculation of heterogeneity coefficients and mismatch-at-risk levels for both scenarios.

(a) Calculations on the basis of the modeling context in Scenario 1.

IwI of concept                               Standard deviation of IwI-based values
Author         0.95    author     0.92       0.02
Contribution   0.92    article    0.83       0.06
Heterogeneity coefficient                    0.04
MaR with 95% confidence level                7%

(b) Calculations on the basis of the modeling context in Scenario 2.

IwI of concept                               Standard deviation of IwI-based values
Author         0.13    author     0.92       0.56
Contribution   0.11    article    0.83       0.51
Heterogeneity coefficient                    0.53
MaR with 95% confidence level                88%

3 EVALUATION

We conducted an evaluation of align++ by a questionnaire survey which we mailed to 20 respondents. Of these, 5 female and 13 male respondents completed the questionnaires, which were anonymized for the analysis process. Of these 18 participants, 12 were researchers in Computer Science, while 4 respondents were students in the fields of Computational Intelligence, Software & Information Engineering, and Information & Knowledge Management. Further, two respondents were employees in leading positions at a software house. Of all respondents, 12 declared themselves to be well-versed in ontology engineering and alignment, while the others declared themselves as versed.

In the course of this survey the respondents were asked to weight all logical statements (owl:ObjectProperties with their domain and range axioms) of the two example ontologies confOf (O1) and crs_dr (O2). For this purpose a simple point-and-click user interface was implemented. With the aid of our assumed scenarios (cf. Section 2.2) we have predefined the respective application (modeling) context. In order for respondents to know how to weight each axiom with an importance level, a brief introduction with examples was included with the survey questionnaire.

The very low variation among the IwI-based values of concepts in Scenario 1 emphasizes our assumptions made in Section 2, whereas the very high variation of those values among the same concepts in Scenario 2 reflects the heterogeneity and mismatch problems that were described in Section 1 of this paper.

A further result of the participants’ weighting pro-

cedures is that all of the 18 respondents have weighted

the axioms in a nearly equal manner due to the given

modeling focus, which can be seen from Table 4. In

Scenario 1 the concepts Author and Contribution of

as well as author and article of O

have nearly

equal IwI

-based mean values, represented in Ta-

ble 3a. If the predeﬁned focus is on authors and their

papers (Scenario 1) the relations where these concepts

participate in the role of a domain class are weighted
highest, which results in calculated IwI-based means
with values of 0.83 and 0.95.

Table 4: Importance Weighting Indicator (IwI), calculated
from the 18 respondents’ property weightings.

              Both Scenarios      Scenario 1               Scenario 2
Respondent    author   article    Author   Contribution    Author   Contribution
1             0.95     0.84       0.95     0.95            0.05     0.05
2             0.95     0.90       0.95     0.95            0.25     0.15
3             0.95     0.79       0.95     0.95            0.05     0.15
4             0.95     0.90       0.95     0.95            0.05     0.15
5             0.95     0.79       0.95     0.85            0.25     0.05
6             0.95     0.79       0.95     0.85            0.05     0.15
7             0.95     0.84       0.85     0.85            0.25     0.05
8             0.95     0.84       0.95     0.85            0.25     0.15
9             0.95     0.78       0.95     0.85            0.05     0.15
10            0.95     0.85       0.85     0.85            0.05     0.05
11            0.95     0.84       0.85     0.95            0.25     0.15
12            0.95     0.79       0.95     0.95            0.25     0.15
13            0.95     0.79       0.95     0.95            0.05     0.05
14            0.95     0.85       0.95     0.95            0.05     0.05
15            0.95     0.85       0.95     0.95            0.05     0.15
16            0.95     0.79       0.95     0.95            0.25     0.05
17            0.95     0.85       0.85     0.95            0.05     0.05
18            0.95     0.85       0.85     0.95            0.05     0.15
mean          0.92     0.83       0.95     0.92            0.13     0.11

Otherwise, if the focus of O

was on events and organizations (Scenario

2) the binary relations in which the concepts Author,

Contribution participate are weighted lowest, which

results in IwI

-based mean values of 0.13 for Au-

thor and 0.11 for Contribution. From this it follows

that the standard deviation between the IwI

-based

values of the concepts Author/author and Contribu-

tion/article is lowest with values of 0.02 and 0.05 in

Scenario 1 and highest in Scenario 2. Table 4 repre-

sents the equalities and differences of the weighting

annotations per respondent and points out that the ap-

plication context (modeling context) restricts a mod-

eler’s view (cf. Section 1 and Section 2).
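The comparison of mean values described above can be retraced in a few lines of Python. The following is only a sketch: it takes the mean row of Table 4 as printed and assumes that the deviation per concept pair is the population standard deviation over the two mean values, which this section does not state explicitly. All function and variable names are illustrative.

```python
from statistics import pstdev

# Mean row of Table 4, as printed.
OTHER = {"author": 0.92, "article": 0.83}  # columns under "Both Scenarios"
FOCUS = {
    "Scenario 1": {"Author": 0.95, "Contribution": 0.92},
    "Scenario 2": {"Author": 0.13, "Contribution": 0.11},
}
# Concept pairs compared in the text.
PAIRS = [("Author", "author"), ("Contribution", "article")]


def pair_deviation(a: float, b: float) -> float:
    """Population standard deviation of two mean values (an assumption)."""
    return pstdev([a, b])


def deviations(scenario: str) -> dict:
    """Deviation per concept pair for the given scenario."""
    return {
        f"{x}/{y}": pair_deviation(FOCUS[scenario][x], OTHER[y])
        for x, y in PAIRS
    }


if __name__ == "__main__":
    for scenario in FOCUS:
        for pair, sd in deviations(scenario).items():
            print(f"{scenario}  {pair}: {sd:.3f}")
```

Under these assumptions the Scenario 1 deviations come out at roughly 0.015 and 0.045, close to the reported 0.02 and 0.05, while the Scenario 2 deviations are larger by more than an order of magnitude.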

After the weighting procedure the participants

were asked further questions. They were asked to an-

swer them on a 5-level Likert scale (strongly disagree,

disagree, undecided, agree, strongly agree). In the

following we present an overview on the ratings and

explanatory statements given by the 18 respondents.

89% strongly agree that the modeling focus of an

ontology and its entities depends on a certain per-

spective ontology engineers have in mind when con-

ceptualizing a domain of interest. They further state

that due to semantic relativism, as already known in

database engineering, models are always subjective,

which causes heterogeneity problems in the align-

ment of these models. 75% strongly agree, and 17%

agree that the meaning of ontology concepts and their

context-sensitive (purpose-speciﬁc) usage mainly de-

pends on this modeling focus. Additionally, they

agree that the common understanding of engineers,
which is based on the application of the ontology, is
important. One of the participants mentions, “it is not

possible to model anything without the inﬂuence of

context-sensitive parameters”. Another respondent

states, “a concept can be very important in one re-

lation, and unimportant in another depending on the

modelers’ foci” (cf. Table 3). This feedback corre-

sponds with our assumptions as described before.

In answer to the question of whether there are other
components on which the meaning of concepts depends,
the majority of the respondents reply with

“yes”. According to the participants these com-

ponents include “experiences, culture, stakeholders,

background of engineers, skills, environmental pa-

rameters, intended audience”. Therefore, we have

to state more precisely that with the expression “the

modeling focus mainly depends on a certain model-

ing context” we denote the application context (Ehrig

et al., 2004) of an ontology as mentioned in Section 2.

This is the context in which an ontology and its enti-

ties are modeled for a purpose-speciﬁc usage (e.g., a

certain business goal).

The participants were further asked whether they

agree that the logical statements among concepts are

an indicator for their context-sensitive usage. 91%

of the participants strongly agree with this assump-

tion. They explain that semantic relations or logi-

cal statements are a kind of formalized description

of the intended usage of the concepts. The remaining
respondents state that the taxonomic structure, which is
commonly used in ontology alignment, should be considered
as well. We assume in our approach that the local

context of concepts (i.e., their outgoing relations—

owl:ObjectProperties—to other concepts within

the ontology) is more important (cf. Section 2.1).

All respondents strongly agree that for instance,

the importance weighting degree for the relation

Author → writes → Contribution would be different

if the ontology engineers’ modeling focus was on the

authors rather than on the conference program. This

fact additionally points out that semantic as well as

semiotic heterogeneity can in fact be made visible by

our approach, as described in detail in Section 2.2.

In our approach, ontology modelers can choose

between ﬁve degrees of weighting labels: Highest,

High, Middle, Low, and Lowest Importance. We think

that users prefer to assign importance labels instead

of numerical values. 13 of the respondents state that

ﬁve degrees are enough, 4 consider three as sufﬁcient,

and 1 respondent indicates that a ﬁner-grained schema

would be better. We think that ﬁve degrees, including

a neutral level, are a reasonable compromise to con-

vey an importance weighting to the user. However,

since these ﬁve levels are mapped to the continuous

interval [0;1] the approach allows one to arbitrarily

increase or reduce the number of degrees.
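How a varying number of labels could be mapped to [0;1] is not spelled out in this section. A minimal sketch, assuming that each of n ordered labels receives the midpoint of its equal-width sub-interval; the function name and the midpoint rule are assumptions for illustration, not the align++ definition.

```python
def label_weights(labels):
    """Map ordered labels (least to most important) to values in [0, 1].

    Assumption: label i of n gets the midpoint of the i-th of n
    equal-width sub-intervals of [0, 1].
    """
    n = len(labels)
    if n == 0:
        raise ValueError("at least one label is required")
    return {label: (i + 0.5) / n for i, label in enumerate(labels)}


FIVE = ["Lowest", "Low", "Middle", "High", "Highest"]
THREE = ["Low", "Middle", "High"]

if __name__ == "__main__":
    print(label_weights(FIVE))
    print(label_weights(THREE))
```

With such a rule, changing the number of degrees only changes the list of labels; the numeric values stay inside the same interval and remain comparable.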

All participants strongly agree that the concepts’
ranking lists, which are based on the introduced indicators,
are an efficient way to give end-users a quick and
context-based overview of the core concepts of the source

ontologies. Further, they strongly agree that due to

the indicator-based concept values end-users are able

to easily detect possible differences in the applica-

tion context or modeling focus, respectively (cf. Sec-

tion 2.2). The majority of the respondents point out

that it may be useless to align ontologies with dif-

ferent perspectives on their entities. Finally, they

strongly agree that the IwI

- and IoI

-based concept

values are efﬁcient indicators for possible heterogene-

ity risks between source ontologies in schema-based

ontology alignment.

4 RELATED WORK

Detailed surveys of techniques that also use
weights have been given by Euzenat and Shvaiko (2007)
and by Ehrig (2007). Some

of those techniques consider solely is-a relationships

among concepts, while others (e.g., statistical meth-

ods) exploit the instance data of ontologies. These

instances serve as representative samples to take

measurements on which comparisons between two

source ontologies can be established. In line with
Janiesch (2010), we take the view
that the situational context at the instance level

is too detailed to allow a meaningful reuse; therefore,

we consider only schema level information.

The semantics of an is-a taxonomy is exploitable

by counting the paths within the hierarchy. The

weighting of such a taxonomy is mainly computed

by ﬁxed values (e.g., 0.5, or 1) for each path length,

depending on the distance from the root. A consider-

ation of object properties themselves is not useful to

make meaningful statements.

In our approach we consider

owl:ObjectProperties with their domain and

range axioms to make use of their semantics. Au-

tomatic ranking methods (Wu et al., 2008) identify

the importance of concepts by counting the number

of relations of one concept to another in a ﬁrst step,

while also taking into account the other concept’s

importance. However, a method that aims to consider

the concept importance in a certain application

context requires non-trivial knowledge about the

modeled domain. Thus, our method already starts

its weighting procedure (cf. Section 2.1) during the

ontology design and development process. In our

opinion nobody is better qualiﬁed to annotate ontolo-

gies with weighting factors than ontology engineers

themselves. Another beneﬁt of the approach is that

the manually annotated weighting labels are speciﬁc

values for each logical statement, instead of ﬁxed

values as in other methods.

Semantic-based techniques often build on in-

termediate formal ontologies to deﬁne a common

context or background knowledge in order to bridge

the gap caused by the lack of a common ground

for comparison. This common ground can often

be found in external resources and models (e.g.,

DOLCE, WordNet). These methods help in handling

the disambiguation of multiple possible meanings of

terms. In the align++ method such oracles are not

required. We involve the end-users as an external

resource to detect similar concepts on the basis of

the ranking lists output by Part A. These lists help

end-users to deﬁne efﬁcient candidate samples for the

mismatch-risk model. What is new in our approach is the
consideration of different heterogeneity types between
source ontologies as possible risk factors before starting
an alignment, as well as the approximation of a
mismatch-at-risk between ontologies in schema-based alignment.

5 CONCLUSIONS AND FUTURE

WORK

The approach we provide is a heuristic-based method

to make heterogeneity visible for end-users before

starting time- and cost-intensive schema-based align-

ment methods. With our method the risk level of

a possible mismatch between ontologies can be ap-

proximated in the form of a mismatch-at-risk (MaR)

value. Therefore, if more than two ontologies are
available for alignment, the user can choose the two
ontologies with the minimum MaR. Otherwise, if only
two ontologies of a certain domain exist, the
benefit for the user is to know about the mismatch
risk before aligning them. Additionally, our presented

method supports users in a better understanding of the

source ontologies by providing a quick and context-

based overview of these ontologies by ranking lists

of their concepts. Currently, we are conducting a detailed
user evaluation of align++ Part B; in the future, we

aim to extend our approach by considering more el-

ements of the respective ontologies (e.g., taxonomy

relationships) in the calculation of the heterogeneity

coefﬁcient.
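The selection step mentioned above, choosing the two ontologies with the minimum MaR among several candidates, can be sketched as follows. The MaR values and ontology names in the example are hypothetical placeholders, and the function names are illustrative; how a MaR value is actually computed is not part of this sketch.

```python
from itertools import combinations


def min_mar_pair(ontologies, mar):
    """Return the pair of ontologies with the smallest MaR.

    `mar` is any callable (a, b) -> float that yields the
    mismatch-at-risk value for a pair of ontologies.
    """
    pairs = list(combinations(ontologies, 2))
    if not pairs:
        raise ValueError("need at least two ontologies")
    return min(pairs, key=lambda pair: mar(*pair))


# Hypothetical MaR values, for illustration only (not results from this paper).
EXAMPLE_MAR = {
    frozenset({"A", "B"}): 0.40,
    frozenset({"A", "C"}): 0.10,
    frozenset({"B", "C"}): 0.25,
}


def lookup_mar(a, b):
    """Look up the placeholder MaR value for an unordered pair."""
    return EXAMPLE_MAR[frozenset({a, b})]


if __name__ == "__main__":
    print(min_mar_pair(["A", "B", "C"], lookup_mar))
```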

ACKNOWLEDGEMENTS

Our special thanks go to Secure Business Austria Re-

search GmbH for their pecuniary aid in the course

of the FAMOS-Project (Female Academy for Men-

toring, Opportunities and Self-Development).

REFERENCES

Benerecetti, M., Bouquet, P., and Ghidini, C. (2001). On the

dimensions of context dependence. In Third Interna-

tional and Interdisciplinary Conference, CONTEXT,

Dundee (UK).

Bouquet, P., Euzenat, J., Franconi, E., Seraﬁni, L., Stamou,

G., and Tessaris, S. (2004). D2.2.1: Speciﬁcation of

a common framework for characterizing alignment.

Knowledge Web Consortium.

Chalupsky, H. (2000). OntoMorph: A translation system
for symbolic logic. In Cohn, A. G., Giunchiglia, F., and
Selman, B., editors, KR2000: Principles of Knowledge
Representation and Reasoning, pages 471–482,
San Francisco, CA.

Dean, M. and Schreiber, G. (2004). OWL Web Ontol-

ogy Language Reference (W3C Recommendation 10

February 2004). World Wide Web Consortium.

Ehrig, M. (2007). Ontology Alignment: Bridging the Se-

mantic Gap, volume 4 of Semantic Web And Beyond

Computing for Human Experience. Springer, 1st edi-

tion.

Ehrig, M., Haase, P., Hefke, M., and Stojanovic, N. (2004).

Similarity for Ontologies - a Comprehensive Framework.
In Workshop Enterprise Modelling and

Ontology: Ingredients for Interoperability, at PAKM

2004, Vienna (Austria).

Eller, R., Schwaiger, W. S. A., and Federa, R.

(2002). Bankenbezogene Risiko- und Erfolgsrechnung.
Schäffer-Poeschel Verlag, Stuttgart (DE).

Euzenat, J. (2001). Towards a Principled Approach to Se-

mantic Interoperability. In Workshop on Ontologies

and Information Sharing, IJCAI01, Seattle (WA US).

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.9779.

Euzenat, J. and Shvaiko, P. (2007). Ontology Matching.

Springer, Heidelberg (DE).

Euzenat, J. and Valtchev, P. (2004). Similarity-based on-

tology alignment in OWL-Lite. In The 16th European

Conference on Artiﬁcial Intelligence, ECAI-04, Valen-

cia (Spain).

Franke, J., Härdle, W., and Hafner, C. (2004). Einführung
in die Statistik der Finanzmärkte, volume 2 of Statistik
und ihre Anwendungen. Springer, 1st edition.

Giunchiglia, F. and Shvaiko, P. (2003). Semantic
Matching. Technical Report DIT-03-013, University of Trento,
Department of Information and Communication Technology,
38050 Povo, Trento (IT), Via Sommarive 14.
http://eprints.biblio.unitn.it/archive/00000381/01/013.pdf.

Gronback, R. C. (2009). Eclipse Modeling Project: A

Domain-speciﬁc Language Toolkit. Addison-Wesley,

1st edition.

Grüninger, M. and Fox, M. S. (1995). Methodology for the
Design and Evaluation of Ontologies. In International
Joint Conference on Artificial Intelligence IJCAI95,

Workshop on Basic Ontological Issues in Knowledge

Sharing, Toronto (CA).

Horridge, M. (2004). A Practical Guide To Building
OWL Ontologies With The Protege-OWL
Plugin. University of Manchester, 1st edition.
http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/.

Janiesch, C. (2010). Situation vs. Context: Considerations

on the Level of Detail in Modelling Method Adapta-

tion. In 43rd Hawaii International Conference on Sys-

tem Sciences, pages 1–10. IEEE Computer Society.

Klein, M. (2001). Combining and Relating Ontologies: An

Analysis of Problems and Solutions. In Gomez-Perez,

A., Gruninger, M., Stuckenschmidt, H., and Uschold,

M., editors, Workshop on Ontologies and Information

Sharing, IJCAI’01, Seattle (WA).

Mazak, A., Schandl, B., and Lanzenberger, M. (2010). En-

hancing Structure-based Ontology Alignment by En-

riching Models with Importance Weightings. In 3rd

International Workshop on Ontology Alignment and

Visualization (OnAV’10), Krakow (Poland).

Meintrup, D. and Schäffler, S. (2005). Stochastik. Statistik
und ihre Anwendungen. Springer, 1st edition.

Noy, N. F. and McGuinness, D. L. (2001). Ontology De-

velopment 101: A Guide to Creating Your First On-

tology. Technical Report SMI-2001-0880, Stanford

University, Stanford (CA), 94305.

Noy, N. F. and Musen, M. A. (2001). Anchor-PROMPT:

Using Non-local Context for Semantic Matching. In

Workshop on Ontologies and Information Sharing at

the Seventeenth International Joint Conference on Ar-

tiﬁcial Intelligence (IJCAI-2001), Seattle (WA).

OAEI (2009). OAEI-2009 Campaign Conference track.

http://oaei.ontologymatching.org/2009/conference/.

Rahm, E., Do, H.-H., and Maßmann, S. (2004). Match-

ing Large XML Schemas. In SIGMOD Record, vol-

ume 33. ACM.

Shvaiko, P. and Euzenat, J. (2004). A Survey of Schema-

based Matching Approaches. Technical Report DIT-

04-087, University of Trento, Department of Informa-

tion and Communication Technology.

Stahel, W. A. (2000). Statistische Datenanalyse. Vieweg &

Sohn Verlagsgesellschaft mbH, 3rd edition.

Visser, P. R. S., Jones, D. M., Bench-Capon, T.,

and Shave, M. (1997). An analysis of On-

tology Mismatches; Heterogeneity versus Inter-

operability. In AAAI 1997, Spring Symposium

on Ontological Engineering, Stanford (CA US).

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.6709.

Wu, G., Li, J., Feng, L., and Wang, K. (2008). Identify-

ing Potentially Important Concepts and Relations in

an Ontology. In Proceedings of the 7th International

Conference on The Semantic Web, Karlsruhe (Ger-

many).

KEOD 2010 - International Conference on Knowledge Engineering and Ontology Development