4 COINCIDENCE BASED WEIGHTING
In this section we introduce and discuss a new weighting model for an alignment, on which we will later base our genetic algorithm. The coincidence based alignment weight function is discussed in detail in (Haeri et al., 2006); here we give only a brief introduction. Before defining the weight itself, let us discuss the intuition behind it.
Consider a mapping $m$ between two ontologies with graphs $G_{i_1}, G_{i_2}$, and consider two nodes $v^1_j, v^1_k \in V(G_{i_1})$ together with their matches $m(v^1_j), m(v^1_k)$. The weighting system should yield a high weight if $v^1_j$ is close to $m(v^1_j)$, $v^1_k$ is close to $m(v^1_k)$, and, in addition, the edge $e = (v^1_j, v^1_k) : t \in E(G_{i_1})$ is preserved under $m$. In this case the nodes are close to their matches, and there is an edge both between $v^1_j, v^1_k$ and between $m(v^1_j), m(v^1_k)$. This is the most desirable case and should be given the highest value.
In the second case, suppose the edge is not preserved. Here a small negative value should be assigned: the nodes are very close to their matches, but the edge is missing, so the structural matching of the graphs is interrupted.
The farther either node is from its match, the lower the positive value of the matching should be. If the edge is preserved, we give such a matching a low positive value. When the edge is not preserved, however, the matching is undesirable and we give it a negative value: not only are the nodes far from their matches, but the edge is not preserved either.
According to the above considerations there should be six different categories. (Suppose $G, G'$ are the graphs of two ontologies $O, O'$ to be aligned, $a, b$ are concepts from $G$, and $a', b'$ are concepts from $G'$.)
• Category I. $a$ and $a'$ are very close (in terms of the distance function described before), and $b$ and $b'$ are close as well. That is, the distance between $a$ and $a'$ is low, and so is the distance between $b$ and $b'$. The edge between $a$ and $b$ is preserved, so this category is the most important: the two edges almost coincide.
• Category II. The edge is preserved, but only one of $a$ and $b$ is close to its match. This is good, but not as good as the previous category.
• Category III. The two endpoints of an edge are close to their matches, that is, $a$ is close to $a'$ and $b$ is close to $b'$, but the edge is not preserved. This category should not be penalized much, because at least the concepts are close to their matches and the vertices coincide.
• Category IV. The edge between $a$ and $b$ is not preserved, and $b$ is far from $b'$. The only positive aspect of such a matching is that $a$ and $a'$ are close.
• Categories V and VI. In these categories both $a, a'$ and $b, b'$ are far from one another; the two categories differ only in whether the edge is preserved. Neither case is desirable, and both should receive low values.
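The six categories can be read as a small decision table over three observations: whether each endpoint is close to its match, and whether the edge is preserved. The following Python sketch illustrates that reading; it is not taken from the paper. The function name `categorize`, the `close_threshold` cutoff that decides what counts as "close", and the representation of edges as `(source, target, type)` triples are all assumptions made for illustration. Category IV is treated symmetrically in the two endpoints, and Categories V and VI are returned under one label because the text does not say which of the two has the preserved edge.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node, str]  # (source, target, edge type t); assumed encoding


def categorize(
    edge: Edge,
    m: Dict[Node, Node],
    edges_g2: Set[Edge],
    delta: Callable[[Node, Node], float],
    close_threshold: float,
) -> str:
    """Return the category label of an edge of G under the mapping m.

    `close_threshold` is a hypothetical cutoff: a node counts as "close"
    to its match when delta(node, m[node]) <= close_threshold.
    """
    a, b, t = edge
    a_close = delta(a, m[a]) <= close_threshold
    b_close = delta(b, m[b]) <= close_threshold
    # The edge is preserved if its image, with the same type, is in G'.
    preserved = (m[a], m[b], t) in edges_g2

    n_close = int(a_close) + int(b_close)
    if n_close == 2:
        return "I" if preserved else "III"
    if n_close == 1:
        return "II" if preserved else "IV"
    # Both endpoints are far from their matches.
    return "V/VI"
```

A threshold is only one way to make "close" and "far" concrete; the weight function itself works with the distances directly rather than with a hard cutoff.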
Based on the above categorization and discussion, the following weight function is suggested:
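\[ w(m) = w_0(m) - w_l(m) - w_r(m) \]
\[ w_0(m) = \sum_{\substack{(v_1, v_2) : t \,\in\, E(G) \\ (m(v_1), m(v_2)) : t \,\in\, E(G')}} \bigl( f(v_1) + f(v_2) \bigr) \]
\[ w_l(m) = \sum_{\substack{(v_1, v_2) : t \,\in\, E(G) \\ (m(v_1), m(v_2)) : t \,\notin\, E(G')}} \bigl( g(v_1) + g(v_2) \bigr) \]
\[ w_r(m) = \sum_{\substack{(v_1, v_2) : t \,\notin\, E(G) \\ (m(v_1), m(v_2)) : t \,\in\, E(G')}} \bigl( g(v_1) + g(v_2) \bigr) \]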
The functions $f$ and $g$, referred to as Normalization Functions, are of the form
\[ f : \mathbb{R} \to \mathbb{R}^+, \qquad g : \mathbb{R} \to \mathbb{R}^+ . \]
Both are related to the distance function. $f$ should be a positive decreasing function, so that as $\delta(v, m(v))$ grows, $f$ decreases and reduces the positive value. Conversely, $g$ should be a positive increasing function, so that it grows with $\delta(v, m(v))$ and increases the penalty for that match. The normalization functions are defined by tuning the system, as described later.
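To make the two requirements concrete, the sketch below shows one possible pair of normalization functions in Python. These particular forms are hypothetical and are not the tuned functions used by the authors: `make_f` builds an exponentially decaying reward, and `make_g` builds a saturating penalty with a small positive floor. The parameters `alpha`, `beta`, and `floor` are illustrative tuning knobs, and distances are assumed to be non-negative.

```python
import math
from typing import Callable


def make_f(alpha: float = 1.0) -> Callable[[float], float]:
    """Positive, decreasing in the distance d (hypothetical choice)."""
    def f(d: float) -> float:
        return math.exp(-alpha * d)
    return f


def make_g(beta: float = 1.0, floor: float = 0.05) -> Callable[[float], float]:
    """Positive, increasing in the distance d for d >= 0 (hypothetical choice).

    `floor` keeps the penalty strictly positive even at distance zero.
    """
    def g(d: float) -> float:
        return floor + 1.0 - math.exp(-beta * d)
    return g
```

With these choices a pair of close nodes earns a reward near 1 when the edge is preserved and a penalty near `floor` when it is not, which matches the intent that Category III should not be penalized much.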
5 GENETIC ALGORITHM
This section describes the genetic algorithm we designed. Matching two general graphs exactly is not believed to be possible in polynomial time, because the problem in its general case is MAX SNP-hard (Arora et al., 1992). A carefully designed randomized search algorithm is therefore a reasonable alternative, which led us to the idea of using a genetic algorithm.
WEBIST 2007 - International Conference on Web Information Systems and Technologies