3.3 Proposed Algorithm
The proposed method is based on the assumption that “two concepts are more similar the less distant and the more correlated they are”. This simple definition makes it natural to use a fuzzy system (Zadeh, 1992) to produce a similarity measure of two concepts as a function of distance and correlation. The measure ranges in [0, 1].
The distance component is defined by the following:
$$\mathrm{dist}(c_1, c_2) = \mathrm{ShortestWeightedPath}(c_1, c_2). \qquad (5)$$
This term is based on a structural approach (Graph Distance Model). It denotes the distance between two ontology concepts, expressed as the minimum weighted sum of the relations crossed to reach $c_2$ from $c_1$.
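As an illustration of the distance term in (5), the following is a minimal sketch that computes a shortest weighted path with Dijkstra's algorithm. The choice of Dijkstra, the adjacency-dictionary representation, the non-negative weights, and the toy concept names are all assumptions made for this example; the text only states that the distance is the minimum weighted sum of relations crossed.

```python
import heapq


def shortest_weighted_path(graph, source, target):
    """Return the minimum total weight of relations crossed from source to target.

    `graph` maps each concept to a dict {neighbour: relation_weight}.
    Weights are assumed to be non-negative (required by Dijkstra's algorithm).
    Returns float("inf") when the target cannot be reached.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf")


# Hypothetical toy ontology: concept names and weights are illustrative only.
toy_graph = {
    "Car": {"Vehicle": 1.0, "Engine": 2.0},
    "Vehicle": {"Car": 1.0, "Truck": 1.0},
    "Truck": {"Vehicle": 1.0, "Engine": 2.5},
    "Engine": {"Car": 2.0, "Truck": 2.5},
}

# Car -> Vehicle -> Truck costs 2.0, cheaper than Car -> Engine -> Truck (4.5).
distance = shortest_weighted_path(toy_graph, "Car", "Truck")
```

How the relation weights are assigned is not specified in this section, so the values above are placeholders.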
The correlation component is given by:
$$\mathrm{corr}(c_1, c_2) = \left(1 + \frac{N_i}{T_a}\right) \cdot \left(1 + \frac{N_r}{T_r}\right) \qquad (6)$$
where $N_i$ is the number of NICAs of $c_1$ and $c_2$, $T_a$ is the total number of attributes of $c_1$ and $c_2$, $N_r$ is the number of CRs of the FCMs of $c_1$ and $c_2$, and $T_r$ is the total number of relations of the FCMs of $c_1$ and $c_2$.
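The correlation term in (6) can be evaluated directly once the four counts are known. The sketch below assumes the counts have already been extracted from the ontology; the function name, the argument names, and the treatment of a zero total (the corresponding gain is taken as 1) are assumptions of this example and are not stated in the text.

```python
def correlation(n_i, t_a, n_r, t_r):
    """Evaluate corr = (1 + N_i / T_a) * (1 + N_r / T_r) as in Eq. (6).

    n_i: number of NICAs of the two concepts
    t_a: total number of attributes of the two concepts
    n_r: number of CRs of the FCMs of the two concepts
    t_r: total number of relations of the FCMs of the two concepts
    Assumption: a zero total contributes a neutral gain of 1.
    """
    attribute_gain = 1.0 + (n_i / t_a if t_a else 0.0)
    relation_gain = 1.0 + (n_r / t_r if t_r else 0.0)
    return attribute_gain * relation_gain


# Illustrative counts only: (1 + 2/8) * (1 + 3/6) = 1.25 * 1.5 = 1.875
example = correlation(n_i=2, t_a=8, n_r=3, t_r=6)
```

If the shared counts never exceed the corresponding totals, each gain term lies in [1, 2], so the raw value lies in [1, 4].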
This term is based on a behavioral approach and denotes the correlation between two ontology concepts in terms of their specialization, considering the functional aspect given by attributes and relations. In our scenario, we consider two concepts to be correlated, in a behavioral sense, “if they are able to express something similar”. The expression in (6) is the product of two gain terms: correlation increases if $c_1$ and $c_2$ share many attributes not inherited from their MRCAs and if their ancestors share many relations. It is a global measure of how far $c_1$ and $c_2$ carry similar information content. Only relations that give an informative contribution are considered in this formula, while structural information (is-a, subclass-of, etc.) is neglected.

To measure similarity, we first have to estimate the relevant terms: ancestors, common and non-common attributes, and common and non-common relations. From these we can measure Distance and Correlation. These two values are used as crisp inputs to a fuzzy system (see Fig. 2). We defined fuzzy sets for the inputs and for the similarity output, together with the following fuzzy rules:
• IF closed AND correlated THEN very similar
• IF closed AND average correlated THEN similar
• IF closed AND not correlated THEN average similar
• IF far AND not correlated THEN very dissimilar
• IF far AND average correlated THEN dissimilar
• IF far AND correlated THEN average dissimilar
• IF average closed AND correlated THEN similar
• IF average closed AND average correlated THEN average similar
• IF average closed AND not correlated THEN average dissimilar
We used Gaussian membership functions for the inputs and triangular ones for the output. The crisp similarity value is obtained from the output fuzzy set using the centroid rule.
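The rule base and defuzzification step above can be sketched as a small Mamdani-style inference routine. Everything numeric here is an assumption made for illustration: the inputs are assumed to be normalized to [0, 1], the Gaussian means and the shared `SIGMA`, the triangle breakpoints, the use of `min` for AND and `max` for aggregation, and the sampled centroid are not taken from the paper, whose membership functions are tuned in a separate training phase.

```python
import math

SIGMA = 0.2  # assumed common width of the Gaussian input sets

# Assumed Gaussian means for the input fuzzy sets (inputs normalized to [0, 1]).
DISTANCE_SETS = {"closed": 0.0, "average closed": 0.5, "far": 1.0}
CORRELATION_SETS = {"not correlated": 0.0, "average correlated": 0.5, "correlated": 1.0}

# Assumed triangular output sets, given as (left, peak, right).
SIMILARITY_SETS = {
    "very dissimilar": (-0.2, 0.0, 0.2),
    "dissimilar": (0.0, 0.2, 0.4),
    "average dissimilar": (0.2, 0.4, 0.6),
    "average similar": (0.4, 0.6, 0.8),
    "similar": (0.6, 0.8, 1.0),
    "very similar": (0.8, 1.0, 1.2),
}

# The nine rules listed in the text: (distance set, correlation set, output set).
RULES = [
    ("closed", "correlated", "very similar"),
    ("closed", "average correlated", "similar"),
    ("closed", "not correlated", "average similar"),
    ("far", "not correlated", "very dissimilar"),
    ("far", "average correlated", "dissimilar"),
    ("far", "correlated", "average dissimilar"),
    ("average closed", "correlated", "similar"),
    ("average closed", "average correlated", "average similar"),
    ("average closed", "not correlated", "average dissimilar"),
]


def gaussian(x, mean, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def triangular(x, left, peak, right):
    """Triangular membership degree of x (requires left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def similarity(distance, correlation, steps=200):
    """Crisp similarity in [0, 1] from normalized distance and correlation."""
    # Firing strength of each rule (AND as min), aggregated per output set (max).
    strengths = {}
    for d_label, c_label, out_label in RULES:
        fire = min(
            gaussian(distance, DISTANCE_SETS[d_label], SIGMA),
            gaussian(correlation, CORRELATION_SETS[c_label], SIGMA),
        )
        strengths[out_label] = max(strengths.get(out_label, 0.0), fire)

    # Centroid of the aggregated, clipped output set, sampled on [0, 1].
    numerator = denominator = 0.0
    for k in range(steps + 1):
        y = k / steps
        mu = max(
            min(strength, triangular(y, *SIMILARITY_SETS[label]))
            for label, strength in strengths.items()
        )
        numerator += y * mu
        denominator += mu
    return numerator / denominator if denominator else 0.0


# Close, strongly correlated concepts should score higher than far, uncorrelated ones.
high = similarity(distance=0.0, correlation=1.0)
low = similarity(distance=1.0, correlation=0.0)
```

With these placeholder parameters the output is only qualitatively meaningful; the actual membership functions would come from the tuning described in the experimental section.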
Figure 2: The Fuzzy System.
The proposed algorithm is based on a measure that combines a well-known structural approach, the Shortest Weighted Path, with an information-based one, which relies on behavioral considerations about the concepts in the ontology. In particular, the second term of the measure estimates the behavioral specialization of each node inside the ontology, obtained from the analysis of attributes and relations. Unlike other related works, relations inside the ontology have particular relevance for the algorithm: they provide information not only about the ontology structure, but also about the contents the ontology deals with.
3.4 Experimental Results
Before performing experiments, the fuzzy system has to be trained in order to tune the parameters of the membership functions of the fuzzy sets. For the training phase we used some ontologies taken from a selection of OWL ontologies (Protégé, 2004). These ontologies have been specifically developed for the Semantic Web and are provided by the Protégé team of Stanford Medical Informatics at the Stanford University School of Medicine. We trained our fuzzy system using the MATLAB FIS Editor module to check our fuzzy rules and to find appropriate membership functions for the two input variables, Distance and Correlation, and the output variable, Similarity. Figures 3, 4 and 5 show the membership functions adopted for these three variables.
WEBIST 2008 - International Conference on Web Information Systems and Technologies