• Symmetry: ∀c_1, c_2 : sim(c_1, c_2) = sim(c_2, c_1)
To evaluate the performance of our lexical similarity measure, experiments and results are presented in the following section.
4 EXPERIMENTS AND DISCUSSIONS
We use ontologies taken from the OAEI benchmark 2008 (http://oaei.ontologymatching.org/) to test and evaluate the performance of our measure and of other measures by comparing their output with the reference alignments. This benchmark consists of ontologies modified from the reference ontology 101 by changing properties, using synonyms, extending structures and so on. Since the measures considered here concentrate on computing string-based similarity, only the ontologies with modified labels and the real bibliographic ontologies are chosen for evaluation. Consequently, the considered ontologies are 101, 204, 301, 302, 303 and 304. These ontologies are well suited for validating and comparing the Needleman-Wunsch, Jaro-Winkler and Levenshtein measures and the normalized Kondrak method combining the Dice and n-grams approaches, all using the same classical metrics. These classical metrics are Precision, Recall and F-measure, shown in Eq. (25).
\[
\text{Precision} = \frac{\text{No. of correct found correspondences}}{\text{No. of found correspondences}}, \qquad
\text{Recall} = \frac{\text{No. of correct found correspondences}}{\text{No. of existing correspondences}},
\]
\[
\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{25}
\]
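As an illustration, Eq. (25) can be computed as in the following minimal Python sketch; the representation of correspondences as pairs of labels, and the example data, are assumptions made for the sketch, not the implementation used in the experiments.

def evaluate_alignment(found, reference):
    # Compute Precision, Recall and F-measure as in Eq. (25).
    # Correspondences are modelled as hashable pairs; this representation
    # is an assumption of the sketch, not the experimental implementation.
    found, reference = set(found), set(reference)
    correct = found & reference                      # true positives
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)
    return precision, recall, f_measure

# Hypothetical example: 3 of 4 found correspondences are correct,
# 5 correspondences exist in the reference alignment.
p, r, f = evaluate_alignment(
    found=[("Book", "Book"), ("Author", "Author"),
           ("Title", "Title"), ("Page", "Publisher")],
    reference=[("Book", "Book"), ("Author", "Author"), ("Title", "Title"),
               ("Journal", "Journal"), ("Editor", "Editor")])
print(p, r, f)  # 0.75 0.6 0.666...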
Precision, Recall, F-measure and their average values over the six pairs of ontologies are presented in Table 1. Note that the results in Table 1 are obtained by varying the threshold over nine values from 0.5 to 0.9 in increments of 0.05; in addition, the two parameters α = 0.2 and β = 0.4 were applied. For each threshold value, alignments are computed for the five participating measures, and then the average Precision, Recall and F-measure over all these thresholds are calculated; a sketch of this protocol is given below.
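The following sketch outlines the threshold sweep; run_alignment is a hypothetical helper standing in for executing one measure at a given threshold, and evaluate_alignment is the function from the previous sketch.

thresholds = [round(0.5 + 0.05 * i, 2) for i in range(9)]  # 0.5, 0.55, ..., 0.9

def average_scores(run_alignment, reference):
    # run_alignment is a hypothetical callable mapping a threshold to the
    # correspondences found by one measure at that threshold.
    scores = [evaluate_alignment(run_alignment(t), reference)
              for t in thresholds]
    # Average Precision, Recall and F-measure over the nine thresholds.
    return tuple(sum(col) / len(scores) for col in zip(*scores))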
In Table 1, our measure gives the highest average F-measure among the compared methods, which indicates that our approach is more effective than the others. Moreover, both our measure and Levenshtein's are slightly better than Kondrak's metric for each pair of ontologies. For ontology 101, compared with itself, all of the above methods produce
Precision, Recall and F-measure values of 1.0. The Recall value is particularly important because it estimates the number of true positives relative to the number of correspondences existing in the reference alignment. In general, at the same Recall value, the better measure provides higher Precision. Although the Recall values of the Levenshtein, Kondrak, Jaro-Winkler and Needleman-Wunsch measures and ours are similar for ontology 301, our measure gives better Precision values than these measures, which means our approach outperforms the existing methods. Since ontology 301 consists of concepts that are slightly or completely modified from the reference ontology, the number of true positive concepts obtained is the same for all of the string-based metrics mentioned before; thus, in this case, the Recall values are the same for all methods. Because ontology 204 only contains concepts modified from the reference one by adding underscores, abbreviations and so on, the measures achieve rather high F-measure results. Ontology 304 has a vocabulary similar to that of ontology 101, so the Precision and Recall values achieved for this pair of ontologies are also good. The Jaro-Winkler measure is also known as a good approach because its average Recall is slightly higher than that of the others. However, its average Precision is significantly lower than the others', for example 0.773 compared to 0.930, 0.786, 0.899 and 0.957. Therefore, Jaro-Winkler obtains more false positive concepts than the other measures. The same phenomenon occurs for the ontology pairs 302 and 303.
Besides the above evaluation, our measure is also more rational in several cases. For example, given the two strings c_1 = 'glass' and c_2 = 'grass', there is only one edit transforming c_1 into c_2: the substitution of 'l' with 'r'. Therefore, the Levenshtein distance between 'glass' and 'grass' is 1. Applying Eq. (11) and Eq. (22), the similarity between the two strings is 0.8, while our measure yields a similarity degree of 0.5. In fact, 'glass' and 'grass' describe different objects; whereas the Levenshtein measure returns a high similarity score (0.8), the value 0.5 produced by our measure is quite reasonable. The Levenshtein computation is sketched below.
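This sketch reproduces the Levenshtein side of the example with a standard dynamic-programming implementation; the normalization 1 − d / max(|c_1|, |c_2|) is the usual convention and yields 0.8, while the value 0.5 of our measure follows from Eq. (11) and Eq. (22) and is not re-derived here.

def levenshtein(s, t):
    # Classic dynamic-programming edit distance, computed row by row.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]

d = levenshtein("glass", "grass")
print(d)                                        # 1: substitute 'l' with 'r'
print(1 - d / max(len("glass"), len("grass")))  # 0.8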
ple, if n ≥ 2 then two strings Rep and Rap have no
n-grams in common. In this case, applying Dice’s
measure to these strings brings the dissimilarity. Ad-
ditionally, the family of Dice’s methods has a charac-
teristic which relies on the set of samples but not on
their positions. Because the sets of bigrams of two
strings Label and Belab including {la, ab, be, el} are
the same, the similarity value of these strings equal to
1, which seems inappropriate. In short, our approach
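The sketch below illustrates both bigram cases using the standard Dice coefficient 2|A ∩ B| / (|A| + |B|) over bigram sets; it demonstrates the general position-insensitive behaviour, not the paper's exact implementation.

def bigrams(s):
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(s, t):
    # Position-insensitive: only the *sets* of bigrams matter.
    a, b = bigrams(s.lower()), bigrams(t.lower())
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

print(bigrams("label"))        # {'la', 'ab', 'be', 'el'}
print(dice("Label", "Belab"))  # 1.0: identical bigram sets, different words
print(dice("Rep", "Rap"))      # 0.0: no bigrams in common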