majority of them share the idea that the weight of a concept corresponds to the probability that a resource selected at random is characterized by a set of features including one representing that concept, or one of its descendants in the ontology. Hence, the higher the weight of a concept, the lower its specificity. For instance, the concept student has a smaller weight than person, since the former is more specific than the latter. Therefore, in formulating a query, the lower the weights of the concepts, the higher their selective power, and the more focused the returned answer set.
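As a concrete reading of this probabilistic interpretation, the following sketch (with a hypothetical toy ontology and dataset) computes the weight of a concept as the fraction of resources annotated with that concept or with one of its descendants.

```python
# Hypothetical toy example: weight(c) = fraction of resources annotated
# with c or with one of c's descendants in the ISA hierarchy.

# ISA hierarchy: child -> parent (tree-shaped)
parent = {"student": "person", "professor": "person", "person": "thing",
          "vehicle": "thing"}

# Each resource is annotated with a set of concepts (its feature vector)
annotations = {
    "r1": {"student"},
    "r2": {"professor", "vehicle"},
    "r3": {"person"},
    "r4": {"vehicle"},
}

def ancestors_or_self(concept):
    """Return the concept together with all of its ancestors up to the root."""
    result = {concept}
    while concept in parent:
        concept = parent[concept]
        result.add(concept)
    return result

def weight(concept):
    """Probability that a randomly selected resource has a feature equal to
    the concept or to one of its descendants."""
    hits = sum(
        1 for feats in annotations.values()
        if any(concept in ancestors_or_self(f) for f in feats)
    )
    return hits / len(annotations)

print(weight("person"))   # 0.75: r1, r2, r3 all carry person or a descendant
print(weight("student"))  # 0.25: only r1 (more specific, smaller weight)
```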
The performance of a semantic search engine depends on the semantic matchmaking method and on the approach used to weight the reference ontology. In this paper, we focus on the analysis of four different approaches for weighting the concepts of an ontology, and we carry out an experiment in order to assess the analyzed weighting methods.
The presented methods are divided into two groups (Sánchez et al., 2011): (i) extensional methods (also known as distributional methods), where the concept weights are derived by taking into account both the topology of the ISA hierarchy and the content of the resource space, also referred to as the dataset, and (ii) intensional methods (also known as intrinsic methods), where the concept weights are derived on the basis of the topology of the ISA hierarchy alone.
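The distinction can be summarized by the inputs each family relies on; the sketch below (with hypothetical type and function names, and bodies left as stubs) only contrasts the two signatures: an extensional method inspects both the ISA hierarchy and the annotated dataset, whereas an intensional method inspects the hierarchy alone.

```python
from typing import Dict, Set

# child -> parent links of a tree-shaped ISA hierarchy
Hierarchy = Dict[str, str]
# resource identifier -> set of annotating concepts
Dataset = Dict[str, Set[str]]

def extensional_weight(concept: str, isa: Hierarchy, data: Dataset) -> float:
    """Extensional (distributional): depends on the hierarchy AND the dataset,
    e.g. the annotation-based probability sketched above."""
    ...

def intensional_weight(concept: str, isa: Hierarchy) -> float:
    """Intensional (intrinsic): depends on the hierarchy topology only,
    e.g. a function of the number of descendants of the concept."""
    ...
```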
In this paper, we selected the semantic similarity method SemSim (Formica et al., 2013) in order to carry out the assessment of the four methods. In the mentioned paper, the authors show that SemSim outperforms the most representative similarity methods proposed in the literature, i.e., Dice, Cosine, Jaccard, and Weighted Sum. The SemSim method requires: i) a dataset consisting of a set of resources annotated according to a given ontology, and ii) a method for associating weights with the concepts of the ontology. On this basis, SemSim computes the semantic similarity between a given user request and any annotated resource in the dataset.
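As a rough illustration of this kind of feature-vector matchmaking, and not of the actual SemSim algorithm detailed in (Formica et al., 2013), the sketch below scores a request vector against a resource feature vector by pairing concepts through an information-content-based similarity derived from the concept weights; the toy ontology, the weights, and the greedy pairing strategy are illustrative assumptions.

```python
import math

# Hypothetical concept weights (probabilities) over a toy ISA hierarchy
parent = {"student": "person", "professor": "person", "person": "thing",
          "car": "vehicle", "vehicle": "thing"}
weight = {"thing": 1.0, "person": 0.6, "student": 0.2, "professor": 0.3,
          "vehicle": 0.4, "car": 0.25}

def ic(c):
    """Information content of a concept derived from its weight."""
    return -math.log(weight[c])

def ancestors_or_self(c):
    result = [c]
    while c in parent:
        c = parent[c]
        result.append(c)
    return result

def concept_sim(c1, c2):
    """Lin-style similarity based on the least common subsumer (LCS)."""
    common = [a for a in ancestors_or_self(c1) if a in ancestors_or_self(c2)]
    lcs = max(common, key=ic)  # most informative shared ancestor
    denom = ic(c1) + ic(c2)
    return 2 * ic(lcs) / denom if denom > 0 else 1.0

def vector_sim(request, resource):
    """Greedy one-to-one pairing of request and resource concepts
    (an illustrative stand-in for an optimal matching)."""
    resource = set(resource)
    total = 0.0
    for rc in request:
        if not resource:
            break
        best = max(resource, key=lambda oc: concept_sim(rc, oc))
        total += concept_sim(rc, best)
        resource.remove(best)
    return total / max(len(request), 1)

print(vector_sim(["student", "car"], ["professor", "vehicle"]))  # ~0.58
```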
With respect to the present work, in the mentioned paper only two weighting methods were considered, i.e., the frequency and the probabilistic approaches, which correspond here to the Annotation Frequency Method and the Top Down Topology Method, respectively. Note that, in order to be consistent with the results given in (Formica et al., 2013), in this paper we keep the same experimental setting, in particular the reference ontology and the dataset presented in that work.
The next section gives a brief overview of ontology weighting. Section 3 provides the basic notions concerning weighted ontologies and ontology-based feature vectors, and proposes a probabilistic model for weighted ontologies. Section 4 describes the four methods in detail. Section 5 illustrates the assessment of the methods and, finally, Section 6 concludes.
2 RELATED WORK
According to the extensional methods, also referred to as distributional (Sánchez et al., 2011), the information content of a concept is in general estimated from the frequency distribution of terms in text corpora. Hence, this type of method is based on the extensional semantics of the concept itself, as its probability can be derived from the number of occurrences of the concept in the text corpora. This approach was
used in (Jiang and Conrath, 1997), (Resnik, 1995),
and (Lin, 1998) to assess semantic similarity between
concepts. Other proposals include the inverse docu-
ment frequency (IDF) method, and the method based
on the combination of term frequency (TF) and the
IDF (Manning et al., 2008). In our work, we de-
rived the concept frequency method and the annota-
tion frequency method, respectively, from those used
in (Resnik, 1995) and the IDF.
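To make these two extensional ingredients concrete, the sketch below computes a Resnik-style corpus probability, which counts the occurrences of a concept together with those of its descendants, and a plain IDF score over annotated resources; the toy counts and names are hypothetical.

```python
import math

# Hypothetical ISA links (child -> parent) and toy data
parent = {"student": "person", "professor": "person"}
corpus_counts = {"person": 40, "student": 25, "professor": 10}  # term occurrences
annotations = {"r1": {"student"}, "r2": {"person"},
               "r3": {"professor"}, "r4": {"student"}}

def _has_ancestor(c, ancestor):
    while c in parent:
        c = parent[c]
        if c == ancestor:
            return True
    return False

def descendants_or_self(concept):
    return {c for c in corpus_counts
            if c == concept or _has_ancestor(c, concept)}

def resnik_ic(concept):
    """Corpus-based IC: -log p(c), where p(c) counts occurrences of the
    concept and of all of its descendants (Resnik-style)."""
    total = sum(corpus_counts.values())
    freq = sum(corpus_counts[c] for c in descendants_or_self(concept))
    return -math.log(freq / total)

def idf(concept):
    """IDF over the dataset: log(N / number of resources annotated with c)."""
    n_docs = len(annotations)
    df = sum(1 for feats in annotations.values() if concept in feats)
    return math.log(n_docs / df) if df else float("inf")

print(resnik_ic("person"), resnik_ic("student"))  # person is less informative
print(idf("student"), idf("professor"))           # rarer annotations score higher
```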
According to the intensional methods, also referred to as intrinsic (Sánchez et al., 2011), the information content is computed starting from the conceptual relations existing between concepts and, in particular, from the taxonomic structure of concepts. In this regard, one of the most relevant methods is presented in (Seco et al., 2004). It is based on the number of hyponyms of a concept and the maximum number of concepts in the taxonomy. In (Meng et al., 2012), the authors present a method derived from (Seco et al., 2004), but they also consider the degree of generality of concepts and, hence, their depth in the taxonomy.
In (Sánchez et al., 2011), the authors claim that the taxonomical leaves are enough to describe and differentiate two concepts, because ad-hoc abstractions (e.g., abstract entities) rarely appear in a universe of discourse but have an impact on the size of the hyponym tree. In (Hayuhardhika et al., 2013), the authors propose to use a density factor to estimate concept weights on the basis of the sum of the inward and outward connections of a concept with other concepts against the total number of connections in the ontology. Finally, just to mention one more example, (Abioui et al., 2018) takes into account both the taxonomic structure and other semantic relationships to compute the weights of concepts.
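As an example of an intrinsic scheme, the sketch below implements a Seco-style information content that depends only on the number of hyponyms of a concept and on the total number of concepts in the taxonomy, following the commonly reported form IC(c) = 1 - log(hypo(c) + 1) / log(N); the toy hierarchy is hypothetical.

```python
import math

# Hypothetical tree-shaped ISA hierarchy: child -> parent
parent = {"student": "person", "professor": "person",
          "person": "thing", "car": "vehicle", "vehicle": "thing"}
concepts = {"thing", "person", "student", "professor", "vehicle", "car"}

def hyponyms(concept):
    """All direct and indirect descendants of a concept."""
    result = set()
    for c in concepts:
        cur = c
        while cur in parent:
            cur = parent[cur]
            if cur == concept:
                result.add(c)
                break
    return result

def seco_ic(concept):
    """Intrinsic IC: 1 - log(hypo(c) + 1) / log(|concepts|).
    Leaves get IC = 1, the root gets IC = 0; no dataset is needed."""
    return 1 - math.log(len(hyponyms(concept)) + 1) / math.log(len(concepts))

print(seco_ic("thing"))    # 0.0  (the root subsumes every other concept)
print(seco_ic("person"))   # ~0.39 (intermediate generality)
print(seco_ic("student"))  # 1.0  (leaf concept)
```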
In this work, first of all we focus on a tree-shaped
taxonomy organized as an ISA hierarchy and, within