Supplier            City      Country  Capacity  Certification  Material       Process     Machine         Available from  Available to
Gaylord-Bechtelar   Rakszawa  Poland   125       ISO9001, LEED  CarbonSteel    Shaping     EDMMachine      15.01.19        01.04.19
Walsh LLC           Berlin    Germany  100       AS9000, MIL    AluminumAlloy  CNCMilling  MillingMachine  01.02.19        15.03.19

Figure 3: Format of test data.
between two concepts as the information content
of their least common subsumer:
$Sim_{res} = IC(lcs)$  (4)
• Lin (Lin, 1998). Lin extends Resnik by including a calculation of the information content of the two concepts to be matched in addition to the information content of their least common subsumer:
$Sim_{lin} = \frac{2 \cdot IC(lcs)}{IC(c_s) + IC(c_t)}$  (5)
• Jiang-Conrath (Jiang and Conrath, 1997) propose
a hybrid approach that is derived from the edge-
based notion by adding the information content as
a decision factor. The normalised Jiang-Conrath
similarity (Seco et al., 2004) is computed as:
$Sim_{jc} = 1 - \frac{IC(c_s) + IC(c_t) - 2 \cdot IC(lcs)}{2}$  (6)
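To make Equations (4)-(6) concrete, the following Python sketch implements the three measures directly from their definitions. The IC values and the least common subsumer (lcs) are assumed to be precomputed from the ontology; function and variable names are illustrative, not taken from our implementation.

def sim_resnik(ic_lcs):
    # Equation (4): similarity is the IC of the least common subsumer.
    return ic_lcs

def sim_lin(ic_s, ic_t, ic_lcs):
    # Equation (5): shared IC normalised by the concepts' own IC.
    denominator = ic_s + ic_t
    return 2 * ic_lcs / denominator if denominator > 0 else 0.0

def sim_jiang_conrath(ic_s, ic_t, ic_lcs):
    # Equation (6): normalised Jiang-Conrath (Seco et al., 2004).
    # Assumes IC values are normalised to [0, 1], so the distance term
    # (ic_s + ic_t - 2 * ic_lcs) / 2 also lies in [0, 1].
    return 1 - (ic_s + ic_t - 2 * ic_lcs) / 2

Note that two identical concepts (ic_s == ic_t == ic_lcs) yield a similarity of 1 under both Lin and the normalised Jiang-Conrath measure.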
Apart from using different similarity techniques, the four configurations used the same approach, making it possible to attribute performance differences to the similarity technique alone. The evaluation was performed on a machine with an Intel Core i7 processor and 16 GB of RAM. We generated a composite consumer query that included two sub-queries representing different and randomised variations of the facets. The sub-queries reflect the fact that a consumer may want to request multiple processes in a single query, for example both cutting and assembling metal parts.
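The structure of such a composite query can be illustrated with the following sketch; the class and field names are assumptions for illustration and do not reflect the framework's actual query model.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SubQuery:
    process: str                              # e.g. "CNCMilling"
    material: str                              # e.g. "AluminumAlloy"
    machine: Optional[str] = None              # e.g. "MillingMachine"
    certifications: List[str] = field(default_factory=list)

@dataclass
class ConsumerQuery:
    sub_queries: List[SubQuery]

# A consumer requesting both cutting and assembling of metal parts:
query = ConsumerQuery(sub_queries=[
    SubQuery(process="Cutting", material="CarbonSteel"),
    SubQuery(process="Assembling", material="CarbonSteel"),
])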
For each of the four configurations, the top 10 hits returned by the algorithm were evaluated for correctness by three domain experts. A majority vote was used to consolidate the evaluation results: a result was considered correct if at least two of the three evaluators judged it as such. The evaluation measure used was precision@k (Elbedweihy et al., 2015), whereby precision is measured relative to the rank k of the search result. For example, precision@3 is 0.67 if two of the first three results in the ranked list are correct. Since the experts only evaluated the top 10 search results, there is no full ground truth alignment from which recall can be measured.
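The consolidation and scoring steps can be summarised in a few lines of Python; the sketch below assumes three boolean judgments per hit and is illustrative only.

def majority_vote(judgments):
    # A hit counts as correct if at least two of the three experts agree.
    return sum(judgments) >= 2

def precision_at_k(correct, k):
    # Precision relative to rank k over the ranked result list.
    top_k = correct[:k]
    return sum(top_k) / len(top_k)

# Example: expert votes for the top three hits of one configuration.
votes = [[True, True, True], [True, False, True], [False, False, True]]
consolidated = [majority_vote(v) for v in votes]  # [True, True, False]
print(precision_at_k(consolidated, 3))            # ~0.67 (2 of 3 correct)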
To support the domain experts in their evaluation, they were given additional context information in the form of a hierarchical listing of sub- and superclasses for each ontology concept relevant to a query result. Since the domain experts had little experience with ontologies, such context information is important for the validity and reliability of the evaluation (Cheatham and Hitzler, 2014).
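Such a hierarchical listing can, for instance, be generated with rdflib; the sketch below is a minimal example under the assumption that the ontology is available as an RDF/OWL file, and both the file path and the concept IRI are placeholders.

from rdflib import Graph, RDFS, URIRef

g = Graph()
g.parse("manufacturing-ontology.owl")  # placeholder path

def superclasses(concept):
    # Transitive superclasses via rdfs:subClassOf (includes the concept itself).
    return g.transitive_objects(concept, RDFS.subClassOf)

def subclasses(concept):
    # Transitive subclasses, i.e. the inverse direction of rdfs:subClassOf.
    return g.transitive_subjects(RDFS.subClassOf, concept)

concept = URIRef("http://example.org/onto#CNCMilling")  # placeholder IRI
print("Superclasses:", [str(c) for c in superclasses(concept)])
print("Subclasses:", [str(c) for c in subclasses(concept)])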
Figure 4 shows the results from the evaluation. As the figure reveals, all four techniques returned only correct search results among the top 3 suppliers. Lin, Resnik and Wu-Palmer also achieve 100 % precision up to the precision@6 threshold, where Lin and Resnik each return one false positive while Wu-Palmer maintains its 100 % precision. At precision@10, Wu-Palmer, the edge-based technique, obtains the highest precision of 0.80. These results contradict findings from other experiments, e.g., those reported by Resnik (Resnik, 1995) and Seco et al. (Seco et al., 2004), where information content-based similarity methods perform better than edge-based methods. One possible explanation is that many of these experiments base the similarity computation on the WordNet ontology, which describes general knowledge. When domain-specific ontologies are used, as in our case, the results may differ (Pirró, 2009).
There are some validity threats related to the eval-
uation that should be mentioned. First of all, and
in general, the task of assessing the relevance of
search results in information retrieval evaluation can
be highly subjective (Manning et al., 2010). In this
case the queries consisted of multiple parameters and
the threshold for determining their collective correct-
ness may vary. For example, should the domain ex-
perts weigh some parameters higher than others, or
should a search result be deemed correct if 5 out of 7
parameters are considered similar? Furthermore, the
experience and knowledge of the domain experts with
regard to particular details may also vary. For exam-
ple, one expert may be aware that a particular machine
is applicable to several different materials, while the
other experts may not.
Second, the domain experts used different strategies for determining correct versus false search results. The first domain expert required that both sub-queries were fulfilled by a supplier's offered resources in order to deem a result correct. The second domain expert considered a search result correct as long as the resources offered by a supplier satisfied one of the two sub-queries, as did the third domain expert.