extract relevant domain terminologies and to regroup
them into relevant classes.
Table 2: Evaluation of the outputs compared to the gold standard GS.
Outputs  Precision  Recall  F-measure  Ranking
O_spm    0.54       0.54    0.54       1
O_m      0.2        0.2     0.2        3
O_sp     0.3        0.3     0.3        2
The second experiment has been carried out to show how far the proposed
measures take into account the gold standard approximation and make an
accurate evaluation compared to the classical ones. An astronomy ontology
(gold standard) has been built manually from texts of the astronomy domain
using the Terminae tool (Szulman et al., 2008). For that ontology, two
outputs have been produced by Formal Concept Analysis (FCA): a first output
O1 (75 classes) has been built from a list of 24 initial terms/basic
concepts (V1), and a second output O2 (111 classes) has been built from a
set of terms (V2) that extends V1. There is no exact matching between the
handcrafted clusters (O1 and O2) and the gold standard, so the classical
measures are null (for O1 and O2, CP = CR = CFM = 0). This experiment shows
that the proposed evaluation measures give more accurate information on the
quality of the systems’ performance.
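The collapse of exact-match measures to zero can be illustrated with a minimal sketch. It assumes (our assumption, not a definition taken from this paper) that a classical measure counts an output class as correct only when it is identical to a gold standard class. The Jaccard-based best-overlap score is only a stand-in for a graded comparison; it is not the measure proposed in this paper, and the toy astronomy classes are invented for illustration.

```python
def exact_match_scores(output, gold):
    """Precision, recall and F-measure where only identical classes count."""
    out = {frozenset(c) for c in output}
    ref = {frozenset(c) for c in gold}
    hits = len(out & ref)
    p = hits / len(out) if out else 0.0
    r = hits / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def jaccard(a, b):
    """Share of terms common to two classes (stand-in graded similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def best_overlap(output, gold):
    """For each output class, its highest Jaccard overlap with any gold class."""
    return [max((jaccard(c, g) for g in gold), default=0.0) for c in output]


# Hypothetical toy data: every output class overlaps a gold class,
# but none is identical to one.
gold = [{"planet", "star", "comet"}, {"telescope", "spectrometer"}]
output = [{"planet", "star"}, {"telescope", "spectrometer", "comet"}]

print(exact_match_scores(output, gold))  # (0.0, 0.0, 0.0)
print(best_overlap(output, gold))        # both overlaps equal 2/3
```

With these toy classes the exact-match scores are all zero although each output class shares two thirds of its terms with a gold class, which is the situation reported for O1 and O2.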
Table 3: Evaluation of the outputs obtained by the FCA method: results.
Outputs  P     R     FM
O1       0.5   0.17  0.25
O2       0.37  0.2   0.26
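As a consistency check on Table 3, the following sketch recomputes the F-measure from the reported precision and recall. It assumes that FM is the usual balanced harmonic mean of P and R; the function name `f_measure` is ours.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both values are null
    return 2 * precision * recall / (precision + recall)


# (P, R) values as reported in Table 3.
table3 = {"O1": (0.5, 0.17), "O2": (0.37, 0.2)}

for name, (p, r) in table3.items():
    print(name, round(f_measure(p, r), 2))
# O1 0.25
# O2 0.26
```

Under this assumption the recomputed values agree with the FM column of Table 3 after rounding to two decimals.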
4 CONCLUSIONS AND FUTURE WORK
We have decomposed the problem of ontology acquisition evaluation into
different sub-problems. In this paper, we have focused on the evaluation of
semantic class acquisition. We proposed a protocol for comparative
evaluation that allows semantic classes to be matched against the gold
standard. However, quality assessment based on gold standards is
controversial: there has to be a gold standard, its quality has to be
assumed, etc. From the same textual corpus, there is a multitude of
acceptable solutions that vary from one expert to another. In order to take
into account the variability of the gold standard, we proposed to tune the
systems’ outputs. This makes it possible to find the maximal correspondence
with the gold standard. We also proposed to compute a gradual relevance,
whose aim is to distinguish differences that are due to errors from those
that can be due to different conceptualisation choices.
Experiments have shown that the proposed evaluation measures give higher
values than traditional ones and more accurate information on the quality
of the performance of acquisition systems.
We have focused on the quality of the classification, neglecting, at the
lexical level, the correspondence between terms, which may itself be only
partial (a term in the semantic class can be a variant of another concept of
the gold standard). We use a simple matching technique in order not to
distort the evaluation. As future work, we aim to include in our measures a
terminological distance between labels, as proposed in (Zargayouna and
Nazarenko, 2010).
ACKNOWLEDGEMENTS
The authors would like to thank Thibault Mondary
(LIPN) and Yue Ma (LIPN) for fruitful discussions.
This work was partly carried out as part of the Quaero Program, funded by
OSEO, the French State agency for innovation.
REFERENCES
Brank, J., Mladenić, D., and Grobelnik, M. (2006). Gold standard based
ontology evaluation using instance assignment. In Proc. of the 4th Workshop
on Evaluating Ontologies for the Web (EON2006), Scotland.
Maedche, A. and Staab, S. (2002). Measuring similarity between ontologies.
In Proc. of the European Conference on Knowledge Acquisition and Management
(EKAW-2002), Spain.
Szulman, S., Aussenac-Gilles, N., and Despres, S. (2008). The Terminae
method and platform for ontology engineering from texts. In Bridging the Gap
between Text and Knowledge - Selected Contributions to Ontology Learning and
Population from Text. IOS Press.
Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection. In 32nd
Annual Meeting of the Association for Computational Linguistics, pages
133–138, New Mexico.
Zargayouna, H. and Nazarenko, A. (2010). Evaluation of textual knowledge
acquisition tools: a challenging task. In LREC 2010, Malta.
Zavitsanos, E., Paliouras, G., and Vouros, G. (2008). A distributional
approach to evaluating ontology learning methods using a gold standard. In
Ontology Learning and Population Workshop (OLP 2008), European Conference on
Artificial Intelligence (ECAI 2008).
KEOD 2011 - International Conference on Knowledge Engineering and Ontology Development