Table 1: Features of the two ontologies.
Feature                        Onto_SV      Onto_ST
Number of concepts             615          1251
Depth                          6            6
Hierarchical IS_A relations    Yes          Yes
Properties                     No           Yes
Meronymy relation              No           Yes
Other semantic relations       No           Yes
Learning process               Supervised   Unsupervised
but it is the one closer to the domain knowledge
as expressed in the specification document.
4.2 Limitations and Advantages of our Approach
The quality of the resulting ontology depends
entirely on the quality of the specification
document: when the specification contains
inconsistencies, human interpretation is required to
correct their consequences in the ontology. This is
also one of the advantages of formalization: it helps
locate fuzzy information and inconsistencies within
highly structured documents such as these
specifications. However much effort their authors
make, variations in meaning (lexical, syntactic, or
related to the presentation of the text material) are
inherent to natural language. While processing the
document, we encountered several such cases: the
semantics of a relation was not the expected one,
one item of an enumeration had a different status
from the others, and so on. A detailed study is given
in (Kamel and Aussenac, 2009).
5 CONCLUSIONS AND FUTURE WORK
We have shown that, in the favorable context
where texts are structured with well-defined tags
that have clear semantics, it is possible to define a
text processing chain that is efficient for the
automatic construction of an ontology. This chain,
implemented with the GATE platform, includes
rules that jointly exploit several features of the
document: its explicit structure, through the
available tags, and its natural-language content. The
ontology obtained with this automatic process is
rich in concepts and relations, and each of its
elements is precisely linked to the text from which
it originates. This method is applicable to any XML
document that describes database specifications and
complies with the INSPIRE standard.
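The chain itself relies on GATE rules, which are not reproduced here. As a rough, language-neutral illustration of how a rule can combine explicit tags with patterns over the natural-language content, the following Python sketch builds concepts, properties and IS_A candidates from a toy specification. It also keeps each concept's source text, mirroring the traceability described above. The tag names (<class>, <name>, <definition>, <attribute>), the sample texts, the extract function and the "A X is a Y" pattern are hypothetical assumptions. They are neither the actual specification schema nor the authors' rules.

```python
import re
import xml.etree.ElementTree as ET

# HYPOTHETICAL toy specification: tag names and texts are assumptions
# for illustration, not the real specification schema.
SPEC = """<spec>
  <class>
    <name>Road</name>
    <definition>A road is a way intended for vehicle traffic.</definition>
    <attribute>width</attribute>
  </class>
  <class>
    <name>Path</name>
    <definition>A path is a way too narrow for cars.</definition>
  </class>
</spec>"""

# HYPOTHETICAL lexical pattern "A/An X is a/an Y" applied to definitions.
IS_A = re.compile(r"^an? \w+ is an? (\w+)", re.IGNORECASE)


def extract(xml_text):
    """Return (concepts, is_a, properties) extracted from the toy spec."""
    concepts = {}    # concept name -> source definition (traceability)
    is_a = []        # (child, parent) candidate pairs
    properties = []  # (concept, property) pairs
    root = ET.fromstring(xml_text)
    for cls in root.iter("class"):
        # Structure rule: each <class> element yields a concept.
        name = cls.findtext("name", default="").strip()
        definition = cls.findtext("definition", default="").strip()
        concepts[name] = definition
        # Structure rule: each <attribute> child yields a property.
        for attr in cls.findall("attribute"):
            properties.append((name, attr.text.strip()))
        # Text rule: the genus term of the definition yields an IS_A candidate.
        match = IS_A.match(definition)
        if match:
            is_a.append((name, match.group(1).capitalize()))
    return concepts, is_a, properties


if __name__ == "__main__":
    concepts, is_a, properties = extract(SPEC)
    print(is_a)        # [('Road', 'Way'), ('Path', 'Way')]
    print(properties)  # [('Road', 'width')]
```

Separating structure rules from text rules is only a presentational choice here. A real chain would also have to handle definitions that do not fit the single pattern.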
We are aware that this ontology contains
inconsistencies that should be manually corrected.
Within the GEONTO project, manual cleaning of the
ontology is planned.
In the meantime, we plan to enrich the
automatically built ontology, in particular through
a more systematic analysis of definitions
(especially those containing conjunctions or
disjunctions) and of the text material presentation
(we have identified several kinds of typographic
marks that have not yet been taken into account).
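The analysis of coordinated definitions is future work, and the text does not say how it will be done. Purely as an illustration of what a first step might look like, the following Python sketch extracts the genus phrase of a definition. It splits that phrase on "or"/"and" and labels it as a disjunction or a conjunction. The candidate_parents function, both regular expressions and the sample definitions are hypothetical. A realistic rule would also need head-noun extraction and a decision on how each case maps to ontology constructs.

```python
import re

# HYPOTHETICAL rule: the genus phrase is whatever follows "is a/an",
# up to a comma, "that", "which", a full stop or the end of the text.
GENUS = re.compile(r"\bis an? (.+?)(?:,| that | which |\.|$)", re.IGNORECASE)
# Coordinating conjunction, optionally followed by an article.
COORD = re.compile(r"\s+(?:or|and)\s+(?:an?\s+)?", re.IGNORECASE)


def candidate_parents(definition):
    """Return (candidate parent terms, 'disjunction' | 'conjunction' | None)."""
    match = GENUS.search(definition)
    if not match:
        return [], None
    phrase = match.group(1)
    parts = [p.strip() for p in COORD.split(phrase) if p.strip()]
    # A phrase containing "or" is treated as a disjunction even if it also
    # contains "and"; this priority is an arbitrary choice of the sketch.
    if re.search(r"\s+or\s+", phrase, re.IGNORECASE):
        kind = "disjunction"
    elif re.search(r"\s+and\s+", phrase, re.IGNORECASE):
        kind = "conjunction"
    else:
        kind = None
    return parts, kind


if __name__ == "__main__":
    print(candidate_parents("A waterway is a river or a canal."))
    # (['river', 'canal'], 'disjunction')
```

A disjunction might suggest several alternative parent concepts, while a conjunction might suggest multiple inheritance. Choosing between such interpretations is exactly where the manual validation mentioned above would still be needed.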
REFERENCES
Ahmad, K., Holmes-Higgin, P.R., 1995. SystemQuick: A
unified approach to text and terminology. In
Terminology in Advanced Microcomputer
Applications. Proceedings of the 3rd TermNet
Symposium, p. 181-194. Vienna, Austria.
Asher, N., Busquet, J., Vieu, L., 2001. La SDRT: une
approche de la cohérence du discours dans la tradition
de la sémantique dynamique. Verbum 23, 73-101.
Auger, A., Barrière, C., 2008. Pattern based approaches to
semantic relation extraction: a state-of-the-art.
Terminology 14(1), 1-19. John Benjamins.
Aussenac-Gilles, N., Despres, S., Szulman, S., 2008. The
TERMINAE Method and Platform for Ontology
Engineering from texts. Bridging the Gap between
Text and Knowledge - Selected Contributions to
Ontology Learning and Population from Text. P.
Buitelaar, P. Cimiano (Eds.), IOS Press, p. 199-223.
Barrière, C., Agbado, A., 2006. TerminoWeb: a software
environment for term study in rich contexts.
International Conference on Terminology,
Standardization and Technology Transfer (TSTT
2006), Beijing (China), p. 103-113.
Bourigault, D., 2002. UPERY: un outil d’analyse
distributionnelle étendue pour la construction
d’ontologies à partir de corpus. TALN 2002, Nancy,
24-27 June 2002.
Buitelaar, P., Olejnik, D., Sintek, M., 2004. A Protégé
plug-in for ontology extraction from text based on
linguistic analysis. In Proceedings of the 1st
European Semantic Web Symposium (ESWS), p. 31-44.
Buitelaar, P., Cimiano, P., Magnini, B., 2005. Ontology
Learning From Text: Methods, Evaluation and
Applications. IOS Press.
Charolles, M., 1997. L’encadrement du discours:
Univers, Champs, Domaines et Espaces. Cahier de
Recherche Linguistique, LANDISCO, URA-CNRS
1035, Univ. Nancy 2, n°6, 1-73.
Daoust, F., 1996. SATO (Système d’Analyse de Texte
par Ordinateur). Version 4.0. Manuel de référence,
Service d’Analyse de Texte par Ordinateur (ATO).
Montréal: Université du Québec.
Giuliano, C., Lavelli, A., Romano, L., 2006. Exploiting
Shallow Linguistic Information for Relation
KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development