ONTOLOGY LEARNING BY ANALYZING XML DOCUMENT STRUCTURE AND CONTENT

Nathalie Aussenac-Gilles, Mouna Kamel

2009

Abstract

Most existing methods for ontology learning from textual documents rely on natural language analysis. We extend these approaches by taking into account document structure, which carries additional knowledge. The documents that we deal with are XML specifications of databases. In addition to classical linguistic clues, the structural organization of such documents also helps convey meaning. In a first stage, we characterize the semantics of the XML mark-up elements and of their relations. Parsing rules are then defined to exploit the XML structure of documents and to create ontology concepts and semantic relations. These rules make it possible to automatically learn an ontology kernel from documents. In a second stage, this ontology is enriched with the results of text analysis using lexico-syntactic patterns. Both the ontology learning rules and the patterns are implemented in the GATE platform.
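
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' rule set and does not use GATE: it relies only on the Python standard library, and the XML element names (`table`, `attribute`, `description`), the relation labels (`hasAttribute`, `isA`) and the single "such as" pattern are all assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical database specification; element and attribute names are
# invented for illustration and do not come from the paper.
SPEC = """
<table name="River">
  <description>Rivers such as the Garonne and the Loire are watercourses.</description>
  <attribute name="length"/>
  <attribute name="flow"/>
</table>
"""

def kernel_from_structure(root):
    """Stage 1 (assumed rule): each <table> yields a concept, and each
    nested <attribute> yields a relation attached to that concept."""
    concepts, relations = set(), set()
    for table in root.iter("table"):
        concept = table.get("name")
        concepts.add(concept)
        for attr in table.findall("attribute"):
            relations.add((concept, "hasAttribute", attr.get("name")))
    return concepts, relations

# One simple lexico-syntactic pattern: "Xs such as (the) A [and (the) B]".
SUCH_AS = re.compile(r"(\w+)s such as (?:the )?(\w+)(?: and (?:the )?(\w+))?")

def enrich_from_text(root, relations):
    """Stage 2 (assumed pattern): add isA relations found in free text."""
    for desc in root.iter("description"):
        for match in SUCH_AS.finditer(desc.text or ""):
            hypernym = match.group(1)
            for hyponym in match.groups()[1:]:
                if hyponym:
                    relations.add((hyponym, "isA", hypernym))
    return relations

root = ET.fromstring(SPEC.strip())
concepts, relations = kernel_from_structure(root)
relations = enrich_from_text(root, relations)
print(sorted(concepts))
print(sorted(relations))
```

In this sketch the structural rules run first and the text patterns only add to their output, which mirrors the kernel-then-enrichment order given in the abstract.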



Paper Citation


in Harvard Style

Aussenac-Gilles N. and Kamel M. (2009). ONTOLOGY LEARNING BY ANALYZING XML DOCUMENT STRUCTURE AND CONTENT. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2009) ISBN 978-989-674-012-2, pages 159-165. DOI: 10.5220/0002293301590165


in Bibtex Style

@conference{keod09,
author={Nathalie Aussenac-Gilles and Mouna Kamel},
title={ONTOLOGY LEARNING BY ANALYZING XML DOCUMENT STRUCTURE AND CONTENT},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2009)},
year={2009},
pages={159-165},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002293301590165},
isbn={978-989-674-012-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2009)
TI - ONTOLOGY LEARNING BY ANALYZING XML DOCUMENT STRUCTURE AND CONTENT
SN - 978-989-674-012-2
AU - Aussenac-Gilles N.
AU - Kamel M.
PY - 2009
SP - 159
EP - 165
DO - 10.5220/0002293301590165