ON THE USE OF CORRESPONDENCE ANALYSIS TO LEARN SEED ONTOLOGIES FROM TEXT

Davide Eynard, Fabio Marfia, Matteo Matteucci

Abstract

In this work we present our approach to generating hierarchies of concepts, in the form of ontologies, from free text. The approach relies on the statistical model of Correspondence Analysis to analyze term occurrences in text, identify the main concepts the text refers to, and retrieve semantic relationships between them. We present a tool that can apply different methods for generating ontologies from text, namely hierarchy generation from a hierarchical clustering representation, search for Hearst patterns on the Web, and bootstrapping. Our evaluation shows that the precision of the tool in generating hierarchies is around 60% for the best automatic approach and around 90% for the best human-assisted approach.
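
As a rough illustration of what a Hearst pattern search looks for, the sketch below matches only the "such as" pattern in a local string. It is not the tool's implementation: the function name `hearst_such_as`, the regular expression, and the restriction to single lowercase words are all assumptions made for this example, and the Web search step mentioned in the abstract is left out.

```python
import re

# Hypothetical pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Only single-word terms are handled; this is an illustrative simplification.
SUCH_AS = re.compile(r"\b([a-z]+) such as ((?:[a-z]+(?:, | and | or )?)+)")


def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1)
        # Split the enumerated hyponyms on the separators the pattern allows.
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs


if __name__ == "__main__":
    print(hearst_such_as("Fruits such as apples, pears and plums are sweet."))
```

Each returned pair is a candidate is-a relation (hyponym first, hypernym second). A realistic extractor would also need noun-phrase chunking and lemmatization, which this sketch does not attempt.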


Paper Citation


in Harvard Style

Eynard D., Marfia F. and Matteucci M. (2010). ON THE USE OF CORRESPONDENCE ANALYSIS TO LEARN SEED ONTOLOGIES FROM TEXT. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010) ISBN 978-989-8425-29-4, pages 430-439. DOI: 10.5220/0003102204300439


in Bibtex Style

@conference{keod10,
author={Davide Eynard and Fabio Marfia and Matteo Matteucci},
title={ON THE USE OF CORRESPONDENCE ANALYSIS TO LEARN SEED ONTOLOGIES FROM TEXT},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)},
year={2010},
pages={430-439},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003102204300439},
isbn={978-989-8425-29-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)
TI - ON THE USE OF CORRESPONDENCE ANALYSIS TO LEARN SEED ONTOLOGIES FROM TEXT
SN - 978-989-8425-29-4
AU - Eynard D.
AU - Marfia F.
AU - Matteucci M.
PY - 2010
SP - 430
EP - 439
DO - 10.5220/0003102204300439