Building Domain Ontologies from Text Analysis: An Application for Question Answering

Rodolfo Delmonte

Abstract

In information extraction and automatic question answering, access to a domain ontology can be of great help. The main problem is building such an ontology, which is a difficult and time-consuming task. We propose an approach in which the domain ontology is learned from the linguistic analysis of a number of texts that represent the domain itself. NLP analysis is carried out with the GETARUNS system, which builds a Discourse Model and assigns a relevance score to each entity. From the Discourse Model we extract the best candidates to become concepts in the domain ontology. To arrange the concepts in the correct hierarchy we use the WordNet taxonomy. Once the domain ontology is built, we reconsider the texts to extract information. In this phase, the entities recognized at discourse level are used to create instances of the concepts, and the predicate-argument structure of the verb is used to construct instance slots for the concepts. Finally, the question answering task is performed by translating the natural language question into a suitable form and using that form to query the Discourse Model enriched by the ontology.
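The ontology-building steps described in the abstract can be illustrated with a minimal sketch. It is not the GETARUNS implementation: the `Entity` and `Concept` classes, the relevance threshold, the slot names, and the small `HYPERNYMS` table (a stand-in for the WordNet taxonomy) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    """Hypothetical discourse-level entity with a relevance score."""
    head: str
    relevance: float

@dataclass
class Concept:
    """Hypothetical ontology concept with a parent link and instances."""
    name: str
    parent: Optional[str] = None
    instances: list = field(default_factory=list)

# Toy hypernym table standing in for the WordNet taxonomy (illustrative only).
HYPERNYMS = {"poodle": "dog", "dog": "animal", "cat": "animal", "animal": "entity"}

def select_candidates(entities, threshold):
    """Keep the heads of entities whose relevance meets the threshold."""
    return sorted({e.head for e in entities if e.relevance >= threshold})

def build_ontology(candidates):
    """Place each candidate under its hypernyms, adding ancestors as needed."""
    ontology = {}
    for name in candidates:
        node = name
        while node is not None and node not in ontology:
            parent = HYPERNYMS.get(node)  # None marks a root concept
            ontology[node] = Concept(node, parent)
            node = parent
    return ontology

def add_instance(ontology, concept_name, entity_id, slots):
    """Attach an instance whose slots come from predicate-argument roles."""
    ontology[concept_name].instances.append({"id": entity_id, "slots": dict(slots)})

# Example run on made-up entities and scores.
entities = [Entity("poodle", 0.9), Entity("cat", 0.7), Entity("garden", 0.1)]
candidates = select_candidates(entities, threshold=0.5)
ontology = build_ontology(candidates)
add_instance(ontology, "poodle", "id1", {"predicate": "bark", "agent": "id1"})
```

In this sketch, low-relevance entities such as "garden" never become concepts, and ancestors such as "dog" and "animal" are added only because a selected candidate needs them to sit in a connected hierarchy.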



Paper Citation


in Harvard Style

Delmonte R. (2006). Building Domain Ontologies from Text Analysis: An Application for Question Answering. In Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2006) ISBN 978-972-8865-50-4, pages 3-16. DOI: 10.5220/0002483200030016


in Bibtex Style

@conference{nlucs06,
author={Rodolfo Delmonte},
title={Building Domain Ontologies from Text Analysis: An Application for Question Answering},
booktitle={Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2006)},
year={2006},
pages={3-16},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002483200030016},
isbn={978-972-8865-50-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2006)
TI - Building Domain Ontologies from Text Analysis: An Application for Question Answering
SN - 978-972-8865-50-4
AU - Delmonte R.
PY - 2006
SP - 3
EP - 16
DO - 10.5220/0002483200030016