A FRAMEWORK FOR STRUCTURED KNOWLEDGE EXTRACTION AND REPRESENTATION FROM NATURAL LANGUAGE THROUGH DEEP SENTENCE ANALYSIS

Stefania Costantini, Niva Florio, Alessio Paolucci

Abstract

We present a framework that extracts knowledge from natural language sentences using a deep analysis technique based on linguistic dependencies. The extracted knowledge is represented in OOLOT, an intermediate format inspired by the Language of Thought (LOT) and based on Answer Set Programming (ASP). OOLOT uses an ontology-oriented lexicon and syntax. Finally, the knowledge can be exported to OWL and native ASP.
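As a rough illustration of the general idea of turning linguistic dependencies into logic-programming facts, the following minimal Python sketch renders hand-written typed dependencies as ground ASP-style facts. It is not the OOLOT format, whose syntax is not given in this abstract; the `Dependency` type, the `to_asp_fact` helper, and the example sentence are all hypothetical and introduced only for this sketch.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    """One typed dependency, read as relation(governor, dependent)."""
    relation: str
    governor: str
    dependent: str


def to_asp_fact(dep: Dependency) -> str:
    """Render a dependency as a ground ASP-style fact (hypothetical encoding)."""
    def atom(word: str) -> str:
        # Lowercase the word and replace non-alphanumeric characters with
        # underscores so it can serve as a constant in this sketch.
        cleaned = "".join(c if c.isalnum() else "_" for c in word.lower())
        # Prefix anything that does not start with a letter (e.g. numerals).
        return cleaned if cleaned[:1].isalpha() else "w_" + cleaned

    return f"{dep.relation}({atom(dep.governor)}, {atom(dep.dependent)})."


# Hand-written dependencies for "The cat eats fish" (assumed, not parser output).
deps = [
    Dependency("nsubj", "eats", "cat"),
    Dependency("dobj", "eats", "fish"),
    Dependency("det", "cat", "The"),
]
facts = [to_asp_fact(d) for d in deps]
print("\n".join(facts))
```

In a real pipeline the dependency list would come from a parser rather than being written by hand, and the intermediate representation would carry ontology information that this sketch leaves out.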

Paper Citation


in Harvard Style

Costantini S., Florio N. and Paolucci A. (2011). A FRAMEWORK FOR STRUCTURED KNOWLEDGE EXTRACTION AND REPRESENTATION FROM NATURAL LANGUAGE THROUGH DEEP SENTENCE ANALYSIS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 274-279. DOI: 10.5220/0003663702820287


in Bibtex Style

@conference{kdir11,
author={Stefania Costantini and Niva Florio and Alessio Paolucci},
title={A FRAMEWORK FOR STRUCTURED KNOWLEDGE EXTRACTION AND REPRESENTATION FROM NATURAL LANGUAGE THROUGH DEEP SENTENCE ANALYSIS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={274-279},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003663702820287},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - A FRAMEWORK FOR STRUCTURED KNOWLEDGE EXTRACTION AND REPRESENTATION FROM NATURAL LANGUAGE THROUGH DEEP SENTENCE ANALYSIS
SN - 978-989-8425-79-9
AU - Costantini S.
AU - Florio N.
AU - Paolucci A.
PY - 2011
SP - 274
EP - 279
DO - 10.5220/0003663702820287