Automatic Generation of Large Knowledge Bases using Deep Semantic and Linguistically Founded Methods

Sven Hartrumpf, Hermann Helbig, Ingo Phoenix

Abstract

Large-scale knowledge acquisition from texts is one of the challenges of the information society that can only be mastered by technical means. While the syntactic analysis of isolated sentences is relatively well understood, automatic parsing on all linguistic levels, from the morphological level through to the semantic level (i.e., real understanding of texts), is far from solved. This paper explains the approach taken by the MultiNet technology to bridge the gap between the syntactic-semantic analysis of single sentences and the creation of knowledge bases representing the content of whole texts. In particular, it shows how linguistic text phenomena such as inclusion or bridging references can be handled by logical means using the axiomatic apparatus of the MultiNet formalism. The NLP techniques described are applied in practice to transform large textual corpora such as Wikipedia into a knowledge base, which is then used in meaning-oriented search engines.



Paper Citation


in Harvard Style

Hartrumpf S., Helbig H. and Phoenix I. (2014). Automatic Generation of Large Knowledge Bases using Deep Semantic and Linguistically Founded Methods. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 297-304. DOI: 10.5220/0004756202970304


in Bibtex Style

@conference{icaart14,
author={Sven Hartrumpf and Hermann Helbig and Ingo Phoenix},
title={Automatic Generation of Large Knowledge Bases using Deep Semantic and Linguistically Founded Methods},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={297-304},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004756202970304},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Automatic Generation of Large Knowledge Bases using Deep Semantic and Linguistically Founded Methods
SN - 978-989-758-015-4
AU - Hartrumpf S.
AU - Helbig H.
AU - Phoenix I.
PY - 2014
SP - 297
EP - 304
DO - 10.5220/0004756202970304