On Operative Creation of Lexical Resources in Different Languages

Svetlana Sheremetyeva

Abstract

Cognitive modeling is to a large extent mediated by lexicons, which brings the operative creation of high-quality lexical resources into focus. This paper presents a methodology and a tool for automatic extraction of lexical data from textual sources. The methodology combines n-gram extraction with a filtering algorithm that operates on blocks of shallow linguistic knowledge. The approach is distinctive in three respects: (i) it allows dynamic extraction of lexical resources and does not rely on a pre-constructed corpus; (ii) it does not miss low-frequency units; (iii) it is portable across lexical types, domains and languages. The methodology has been implemented in a tool that can be used in a wide range of text processing tasks useful for cognitive modeling, from ontology acquisition to automatic annotation, multilingual information retrieval, machine translation, etc.
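The abstract does not spell out the contents of the knowledge blocks, so what follows is only a minimal sketch of the general "n-gram extraction plus shallow filtering" idea, not the author's algorithm. It assumes a single hypothetical knowledge block (a small English list of function words that may not open or close a lexical unit) and a simple regex tokenizer. Because counts are taken directly from the input text and no frequency threshold is applied, candidates that occur only once are kept, which mirrors points (i) and (ii) above. The names `tokenize`, `extract_ngrams`, `filter_candidates` and `BOUNDARY_STOPWORDS` are illustrative.

```python
import re
from collections import Counter

# Hypothetical "block" of shallow linguistic knowledge (illustrative only):
# function words that may not start or end a candidate lexical unit.
BOUNDARY_STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "for", "with"}


def tokenize(text):
    """Lowercase word tokenizer; a simplifying assumption, not the paper's."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def extract_ngrams(tokens, max_n=3):
    """Count every contiguous n-gram of length 1..max_n (no frequency cut-off)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def filter_candidates(counts, stopwords=BOUNDARY_STOPWORDS):
    """Keep n-grams whose first and last tokens are not function words."""
    return {
        gram: freq
        for gram, freq in counts.items()
        if gram[0] not in stopwords and gram[-1] not in stopwords
    }


text = "The filtering algorithm removes noise. A filtering algorithm of the tool runs on raw text."
candidates = filter_candidates(extract_ngrams(tokenize(text)))

# Print candidates, most frequent first, then alphabetically.
for gram, freq in sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0])):
    print(" ".join(gram), freq)
```

Porting such a filter to another language or domain would, under this assumption, mean swapping the knowledge block (here the stop-word set) rather than retraining on a corpus.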



Paper Citation


in Harvard Style

Sheremetyeva S. (2012). On Operative Creation of Lexical Resources in Different Languages. In Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012) ISBN 978-989-8565-16-7, pages 44-52. DOI: 10.5220/0004089300440052


in Bibtex Style

@conference{nlpcs12,
author={Svetlana Sheremetyeva},
title={On Operative Creation of Lexical Resources in Different Languages},
booktitle={Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)},
year={2012},
pages={44-52},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004089300440052},
isbn={978-989-8565-16-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)
TI - On Operative Creation of Lexical Resources in Different Languages
SN - 978-989-8565-16-7
AU - Sheremetyeva S.
PY - 2012
SP - 44
EP - 52
DO - 10.5220/0004089300440052