On Operative Creation of Lexical Resources in Different Languages
Svetlana Sheremetyeva
2012
Abstract
Cognitive modeling is to a large extent mediated by lexicons, which brings into focus the operative creation of high-quality lexical resources. This paper presents a methodology and a tool for the automatic extraction of lexical data from textual sources. The methodology combines n-gram extraction with a filtering algorithm that operates on blocks of shallow linguistic knowledge. The specificity of the approach is threefold: (i) it allows dynamic extraction of lexical resources and does not rely on a pre-constructed corpus; (ii) it does not miss low-frequency units; (iii) it is portable between different lexical types, domains and languages. The methodology has been implemented in a tool that can be used in a wide range of text processing tasks useful for cognitive modeling, from ontology acquisition to automatic annotation, multilingual information retrieval, machine translation, etc.
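The general idea of pairing n-gram extraction with a knowledge-based filter can be illustrated with a minimal sketch. It is not the tool described in the paper: the stop-word list, the boundary rule and all function names (tokenize, extract_ngrams, filter_candidates, extract_lexical_units) are assumptions made for illustration only. The sketch collects every n-gram up to a given length from raw text and discards candidates that start or end with a function word. No frequency threshold is applied, so units that occur only once are retained.

```python
import re
from collections import Counter

# Hypothetical stand-in for "shallow linguistic knowledge": a small list of
# English function words that a candidate unit may not start or end with.
BOUNDARY_STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "and", "or", "to",
    "is", "are", "that", "which",
}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens (hyphens allowed inside)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def extract_ngrams(tokens, max_n=3):
    """Count every contiguous n-gram of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def filter_candidates(counts):
    """Drop n-grams whose first or last token is a boundary stop word.

    Frequency is recorded but never used as a cut-off, so candidates that
    occur once survive the filter.
    """
    kept = {}
    for ngram, freq in counts.items():
        if ngram[0] in BOUNDARY_STOPWORDS or ngram[-1] in BOUNDARY_STOPWORDS:
            continue
        kept[" ".join(ngram)] = freq
    return kept


def extract_lexical_units(text, max_n=3):
    """Run tokenization, n-gram counting and filtering on raw text."""
    return filter_candidates(extract_ngrams(tokenize(text), max_n))


if __name__ == "__main__":
    sample = "The extraction of lexical resources is useful for machine translation."
    for unit, freq in sorted(extract_lexical_units(sample).items()):
        print(freq, unit)
```

Because the filter is driven by a word list rather than by corpus statistics, adapting this sketch to another language or domain would amount to swapping the list. The tokenizer ignores punctuation, so n-grams may span sentence boundaries; a fuller version would segment sentences first.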
Paper Citation
in Harvard Style
Sheremetyeva S. (2012). On Operative Creation of Lexical Resources in Different Languages. In Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012) ISBN 978-989-8565-16-7, pages 44-52. DOI: 10.5220/0004089300440052
in Bibtex Style
@conference{nlpcs12,
author={Svetlana Sheremetyeva},
title={On Operative Creation of Lexical Resources in Different Languages},
booktitle={Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)},
year={2012},
pages={44-52},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004089300440052},
isbn={978-989-8565-16-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)
TI - On Operative Creation of Lexical Resources in Different Languages
SN - 978-989-8565-16-7
AU - Sheremetyeva S.
PY - 2012
SP - 44
EP - 52
DO - 10.5220/0004089300440052