A Dictionary based Stemming Mechanism for Polish
Michał Korzycki
2012
Abstract
In this paper we present and evaluate a robust stemming mechanism for Polish. We use the Polish Inflection Dictionary to build a Rule Based Stemmer and a Generative Reversed Rule Stemmer. Combining both stemmers into the Hybrid Stemmer described here yields a high-precision stemming mechanism that is able to match human performance. This claim is supported by an experiment whose results are presented.
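The abstract does not spell out how the stemmers are combined, so the following is only a minimal sketch of the general idea of a dictionary-backed hybrid stemmer, not the paper's algorithm. It assumes an exact dictionary lookup is tried first and that suffix rewrite rules induced from the dictionary act as a fallback for unknown words. The tiny `LEXICON` and the names `induce_suffix_rules`, `rule_stem`, and `hybrid_stem` are hypothetical and introduced purely for illustration.

```python
from collections import Counter
from os.path import commonprefix

# Hypothetical toy lexicon (inflected form -> base form).
# This is NOT the Polish Inflection Dictionary, just a stand-in sample.
LEXICON = {
    "kota": "kot",
    "kotem": "kot",
    "psa": "pies",
    "domem": "dom",
    "domu": "dom",
}


def induce_suffix_rules(lexicon):
    """Count (form_suffix -> base_suffix) rewrites observed in the lexicon."""
    rules = Counter()
    for form, base in lexicon.items():
        prefix = commonprefix([form, base])  # character-wise common prefix
        rules[(form[len(prefix):], base[len(prefix):])] += 1
    return rules


def rule_stem(word, rules):
    """Apply the longest matching suffix rule; ties go to the more frequent rule."""
    candidates = [
        (len(old), count, old, new)
        for (old, new), count in rules.items()
        if old and word.endswith(old) and len(word) > len(old)
    ]
    if not candidates:
        return word  # no rule applies: leave the word unchanged
    _, _, old, new = max(candidates)
    return word[: len(word) - len(old)] + new


def hybrid_stem(word, lexicon, rules):
    """Assumed combination: exact dictionary lookup first, induced rules second."""
    word = word.lower()
    if word in lexicon:
        return lexicon[word]
    return rule_stem(word, rules)


RULES = induce_suffix_rules(LEXICON)
print(hybrid_stem("kotem", LEXICON, RULES))   # found in the toy lexicon
print(hybrid_stem("płotem", LEXICON, RULES))  # unknown word, handled by a rule
```

In this sketch the lookup supplies precision for known forms, while the induced rules extend coverage to unseen words at the cost of occasional over-generalization, which is the kind of trade-off a hybrid design is meant to balance.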
References
- Chomsky, N.: Aspects of the Theory of Syntax, MIT Press, (1965)
- Koskenniemi, K.: Two-level Morphology - A general Computational Model for Word-Form Recognition and Production, University of Helsinki Publication No. 11 (1983)
- Lubaszewski, W., Wróbel, H., Gajęcki, M., Moskal, B., Orzechowska, A., Pietras, P., Pisarek, P., Rokicka, T.: Słownik Fleksyjny języka polskiego, Lexis Nexis, Kraków (2001)
- Lubaszewski, W. (ed.): Słowniki komputerowe i automatyczna ekstrakcja informacji z tekstu, Kraków, AGH Press, (2009), original text in Polish
- Lubaszewski, W.: A Grammar for the Polish Inflection Lexicon. TASK Quarterly: Scientific Bulletin of Academic Computer Centre in Gdansk, ISSN 1428-6394, vol. 4, no. 2, pp. 291-300 (2000)
- Weiss, D.: Stempelator: A Hybrid Stemmer for the Polish Language. Technical Report RA002/05, Institute of Computing Science, Poznań University of Technology, Poland, (2005).
- Weiss, D.: A survey of freely available Polish stemmers and evaluation of their applicability in information retrieval. In: Human Language Technologies as a Challenge for Computer Science and Linguistics, Proceedings of the 2nd Language and Technology Conference, pages 216-221, Poznań, Poland, (2005).
- Korzycki, M.: Transducer skończenie stanowy jako narzędzie rozpoznawania form tekstowych wyrazów [The Finite-State Transducer as a Tool for Polish Inflection Form Recognition], PhD Thesis, AGH (2008)
Paper Citation
in Harvard Style
Korzycki M. (2012). A Dictionary based Stemming Mechanism for Polish. In Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012) ISBN 978-989-8565-16-7, pages 143-150. DOI: 10.5220/0004100301430150
in Bibtex Style
@conference{nlpcs12,
author={Michał Korzycki},
title={A Dictionary based Stemming Mechanism for Polish},
booktitle={Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)},
year={2012},
pages={143-150},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004100301430150},
isbn={978-989-8565-16-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)
TI - A Dictionary based Stemming Mechanism for Polish
SN - 978-989-8565-16-7
AU - Korzycki M.
PY - 2012
SP - 143
EP - 150
DO - 10.5220/0004100301430150