An Algorithm for Arabic Lexicon Generator Using Morphological Analysis

Samer Nofal

2006

Abstract

Many natural language processing systems (NLPSs) rely on lexicons: files that store information about words, such as category, gender, number and tense. Computing lexical information instead of storing it will improve the time complexity of NLPSs. This work designs, implements and evaluates an algorithm for Arabic morphological analysis and lexicon generation. The algorithm segments each word into a prefix, a stem and a suffix, then tries to fill the lexicon entry from the information these segments carry. It tests the compatibility of the three word components and consults three types of lists to validate them: a prefix list, a suffix list and stem lists. The algorithm was tested on three social and political articles totaling nearly 1300 words. The evaluation shows that computational morphological analysis can be relied on in at least 80 percent of cases. The remaining failures, about 20 percent, are due to language exceptions and the hidden diacritics of Arabic words.
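The segmentation and compatibility checks described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the lists `PREFIXES`, `SUFFIXES` and `STEMS`, their feature values, the helper names `merge` and `analyze`, and the Latin transliteration of the Arabic forms are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional

Features = Dict[str, str]

# Hypothetical toy lists in Latin transliteration (assumption: the paper's
# real prefix, suffix and stem lists are not given in the abstract).
PREFIXES: Dict[str, Features] = {
    "": {},
    "al": {"category": "noun"},                      # definite article
    "ya": {"category": "verb", "tense": "present"},  # imperfect verb marker
}
SUFFIXES: Dict[str, Features] = {
    "": {},
    "un": {"number": "plural", "gender": "masculine"},
    "at": {"category": "noun", "number": "plural", "gender": "feminine"},
}
STEMS: Dict[str, Features] = {
    "muallim": {"category": "noun"},  # "teacher"
    "ktub": {"category": "verb"},     # "write"
}


def merge(parts: List[Features]) -> Optional[Features]:
    """Combine segment features; return None if two segments disagree."""
    entry: Features = {}
    for features in parts:
        for key, value in features.items():
            # setdefault keeps the first value seen; a different later
            # value means the segments are incompatible.
            if entry.setdefault(key, value) != value:
                return None
    return entry


def analyze(word: str) -> List[Features]:
    """Try every prefix/stem/suffix split and keep the compatible ones."""
    entries: List[Features] = []
    for i in range(len(word) + 1):
        for j in range(i, len(word) + 1):
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix in PREFIXES and stem in STEMS and suffix in SUFFIXES:
                entry = merge([PREFIXES[prefix], STEMS[stem], SUFFIXES[suffix]])
                if entry is not None:
                    entries.append(
                        {"prefix": prefix, "stem": stem, "suffix": suffix, **entry}
                    )
    return entries


if __name__ == "__main__":
    for word in ("almuallimun", "yaktubun", "yamuallim"):
        print(word, analyze(word))
```

In this sketch a word such as `yamuallim` yields no entry, because the verbal prefix and the noun stem disagree on category. An unvocalized Arabic form would often yield several candidate entries instead, which is consistent with the abstract's remark about hidden diacritics.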



Paper Citation


in Harvard Style

Nofal S. (2006). An Algorithm for Arabic Lexicon Generator Using Morphological Analysis. In Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2006) ISBN 978-972-8865-50-4, pages 57-70. DOI: 10.5220/0002472800570070


in Bibtex Style

@conference{nlucs06,
author={Samer Nofal},
title={An Algorithm for Arabic Lexicon Generator Using Morphological Analysis},
booktitle={Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2006)},
year={2006},
pages={57-70},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002472800570070},
isbn={978-972-8865-50-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2006)
TI - An Algorithm for Arabic Lexicon Generator Using Morphological Analysis
SN - 978-972-8865-50-4
AU - Nofal S.
PY - 2006
SP - 57
EP - 70
DO - 10.5220/0002472800570070