Arabic Corpus Enhancement using a New Lexicon/Stemming Algorithm

Ashraf AbdelRaouf; Colin A. Higgins; Tony Pridmore; Mahmoud I. Khalil

Topics: Image Understanding; Information Retrieval; Information Retrieval and Learning; Natural Language Processing; Object Recognition

In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (ICPRAM), Volume 1, pages 435-440, 2013, Barcelona, Spain

Authors: Ashraf AbdelRaouf ¹; Colin A. Higgins ¹; Tony Pridmore ¹ and Mahmoud I. Khalil ²

Affiliations: ¹ The University of Nottingham, United Kingdom; ² Ain Shams University, Egypt

Keyword(s): Arabic Corpus, Optical Character Recognition, Data Retrieval, Morphological Analysis, Lexicon, Stemming Algorithm.

Related Ontology Subjects/Areas/Topics: Applications; Artificial Intelligence; Computer Vision, Visualization and Computer Graphics; Data Engineering; Image Understanding; Information Retrieval; Information Retrieval and Learning; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Natural Language Processing; Object Recognition; Ontologies and the Semantic Web; Pattern Recognition; Software Engineering; Symbolic Systems; Theory and Methods

Abstract: Optical Character Recognition (OCR) is an important technology with many advantages for storing information from both old and new documents. Arabic lacks both the variety of OCR systems and the depth of research available for Roman scripts. An authoritative corpus is beneficial in the design and construction of any OCR system. Lexicon and stemming tools are essential for enhancing corpus retrieval and performance in an OCR context. A new lexicon/stemming algorithm is presented, based on the Viterbi path method and using a light stemmer approach. Lexicon and stemming lookups are combined to obtain a list of alternatives for uncertain words. The list is built by removing affixes (prefixes or suffixes) from the word if there are any; otherwise affixes are added. Finally, every word in the list of alternatives is verified by searching the original corpus. The lexicon/stemming algorithm also ensures the continuous updating of the contents of the corpus presented by (AbdelRaouf et al., 2010), which copes with the innovative needs of Arabic OCR research.
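
The lookup described in the abstract (strip affixes if present, otherwise add them, then verify each candidate against the corpus) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the affix lists, the function name `alternatives`, and the toy corpus are hypothetical, and the Viterbi path step is not modelled at all.

```python
# Hypothetical light-stemmer affix lists, chosen only for illustration;
# the affixes actually used in the paper are not given on this page.
PREFIXES = ("ال", "و", "ب")
SUFFIXES = ("ات", "ون", "ة")


def alternatives(word, corpus, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Return the candidate forms of `word` that occur in `corpus`."""
    candidates = set()
    stripped = False

    # Step 1: remove any affix that is present on the word.
    for p in prefixes:
        if word.startswith(p) and len(word) > len(p):
            candidates.add(word[len(p):])
            stripped = True
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s):
            candidates.add(word[:-len(s)])
            stripped = True

    # Step 2: if no affix was found, add each affix instead.
    if not stripped:
        for p in prefixes:
            candidates.add(p + word)
        for s in suffixes:
            candidates.add(word + s)

    # Step 3: keep only the candidates verified against the corpus.
    return sorted(c for c in candidates if c in corpus)


# Toy corpus (assumed): a set of known word forms.
corpus = {"كتاب", "الكتاب", "كتب"}
with_prefix = alternatives("الكتاب", corpus)  # word carries the prefix "ال"
without_affix = alternatives("كتاب", corpus)  # word carries no listed affix
```

A set is used for the corpus so that the verification step is a constant-time membership test; a real corpus would more likely sit behind an index or database lookup.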

CC BY-NC-ND 4.0

Paper citation in several formats:

AbdelRaouf, A., Higgins, C. A., Pridmore, T. and Khalil, M. I. (2013). Arabic Corpus Enhancement using a New Lexicon/Stemming Algorithm. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-8565-41-9; ISSN 2184-4313, SciTePress, pages 435-440. DOI: 10.5220/0004260704350440

@conference{icpram13,
author={Ashraf AbdelRaouf and Colin A. Higgins and Tony Pridmore and Mahmoud I. Khalil},
title={Arabic Corpus Enhancement using a New Lexicon/Stemming Algorithm},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2013},
pages={435-440},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004260704350440},
isbn={978-989-8565-41-9},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Arabic Corpus Enhancement using a New Lexicon/Stemming Algorithm
SN - 978-989-8565-41-9
IS - 2184-4313
AU - AbdelRaouf, A.
AU - Higgins, C. A.
AU - Pridmore, T.
AU - Khalil, M. I.
PY - 2013
SP - 435
EP - 440
DO - 10.5220/0004260704350440
PB - SciTePress