Authors:
Souheila Ben Guirat 1; Ibrahim Bounhas 2 and Yahia Slimani 3
Affiliations:
1 Computer Sciences Department, Prince Sattam Bin Abdulaziz University, K.S.A., Laboratory of Computer Science for Industrial Systems, Carthage University, Tunisia, JARIR: Joint Group for Artificial Reasoning and Information Retrieval, Tunisia
2 Laboratory of Computer Science for Industrial Systems, Carthage University, Tunisia, JARIR: Joint Group for Artificial Reasoning and Information Retrieval, Tunisia
3 Laboratory of Computer Science for Industrial Systems, Carthage University, Tunisia, Higher Institute of Multimedia Arts of Manouba (ISAMM), La Manouba University, Tunisia, JARIR: Joint Group for Artificial Reasoning and Information Retrieval, Tunisia
Keyword(s):
Arabic Information Retrieval, Hybrid Index, Statistical Modeling, Smoothing.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Machine Learning; Natural Language Processing; Pattern Recognition; Soft Computing; Symbolic Systems
Abstract:
Arabic document indexing remains challenging given the morphological specificities of the language. Although much effort has been devoted to the field, the need for more efficient indexing approaches keeps growing. One of the most important issues concerns the choice of indexing units (e.g. stems, roots, lemmas), which should both enhance retrieval efficiency and optimize the indexing process. The question is how to process Arabic texts to retrieve the basic forms that best reflect the meaning of words and documents. In the literature, several indexing units have been compared, and combining multiple indexes seems promising. In our previous work, we showed that hybrid indexes based on stems, patterns and roots enhance results. However, the optimal weight of each indexing unit still has to be found. Therefore, this paper contributes to optimizing hybrid indexing. We compare and evaluate four pre-indexing methods.
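
To make the weighting question concrete, the following is a minimal illustrative sketch and not the authors' method: it assumes each indexing unit (stem, pattern, root) already yields a retrieval score for a document, combines those scores by a weighted linear interpolation, and enumerates candidate weight settings on a coarse grid. The function names `hybrid_score` and `weight_grid`, the unit labels, and the example scores are all hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List


def hybrid_score(unit_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear interpolation of per-unit scores (illustrative assumption)."""
    if set(unit_scores) != set(weights):
        raise ValueError("scores and weights must cover the same indexing units")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise so the result is unchanged if all weights are rescaled together.
    return sum(weights[u] * unit_scores[u] for u in weights) / total


def weight_grid(units: List[str], steps: int) -> Iterator[Dict[str, float]]:
    """Yield every weight assignment in increments of 1/steps that sums to 1."""
    for combo in product(range(steps + 1), repeat=len(units)):
        if sum(combo) == steps:
            yield {u: c / steps for u, c in zip(units, combo)}


if __name__ == "__main__":
    # Made-up scores for a single query-document pair, one per indexing unit.
    scores = {"stem": 0.6, "pattern": 0.2, "root": 0.4}
    for w in weight_grid(["stem", "pattern", "root"], steps=2):
        print(w, round(hybrid_score(scores, w), 3))
```

In practice each candidate weight setting would be judged by a retrieval metric over a test collection rather than by a single score; the grid only shows what the search space over unit weights looks like.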