The F1 scores of LCS and CEPRF were quite similar,
for two reasons. (1) The expansion is recall-oriented,
and CEPRF uses AWN synonyms directly in the
expansion process, which may introduce uncertainty.
For example, the term ‘’, which means ‘sort’, has
many synonyms, such as ‘’, which means ‘universe’
and ‘format’. As a result, the expansion terms
increase the number of retrieved documents, but the
newly retrieved documents have low similarity to the
queries. (2) The elimination of stop-words, which
removes the most frequent words and those that carry
no meaning, improves precision.
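The trade-off described above can be illustrated with a minimal sketch of set-based precision, recall and F1. The document identifiers and the two result lists are invented for illustration only and are not taken from the experiments; the function name `precision_recall_f1` is likewise an assumption.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): four relevant documents for one query.
relevant = {"d1", "d2", "d3", "d4"}
before = ["d1", "d2", "d5"]                                # unexpanded query
after = ["d1", "d2", "d3", "d5", "d6", "d7", "d8", "d9"]   # expanded query

p_before, r_before, f_before = precision_recall_f1(before, relevant)
p_after, r_after, f_after = precision_recall_f1(after, relevant)
print(f"before: P={p_before:.3f} R={r_before:.3f} F1={f_before:.3f}")
print(f"after:  P={p_after:.3f} R={r_after:.3f} F1={f_after:.3f}")
```

In this toy case the expanded query retrieves more documents, so recall rises while precision falls, and F1 changes only slightly. This is the kind of behaviour that could leave two systems with similar F1 scores.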
The best results were obtained when the semantic
similarity measure was used to select the best
synonym. The results show that our proposed
approach is effective in expanding the results and
disambiguating word senses. This automatic
expansion technique (CEPRFSS), based on PRF and a
semantic similarity measure using AWN, achieved the
best result in comparison to the other two systems,
i.e., LCS and CEPRF.
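As a rough illustration of how a similarity measure can pick the best synonym, the sketch below scores each candidate by pointwise mutual information (PMI) with the original term and keeps the highest-scoring one. It assumes document-level co-occurrence counts over a tiny invented corpus, and English tokens stand in for Arabic stems. The function names `pmi` and `best_synonym` are hypothetical, and the exact scoring used by CEPRFSS may differ.

```python
import math


def pmi(term_a, term_b, docs):
    """PMI of two terms from document-level co-occurrence (an assumption)."""
    n = len(docs)
    count_a = sum(1 for d in docs if term_a in d)
    count_b = sum(1 for d in docs if term_b in d)
    count_ab = sum(1 for d in docs if term_a in d and term_b in d)
    if count_a == 0 or count_b == 0 or count_ab == 0:
        return float("-inf")  # the terms never co-occur
    return math.log2((count_ab * n) / (count_a * count_b))


def best_synonym(term, synonyms, docs):
    """Return the synonym with the highest PMI to `term`, or None."""
    scored = [(pmi(term, s, docs), s) for s in synonyms]
    if not scored:
        return None
    score, choice = max(scored)
    return choice if score > float("-inf") else None


# Toy corpus (hypothetical): each document is a set of stemmed tokens.
docs = [
    {"sort", "type", "data"},
    {"sort", "type"},
    {"universe", "star"},
    {"format", "file"},
    {"universe", "galaxy"},
    {"sort", "list"},
]
print(best_synonym("sort", ["universe", "format", "type"], docs))
```

Here the candidates that never co-occur with the query term are rejected, which mirrors the word-sense filtering the text attributes to the similarity step.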
Building the Stem-Semantic relationship process
based on the terms in the TREC-2001 and the
synonyms in the Arabic WordNet with a semantic
similarity measure has improved AIR performance.
The use of the Arabic WordNet and the semantic
expansion is in line with the work of other
researchers. However, this improvement in expansion
is limited to the available synonyms of the terms in
the TREC-2001 and is further restricted by the small
size of the Arabic WordNet relations. Experiments
with the expansion technique showed an overall
improvement in retrieval effectiveness: MAP improved
by 49% compared to the baseline, without degradation
in recall. In fact, recall also improved, by 7.3%.
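For readers unfamiliar with the metric, the following minimal sketch shows how MAP and a relative gain over a baseline can be computed. The rankings and scores are invented, and the sketch assumes that the reported percentages are relative gains, which the text does not state explicitly.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list against a set of relevant docs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranking, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_gain(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


# Toy rankings (hypothetical) for two queries.
baseline = [(["d3", "d1", "d2"], {"d1", "d2"}), (["d9", "d4"], {"d4"})]
expanded = [(["d1", "d3", "d2"], {"d1", "d2"}), (["d4", "d9"], {"d4"})]

map_base = mean_average_precision(baseline)
map_exp = mean_average_precision(expanded)
print(f"MAP baseline={map_base:.3f} expanded={map_exp:.3f} "
      f"gain={relative_gain(map_exp, map_base):.1f}%")
```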
5 CONCLUSIONS
This work presents two automatic expansion
techniques. The first is based on the corpus and the
PRF technique and uses the Arabic WordNet to select
the expansion terms; the relationships between the
candidate expansion terms and their corresponding
synonyms are identified using corpus-based semantic
similarity measurements that rely on co-occurrence
distributions. The second is automatic query
expansion, for which we jointly use the Arabic
WordNet and the PRF technique with a semantic
similarity measurement to confirm the newly expanded
query terms. To overcome the limitations of semantic
synonym selection from WordNet, a corpus-based
semantic similarity measurement is also used. The
expansion approach is the heart of this retrieval
system: it uses the synonyms in the available Arabic
WordNet synsets as a semantic resource to select
expansion terms, so that suitable and relevant
documents are added for the user query. CEPRFSS is
an effective approach that applies the PMI-IR
semantic similarity measure with the automatic
corpus-based expansion technique to select the most
appropriate expansion terms and disambiguate word
senses. Overall, this approach has improved AIR
performance.
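The PRF step that both techniques build on can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the function name `prf_candidates`, the parameters `top_k` and `n_terms`, and the toy documents are all hypothetical. The later filtering of candidates through Arabic WordNet synonyms and semantic similarity is not shown.

```python
from collections import Counter


def prf_candidates(query_terms, ranked_docs, top_k=2, n_terms=3,
                   stop_words=frozenset()):
    """Collect frequent terms from the top-k documents of an initial ranking.

    The top-k documents are assumed relevant (pseudo-relevance feedback);
    query terms and stop-words are excluded from the candidates.
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(
            t for t in doc if t not in query_terms and t not in stop_words
        )
    return [term for term, _ in counts.most_common(n_terms)]


# Toy initial ranking (hypothetical): each document is a list of tokens.
ranked_docs = [
    ["sort", "type", "data", "the", "type"],
    ["sort", "data", "type", "list"],
    ["universe", "star"],
]
query = {"sort"}
candidates = prf_candidates(query, ranked_docs, top_k=2, n_terms=2,
                            stop_words={"the"})
expanded_query = sorted(query) + candidates
print(expanded_query)
```

Only the top-ranked documents contribute candidates, so a term from a lower-ranked document (here, anything in the third document) never enters the expanded query.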