In these cases the first analysis is taken. This approach
can be erroneous, but such errors can be fixed manually.
Alternatively, we can choose not to provide morphological
information for such words.
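The two options above (keep the first analysis, or omit morphology) can be expressed as a small resolution policy. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Entry` record, its field names, the `omit_ambiguous` switch, and the `needs_review` flag for later manual correction are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical record for one word in the merged resource.
    word: str
    analyses: List[str]              # candidate morphological analyses, in analyser order
    chosen: Optional[str] = None     # analysis kept in the merged resource (None = omitted)
    needs_review: bool = False       # set when the choice was ambiguous, for manual fixing


def resolve_analysis(entry: Entry, omit_ambiguous: bool = False) -> Entry:
    """Keep the first analysis, or drop morphology, when a word has several analyses."""
    if len(entry.analyses) == 1:
        # Unambiguous: keep the only analysis.
        entry.chosen = entry.analyses[0]
    elif len(entry.analyses) > 1:
        # Ambiguous: flag the entry so a human can correct it later.
        entry.needs_review = True
        entry.chosen = None if omit_ambiguous else entry.analyses[0]
    # Zero analyses: nothing to choose, nothing to review.
    return entry
```

For example, `resolve_analysis(Entry("dhahab", ["noun:gold", "verb:went"]))` keeps `"noun:gold"` and marks the entry for review; passing `omit_ambiguous=True` leaves `chosen` empty instead.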
The linking algorithm is limited to linking
resources based on the sense information ($I_{se}$).
If a resource entry lacks this information component
and its synset contains no other words, it will not be
linked by our linking algorithm.
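The limitation above amounts to a simple eligibility test: an entry can be linked if it carries the sense component $I_{se}$, or if its synset offers other words to match on. The sketch below illustrates only that test, under the assumption that these are the two sources of evidence. The `ResourceEntry` record, its fields, and the helper names are hypothetical and do not reproduce the paper's linking algorithm itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class ResourceEntry:
    # Hypothetical view of one entry from a source resource.
    word: str
    sense_info: Optional[str]   # the sense component I_se (e.g. a gloss); None or "" if missing
    synset: FrozenSet[str]      # words in the entry's synset, possibly including the word itself


def linking_evidence(entry: ResourceEntry) -> Optional[str]:
    """Return the kind of evidence a linker could use, or None if the entry is unlinkable."""
    if entry.sense_info:
        return "sense_info"
    if entry.synset - {entry.word}:
        # No sense information, but other synset members can still be matched.
        return "synonyms"
    return None


def unlinkable(entries: List[ResourceEntry]) -> List[str]:
    """Words that would be skipped: no sense information and no other synset members."""
    return [e.word for e in entries if linking_evidence(e) is None]
```

Listing the unlinkable words this way gives a concrete worklist for the manual step, rather than letting those entries silently drop out of the merged resource.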
We can enrich Arabase by linking it with
WordNet, so that each Arabic sense is linked to
its corresponding English sense in WordNet. Currently,
the only such interface is through the entries
integrated from Arabic WordNet.
8 CONCLUSIONS
We compared different Arabic resources, examining
their strengths and weaknesses. We then
presented a framework that can be used to compile
pieces of Arabic language information scattered
across these resources into a single resource. We
showed the trade-off between fully automated and
manual methods. Full automation significantly
decreases human effort, saving time and
manpower, but at the expense of the
accuracy and consistency of the resulting resource.
We showed that a compromise between the two methods
can achieve acceptable accuracy and consistency
with minimal human effort.
REFERENCES
alkhalil dot net, 2011. KACST. Available at:
http://sourceforge.net/projects/alkhalildotnet/.
[Accessed 23 June 2013].
almuajam, 2011. Arabic Interactive Dictionary Project.
Available at: http://sourceforge.net/projects/almuajam/.
[Accessed 23 June 2013].
Arabic Stop Words, 2010. Available at:
http://arabicstopwords.sourceforge.net/. [Accessed 23
June 2013].
Arabic WordNet, 2007. A multi-lingual concept
dictionary. Available at:
http://awnbrowser.sourceforge.net/. [Accessed 23 June 2013].
Arramooz AlWaseet: Arabic dictionary for morphology.
Available at: http://arramooz.sourceforge.net/.
[Accessed 23 June 2013].
Attia, M., Rashwan, M. & Al-Badrashiny, M., 2009.
‘Fassieh; a Semi-Automatic Visual Interactive Tool
for the Morphological, PoS-Tags, Phonetic, and
Semantic Annotation of the Arabic Text’, IEEE
Transactions on Audio, Speech, and Language
Processing (TASLP): Special Issue on Processing
Morphologically Rich Languages.
Attia, M., Rashwan, M., Ragheb, A., Al-Badrashiny, M.,
Al-Basoumy, H. & Abdou, A., 2008. ‘A Compact
Arabic Lexical Semantics Language Resource Based
on the Theory of Semantic Fields’, Advances in
Natural Language Processing, LNCS vol. 5221, pp. 65-76.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer,
T.K. & Harshman, R., 1990. ‘Indexing by Latent
Semantic Analysis’, Journal of the American Society
for Information Science, vol. 41, no. 6, pp. 391-407.
Diab, M., 2004. ‘The Feasibility of Bootstrapping an
Arabic WordNet leveraging Parallel Corpora and an
English WordNet’, Proceedings of the Arabic
Language Technologies and Resources, NEMLAR,
Cairo.
Diekema, A.R., 2004. ‘Preliminary Lexical Framework for
English-Arabic Semantic Resource Construction’,
Semitic '04: Proceedings of the Workshop on
Computational Approaches to Arabic Script-based
Languages, Stroudsburg, PA, pp. 10-14.
Elkateb, S., Black, W., Rodríguez, H., Alkhalifa, M.,
Vossen, P., Pease, A., & Fellbaum, C., 2006.
‘Building a WordNet for Arabic’, Proceedings of the
Fifth International Conference on Language
Resources and Evaluation (LREC 2006), Genoa, pp.
29-34.
ksucorpus, 2013. King Saud University Corpus of
Classical Arabic. Available at:
http://ksucorpus.ksu.edu.sa/ar/. [Accessed 23 June 2013].
Lesk, M., 1986. ‘Automatic sense disambiguation using
machine readable dictionaries: How to tell a pine cone
from an ice cream cone’, Proceedings of the 5th
annual international conference on Systems
documentation SIGDOC '86, Toronto, pp. 24-26.
Niles, I. & Pease, A., 2001. ‘Towards a standard upper
ontology’. Proceedings of the International
Conference on Formal Ontology in Information
Systems FOIS '01, Ogunquit, Maine, pp. 2-9.
Princeton University, 2010. ‘About WordNet’.
WordNet. Princeton University. Available at:
http://wordnet.princeton.edu. [Accessed 23 June
2013].
Rehurek, R. & Sojka, P., 2010. ‘Software Framework for
Topic Modelling with Large Corpora’, Proceedings of
the LREC 2010 Workshop on New Challenges for NLP
Frameworks, Valletta, pp. 46-50.
Tufis, D. (ed.), 2004. Special Issue on the BalkaNet
project. Romanian Journal of Information Science and
Technology, vol. 7, no. 1-2.
Vossen, P., 1998. ‘Introduction to EuroWordNet’,
Computers and the Humanities, vol. 32, no. 2-3, pp.
73-89.
Yaseen, M., Attia, M., Maegaard, B.... Rashwan, M., et
al., 2006. ‘Building Annotated Written and Spoken
Arabic LR’s in NEMLAR Project’, Proceedings of the
Fifth International Conference on Language
Resources and Evaluation (LREC 2006), Genoa, pp.
533-538.
KDIR 2013 - International Conference on Knowledge Discovery and Information Retrieval