Arabase - A Database Combining Different Arabic Resources with Lexical and Semantic Information

Hazem Raafat, Mohamed Zahran, Mohsen Rashwan

Abstract

Language resources are an important factor in any NLP application. However, language resource support for Arabic is poor because the existing Arabic language resources are scattered, inconsistent, or incomplete. In this paper we discuss the notion of an integrated Arabic resource that leverages various pre-existing ones. We present a comparison of these resources, followed by preliminary fully automated and semi-automated methods for integrating them. This work serves as a bootstrap for a rich Arabic-Arabic resource with good potential to interface with WordNet.


Paper Citation


in Harvard Style

Raafat H., Zahran M. and Rashwan M. (2013). Arabase - A Database Combining Different Arabic Resources with Lexical and Semantic Information. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: SSTM, (IC3K 2013) ISBN 978-989-8565-75-4, pages 233-240. DOI: 10.5220/0004656702330240


in Bibtex Style

@conference{sstm13,
author={Hazem Raafat and Mohamed Zahran and Mohsen Rashwan},
title={Arabase - A Database Combining Different Arabic Resources with Lexical and Semantic Information},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: SSTM, (IC3K 2013)},
year={2013},
pages={233-240},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004656702330240},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: SSTM, (IC3K 2013)
TI - Arabase - A Database Combining Different Arabic Resources with Lexical and Semantic Information
SN - 978-989-8565-75-4
AU - Raafat H.
AU - Zahran M.
AU - Rashwan M.
PY - 2013
SP - 233
EP - 240
DO - 10.5220/0004656702330240