A Formal Modeling Method to Enrich the Arabic Treebank ATB with Syntactic Properties

Raja Bensalem Bahloul, Kais Haddar, Philippe Blache

Abstract

Enriching an Arabic treebank with syntactic properties can facilitate many types of parsing. It also broadens the treebank's use in NLP applications, supports the acquisition of new linguistic resources, and eases probabilistic parsing, since statistics can restrict the properties to those that are satisfied or most frequent. In this context, we propose an enrichment method built on three phases: a formalization phase, a phase that induces a Property Grammar from a source treebank, and a phase that regenerates the treebank with a new syntactic property-based representation. Starting with a formalization phase helps the resolution procedure succeed: it restricts the specification of the data sets and of their interactions to those actually used, which avoids duplication, and it makes it possible to anticipate the constraints that the problem must respect. We implemented the method and tested it mainly on the Arabic treebank ATB. The experiment gave encouraging results and produced various properties of different types.
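As a rough illustration of what inducing properties from a treebank can look like, the following minimal sketch derives three property types commonly associated with Property Grammars (constituency, linearity, uniqueness) from a handful of toy constituency trees. It is not the authors' implementation: the tree encoding, the category labels, and the names `TREES`, `local_rules`, and `induce` are all assumptions made for this example, and the trees are not taken from the ATB.

```python
from collections import defaultdict
from itertools import combinations

# Toy constituency trees encoded as (label, [children]); leaves have no children.
# Hypothetical labels for illustration only, not ATB annotations.
TREES = [
    ("NP", [("DET", []), ("NOUN", []), ("ADJ", [])]),
    ("NP", [("NOUN", []), ("ADJ", [])]),
    ("NP", [("DET", []), ("NOUN", [])]),
]

def local_rules(tree):
    """Yield (parent label, list of child labels) for every internal node."""
    label, children = tree
    if children:
        yield label, [child[0] for child in children]
        for child in children:
            yield from local_rules(child)

def induce(trees):
    """Collect constituency, linearity and uniqueness properties per category."""
    expansions_by_parent = defaultdict(list)
    for tree in trees:
        for parent, kids in local_rules(tree):
            expansions_by_parent[parent].append(kids)

    grammar = {}
    for parent, expansions in expansions_by_parent.items():
        # Constituency: every category observed as a daughter of `parent`.
        constituents = {k for kids in expansions for k in kids}
        # Record every observed left-to-right ordering of two daughters.
        before = set()
        for kids in expansions:
            for i, j in combinations(range(len(kids)), 2):
                before.add((kids[i], kids[j]))
        # Linearity: a precedes b and the reverse order is never observed.
        linearity = {(a, b) for (a, b) in before if (b, a) not in before}
        # Uniqueness: the category never occurs twice under the same parent.
        uniqueness = {
            c for c in constituents
            if all(kids.count(c) <= 1 for kids in expansions)
        }
        grammar[parent] = {
            "constituency": constituents,
            "linearity": linearity,
            "uniqueness": uniqueness,
        }
    return grammar

if __name__ == "__main__":
    for category, properties in induce(TREES).items():
        print(category, properties)
```

Counting how often each property is observed, rather than only recording that it holds, would be one way to obtain the statistics the abstract mentions for limiting properties to the most frequent ones.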



Paper Citation


in Harvard Style

Bensalem Bahloul R., Haddar K. and Blache P. (2015). A Formal Modeling Method to Enrich the Arabic Treebank ATB with Syntactic Properties. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KEOD, (IC3K 2015) ISBN 978-989-758-158-8, pages 108-117. DOI: 10.5220/0005617001080117


in Bibtex Style

@conference{keod15,
author={Raja Bensalem Bahloul and Kais Haddar and Philippe Blache},
title={A Formal Modeling Method to Enrich the Arabic Treebank ATB with Syntactic Properties},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KEOD, (IC3K 2015)},
year={2015},
pages={108-117},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005617001080117},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KEOD, (IC3K 2015)
TI - A Formal Modeling Method to Enrich the Arabic Treebank ATB with Syntactic Properties
SN - 978-989-758-158-8
AU - Bensalem Bahloul R.
AU - Haddar K.
AU - Blache P.
PY - 2015
SP - 108
EP - 117
DO - 10.5220/0005617001080117