Automatic Alignment of Persian and English Lexical Resources: A Structural-Linguistic Approach

Rahim Dehkharghani, Mehrnoush Shamsfard

Abstract

Cross-lingual mapping of linguistic resources such as corpora, ontologies, lexicons and thesauri is essential for developing cross-lingual (CL) applications such as machine translation, CL information retrieval and question answering. Techniques for mapping lexical ontologies of different languages are important not only for inter-lingual tasks; they can also be applied to build a lexical ontology for a new language based on existing ones. In this paper we propose a two-phase approach for mapping a Persian lexical resource to Princeton's WordNet. In the first phase, Persian words are mapped to WordNet synsets using heuristically improved linguistic approaches. In the second phase, these mappings are evaluated (accepted or rejected) according to the structural similarities between WordNet and a Persian thesaurus. Although we applied it to Persian, our proposed approach, the SBU methodology, is language independent. As there is no lexical ontology for Persian, our approach also helps in building one for this language.
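The two-phase idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the SBU methodology itself: the toy bilingual dictionary, the miniature WordNet-like hierarchy, the thesaurus entries, the function names, and the use of a plain path-length threshold as the structural test. The abstract does not specify the heuristics or the similarity measure actually used.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data, not taken from the paper.
BILINGUAL: Dict[str, List[str]] = {
    "shir": ["lion", "milk", "tap"],  # an ambiguous Persian word
    "palang": ["leopard"],
    "gorbe": ["cat"],
}
SYNSETS: Dict[str, Set[str]] = {
    "lion.n.01": {"lion"},
    "milk.n.01": {"milk"},
    "tap.n.01": {"tap", "faucet"},
    "leopard.n.01": {"leopard"},
    "cat.n.01": {"cat"},
    "big_cat.n.01": {"big cat"},
    "feline.n.01": {"feline"},
}
HYPERNYM: Dict[str, str] = {  # child synset -> parent synset
    "lion.n.01": "big_cat.n.01",
    "leopard.n.01": "big_cat.n.01",
    "big_cat.n.01": "feline.n.01",
    "cat.n.01": "feline.n.01",
}
THESAURUS: Dict[str, List[str]] = {  # Persian word -> related Persian words
    "shir": ["palang", "gorbe"],
    "palang": ["shir"],
    "gorbe": ["shir"],
}


def candidate_synsets(word: str) -> Set[str]:
    """Phase 1 (assumed form): synsets containing any translation of the word."""
    translations = set(BILINGUAL.get(word, []))
    return {sid for sid, lemmas in SYNSETS.items() if lemmas & translations}


def ancestors(sid: str) -> List[str]:
    """The synset followed by its chain of hypernyms up to the root."""
    chain = [sid]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_length(a: str, b: str) -> Optional[int]:
    """Number of hypernym edges between two synsets, or None if unconnected."""
    up_b = ancestors(b)
    for steps_a, node in enumerate(ancestors(a)):
        if node in up_b:
            return steps_a + up_b.index(node)
    return None


def align(word: str, max_distance: int = 2) -> Set[str]:
    """Phase 2 (assumed form): keep a candidate only if some thesaurus
    neighbour has a candidate synset within max_distance edges of it."""
    neighbour_candidates: Set[str] = set()
    for neighbour in THESAURUS.get(word, []):
        neighbour_candidates |= candidate_synsets(neighbour)
    accepted: Set[str] = set()
    for cand in candidate_synsets(word):
        for other in neighbour_candidates:
            dist = path_length(cand, other)
            if dist is not None and dist <= max_distance:
                accepted.add(cand)
                break
    return accepted
```

In this toy setting the first phase proposes three synsets for "shir", and the second phase rejects the "milk" and "tap" candidates because no thesaurus neighbour maps to a structurally nearby synset. The threshold `max_distance` is a free parameter of the sketch; a real system would need to tune or replace it.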



Paper Citation


in Harvard Style

Dehkharghani R. and Shamsfard M. (2009). Automatic Alignment of Persian and English Lexical Resources: A Structural-Linguistic Approach. In Proceedings of the 6th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2009) ISBN 978-989-8111-92-0, pages 55-66. DOI: 10.5220/0002175000550066


in Bibtex Style

@conference{nlpcs09,
author={Rahim Dehkharghani and Mehrnoush Shamsfard},
title={Automatic Alignment of Persian and English Lexical Resources: A Structural-Linguistic Approach},
booktitle={Proceedings of the 6th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2009)},
year={2009},
pages={55-66},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002175000550066},
isbn={978-989-8111-92-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2009)
TI - Automatic Alignment of Persian and English Lexical Resources: A Structural-Linguistic Approach
SN - 978-989-8111-92-0
AU - Dehkharghani R.
AU - Shamsfard M.
PY - 2009
SP - 55
EP - 66
DO - 10.5220/0002175000550066