Authors:
Benyamin Ahmadnia (1); Raul Aranovich (1) and Bonnie J. Dorr (2)
Affiliations:
(1) Department of Linguistics, University of California, Davis, CA, U.S.A.
(2) Institute for Human and Machine Cognition (IHMC), Ocala, FL, U.S.A.
Keyword(s):
Computational Linguistics, Natural Language Processing, Neural Machine Translation, Low-Resource Languages, Joint Learning.
Abstract:
This paper presents a systematic study of an approach to low-resource Farsi-Spanish Neural Machine Translation (NMT) that leverages monolingual data to jointly train forward and backward translation models. As is standard for NMT systems, training begins with two pre-trained translation models, which are iteratively updated to reduce translation cost. In each iteration, each translation model translates monolingual text into the other language, generating a synthetic dataset for the other translation model. Two new translation models are then learned from the bilingual data together with the synthetic texts. The key feature distinguishing our approach from standard NMT is this iterative learning process, which improves both translation models simultaneously and produces a higher-quality synthetic training dataset with each iteration. Our empirical results show that this approach outperforms the baselines.
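The iterative loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyModel` is a hypothetical word-for-word lookup table standing in for a real NMT model, `joint_training` and its parameters are names invented for illustration, and the tokens `f1`, `s1`, etc. are placeholders rather than real Farsi or Spanish words. The sketch shows only the data flow (each model produces synthetic pairs for the other, then both are retrained on bilingual plus synthetic data) and omits the cost-based updates mentioned in the abstract.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


class ToyModel:
    """Hypothetical stand-in for an NMT model: a word-for-word lookup table."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def train(self, pairs: List[Pair]) -> None:
        # "Training" records word alignments from equal-length sentence pairs.
        # Earlier pairs take priority, so bilingual data outranks synthetic data.
        for src, tgt in pairs:
            s_words, t_words = src.split(), tgt.split()
            if len(s_words) == len(t_words):
                for s, t in zip(s_words, t_words):
                    self.table.setdefault(s, t)

    def translate(self, sentence: str) -> str:
        # Unknown words are copied through unchanged.
        return " ".join(self.table.get(w, w) for w in sentence.split())


def joint_training(
    bitext: List[Pair],
    mono_src: List[str],
    mono_tgt: List[str],
    iterations: int = 2,
) -> Tuple[ToyModel, ToyModel]:
    """Assumed illustration of joint forward/backward training."""
    reversed_bitext = [(t, s) for s, t in bitext]

    # Step 0: initial models trained on the bilingual data only.
    fwd, bwd = ToyModel(), ToyModel()
    fwd.train(bitext)
    bwd.train(reversed_bitext)

    for _ in range(iterations):
        # The backward model translates target-side monolingual text,
        # giving (synthetic source, real target) pairs for the forward model.
        synth_for_fwd = [(bwd.translate(t), t) for t in mono_tgt]
        # The forward model translates source-side monolingual text,
        # giving (synthetic target, real source) pairs for the backward model.
        synth_for_bwd = [(fwd.translate(s), s) for s in mono_src]

        # Two new models are learned from bilingual plus synthetic data.
        new_fwd, new_bwd = ToyModel(), ToyModel()
        new_fwd.train(bitext + synth_for_fwd)
        new_bwd.train(reversed_bitext + synth_for_bwd)
        fwd, bwd = new_fwd, new_bwd

    return fwd, bwd


if __name__ == "__main__":
    fwd, bwd = joint_training(
        bitext=[("f1 f2", "s1 s2")],
        mono_src=["f2 f3"],
        mono_tgt=["s1 s4"],
    )
    print(fwd.translate("f1 f2"))
    print(bwd.translate("s1 s2"))
```

In a real system, the lookup table would be replaced by neural models, and the quality of the synthetic pairs would be expected to change from one iteration to the next as both models improve; the toy model here cannot show that effect.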