REFERENCES
Artetxe, M., Labaka, G., Agirre, E., and Cho, K. (2017).
Unsupervised neural machine translation. CoRR.
Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R.,
Pineau, J., Courville, A., and Bengio, Y. (2016). An
actor-critic algorithm for sequence prediction. arXiv
preprint arXiv:1607.07086.
Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural ma-
chine translation by jointly learning to align and trans-
late. ICLR.
Bengio, S., Vinyals, O., Jaitly, N., and Shazeer, N. (2015).
Scheduled sampling for sequence prediction with re-
current neural networks. In Advances in Neural Infor-
mation Processing Systems, pages 1171–1179.
Cettolo, M., Niehues, J., Stüker, S., Bentivogli, L., and Fed-
erico, M. (2014). Report on the 11th IWSLT evaluation
campaign, IWSLT 2014. In Proceedings of the Inter-
national Workshop on Spoken Language Translation,
Hanoi, Vietnam, volume 57.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D.,
Bougares, F., Schwenk, H., and Bengio, Y. (2014).
Learning phrase representations using RNN encoder-
decoder for statistical machine translation. In Empirical
Methods in Natural Language Processing (EMNLP).
Gehring, J., Auli, M., Grangier, D., Yarats, D., and
Dauphin, Y. N. (2017). Convolutional sequence to se-
quence learning. In Proceedings of the 34th Interna-
tional Conference on Machine Learning, volume 70,
pages 1243–1252. JMLR.org.
Goodfellow, I. (2016). Generative adversarial networks.
Conference and Workshop on Neural Information
Processing Systems (NIPS), pages 2672–2680.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. In
Advances in neural information processing systems,
pages 2672–2680.
He, T., Tan, X., Xia, Y., He, D., Qin, T., Chen, Z., and Liu,
T.-Y. (2018). Layer-wise coordination between en-
coder and decoder for neural machine translation. In
Advances in Neural Information Processing Systems,
pages 7944–7954.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8):1735–1780.
Huszár, F. (2015). How (not) to train your generative model:
Scheduled sampling, likelihood, adversary? arXiv
preprint arXiv:1511.05101.
Kalchbrenner, N. and Blunsom, P. (2013). Recurrent con-
tinuous translation models. In Proceedings of the 2013
Conference on Empirical Methods in Natural Lan-
guage Processing, pages 1700–1709.
Kalchbrenner, N., Espeholt, L., Simonyan, K., Oord, A.
v. d., Graves, A., and Kavukcuoglu, K. (2016). Neu-
ral machine translation in linear time. arXiv preprint
arXiv:1610.10099.
Kusner, M. J. and Hernández-Lobato, J. M. (2016).
GANs for sequences of discrete elements with
the Gumbel-softmax distribution. arXiv preprint
arXiv:1611.04051.
Lamb, A. and Xie, M. (2016). Convolutional encoders for
neural machine translation. Web download.
Lamb, A. M., Goyal, A. G. A. P., Zhang, Y., Zhang, S.,
Courville, A. C., and Bengio, Y. (2016). Professor
forcing: A new algorithm for training recurrent net-
works. In Advances In Neural Information Processing
Systems, pages 4601–4609.
Lample, G., Conneau, A., Denoyer, L., and Ranzato,
M. (2017). Unsupervised machine translation using
monolingual corpora only. CoRR.
Lample, G., Ott, M., Conneau, A., Denoyer, L., and Ran-
zato, M. (2018). Phrase-based & neural unsupervised
machine translation. In EMNLP, pages 5039–5049.
Li, B. and Han, L. (2013). Distance weighted cosine simi-
larity measure for text classification. In International
Conference on Intelligent Data Engineering and Au-
tomated Learning, pages 611–618. Springer.
Luong, M.-T., Pham, H., and Manning, C. D. (2015). Ef-
fective approaches to attention-based neural machine
translation. arXiv preprint arXiv:1508.04025.
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., and
Khudanpur, S. (2010). Recurrent neural network
based language model. In Eleventh annual confer-
ence of the international speech communication as-
sociation.
Muflikhah, L. and Baharudin, B. (2009). Document clus-
tering using concept space and cosine similarity mea-
surement. In 2009 International Conference on Com-
puter Technology and Development, volume 1, pages
58–62. IEEE.
Nguyen, H. V. and Bai, L. (2010). Cosine similarity metric
learning for face verification. In Asian conference on
computer vision, pages 709–720. Springer.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002).
BLEU: a method for automatic evaluation of machine
translation. In Proceedings of the 40th Annual Meeting
of the Association for Computational Linguistics, pages
311–318. Association for Computational Linguistics.
Ranzato, M., Chopra, S., Auli, M., and Zaremba, W. (2015).
Sequence level training with recurrent neural net-
works. arXiv preprint arXiv:1511.06732.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning representations by back-propagating errors.
Nature, 323(6088):533–536.
Sennrich, R., Haddow, B., and Birch, A. (2015). Neural
machine translation of rare words with subword units.
arXiv preprint arXiv:1508.07909.
Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., and
Liu, Y. (2015). Minimum risk training for neural ma-
chine translation. The Association for Computational
Linguistics (ACL).
Song, K., Tan, X., He, D., Lu, J., Qin, T., and Liu, T.-Y.
(2018). Double path networks for sequence to se-
quence learning. In Proceedings of the 27th Interna-
tional Conference on Computational Linguistics,
pages 3064–3074.
Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Se-
quence to sequence learning with neural networks. In
Advances in neural information processing systems,
pages 3104–3112.
Twin-GAN for Neural Machine Translation