from Wikipedia, which was necessary with our small dataset, may help to improve on the cross-validation results in future work. Furthermore, the models may be improved in the following ways. First, overall improvements may be achieved simply by increasing the sizes of the models and optimizing their hyperparameters. Second, a named-entity recognition system could be incorporated, because named entities appear to be a bottleneck for both models and cause important errors. In future research we would also like to extend this work to multilingual translation. Finally, our dataset also contains images of the museum objects concerned, so an interesting extension would be multi-modal translation in the domain of cultural heritage.
Transfer Learning for Digital Heritage Collections: Comparing Neural Machine Translation at the Subword-level and Character-level