particular HT is of low quality without another human having to read it, then it is possible to prioritise such low-quality HTs for retranslation or for verification by a human in charge of quality control. This is particularly useful when we have a large number of translations to verify and would like to check the ones that are most likely to be incorrect. To understand the scores assigned by BiMWMD to translations of different grades, in Figure 3 we randomly select translation pairs of different grades and show the scores predicted by BiMWMD, the best-performing of the methods proposed in Section 3. We see that BiMWMD assigns high scores to translations that are also rated as high quality by the human judges, whereas it assigns low scores to translations that the judges consider to be of low quality.
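To make this prioritisation step concrete, the sketch below illustrates how translation pairs could be ranked by a BiMWMD-style score so that the lowest-scoring pairs are sent for human verification first. It is a minimal illustration under stated assumptions, not the exact formulation of Section 3: `embed_src` and `embed_tgt` are hypothetical callables that map a sentence to a list of word vectors in a shared cross-lingual embedding space, and the score is approximated here by a simple relaxed, bidirectional nearest-neighbour distance.

```python
import numpy as np

def min_wmd(src_vecs, tgt_vecs):
    """Average distance from each source word vector to its nearest
    target word vector (a relaxed, one-directional distance; a sketch,
    not the paper's exact BiMWMD definition)."""
    return float(np.mean([
        min(np.linalg.norm(v - w) for w in tgt_vecs) for v in src_vecs
    ]))

def bimwmd_score(src_vecs, tgt_vecs):
    """Combine both directions and negate, so that a higher score
    indicates a closer match between source and translation."""
    return -(min_wmd(src_vecs, tgt_vecs) + min_wmd(tgt_vecs, src_vecs)) / 2.0

def prioritise_for_review(pairs, embed_src, embed_tgt, k=10):
    """Rank (source, translation) pairs by predicted quality and return
    the k lowest-scoring pairs for human verification.

    embed_src / embed_tgt are assumed (hypothetical) functions mapping a
    sentence to a list of word vectors in a shared cross-lingual space."""
    scored = [(bimwmd_score(embed_src(s), embed_tgt(t)), s, t)
              for s, t in pairs]
    scored.sort(key=lambda item: item[0])  # lowest predicted quality first
    return scored[:k]
```

In practice the ranking function would be called with embeddings such as cross-lingually aligned fastText vectors; only the relative ordering of scores matters for deciding which translations a reviewer should inspect first.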
5 CONCLUSION
We proposed different methods for automatically predicting the quality of human translations without having access to any gold-standard reference translations. In particular, we proposed a broad range of both symmetric and asymmetric measures. Our experimental results show that the Bidirectional Minimum Word Mover's Distance method in particular demonstrates a high degree of correlation with the grades assigned by a group of judges to a collection of human translations. In future work, we plan to evaluate this method for other language pairs and integrate it into a translation quality assurance system.