Sylvain Raybaud, Caroline Lavecchia, David Langlois, Kamel Smaïli


A confidence measure estimates the reliability of a hypothesis produced by a machine translation system. Confidence estimation can be framed as a testing problem: we want to decide whether the most probable word sequence produced by the machine translation system is correct. We describe several original word-level confidence measures for machine translation, based on mutual information, an n-gram language model, and a lexical-features language model. We evaluate how well they perform individually and in combination, and show that combining confidence measures based on mutual information yields a classification error rate as low as 25.1% with an F-measure of 0.708.
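The two figures reported above, classification error rate and F-measure, can be illustrated with a minimal sketch. It assumes the usual setup for word-level confidence estimation: each translated word receives a score, a word is accepted as correct when its score reaches a threshold, and the decision is compared with a reference label. The function name `evaluate_threshold`, the choice of "correct" as the positive class, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence, Tuple


def evaluate_threshold(
    scores: Sequence[float],
    is_correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (classification error rate, F-measure) for one threshold.

    A word is accepted as correct when its confidence score is >= threshold.
    Assumption: "correct" is the positive class for precision and recall.
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must have the same length")
    if not scores:
        raise ValueError("at least one word is required")

    tp = fp = fn = 0
    for score, gold in zip(scores, is_correct):
        predicted = score >= threshold
        if predicted and gold:
            tp += 1  # accepted, and the word really is correct
        elif predicted and not gold:
            fp += 1  # accepted, but the word is wrong
        elif gold:
            fn += 1  # rejected, but the word was correct

    # Error rate: share of words whose accept/reject decision is wrong.
    cer = (fp + fn) / len(scores)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return cer, f_measure


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7]
toy_labels = [True, False, False, True, True]
cer, f1 = evaluate_threshold(toy_scores, toy_labels, threshold=0.5)
```

Sweeping `threshold` over a held-out set and keeping the value with the lowest error rate is one common way to tune such a classifier; whether the paper proceeds this way is not stated in the abstract.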



Paper Citation

in Harvard Style

Raybaud S., Lavecchia C., Langlois D. and Smaïli K. (2009). NEW CONFIDENCE MEASURES FOR STATISTICAL MACHINE TRANSLATION. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 61-68. DOI: 10.5220/0001660600610068

in Bibtex Style

author={Sylvain Raybaud and Caroline Lavecchia and David Langlois and Kamel Smaïli},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},

in EndNote Style

JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
SN - 978-989-8111-66-1
AU - Raybaud S.
AU - Lavecchia C.
AU - Langlois D.
AU - Smaïli K.
PY - 2009
SP - 61
EP - 68
DO - 10.5220/0001660600610068