account word deletion nor word order, and neither does
our reference corpus (in which words are labelled as
correct or incorrect, but missing words are not indicated).
This serious drawback has to be addressed.
Assigning confidence scores to alignments might help
to this end. Second, we believe that in the context of
phrase-based translation, phrase-level confidence estimation
would be more appropriate. Also, many features
used in speech recognition or automatic translation
could be used in confidence estimation: distant
models, word alignment, word spotting, etc. Another
problem is the fusion of different classifiers. We
use a very simple single-layer perceptron, but many
solutions have been proposed in the literature to achieve
more appropriate merging. Finally, progress could
be made on the evaluation of classifiers: because classifying
a word as correct or incorrect is a very difficult
task even for a human translator, and because the results
of such a task may vary from one translator to another or,
worse, over time for a given translator,
we should combine different human-generated references.
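To make the fusion step mentioned above concrete, the following is a minimal sketch of a single-layer perceptron that merges several word-level confidence scores into one. It is not the implementation used in this work: the sigmoid output, the log-loss training by stochastic gradient descent, the function names (train_fusion, fused_confidence) and the toy scores are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(scores, labels, epochs=200, lr=0.5, seed=0):
    """Fit one weight per confidence measure plus a bias.

    scores: list of feature vectors, one per word (hypothetical measures).
    labels: 1 if the word is labelled correct, 0 otherwise.
    Training is plain SGD on the log loss (an assumption, not the paper's setup).
    """
    rng = random.Random(seed)
    w = [0.0] * len(scores[0])
    b = 0.0
    order = list(range(len(scores)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = scores[i], labels[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wj - lr * g * xj for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b


def fused_confidence(w, b, x):
    # Merged confidence in [0, 1] for one word's vector of scores.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: two made-up confidence measures per word, label 1 = correct.
toy_scores = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9],
              [0.2, 0.3], [0.1, 0.4], [0.3, 0.1]]
toy_labels = [1, 1, 1, 0, 0, 0]

w, b = train_fusion(toy_scores, toy_labels)
print(fused_confidence(w, b, [0.85, 0.7]) > 0.5)
```

A word would then be tagged as correct when its fused confidence exceeds a threshold tuned on held-out data; more elaborate merging schemes would replace train_fusion while keeping the same interface.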
ICAART 2009 - International Conference on Agents and Artificial Intelligence