Word Alignment Quality in the IBM 2 Mixture Model

Jorge Civera, Alfons Juan

Abstract

Finite mixture modelling is a standard pattern recognition technique. In statistical machine translation (SMT), however, the use of mixture modelling is still being explored. The mixture approach has two main advantages: first, its flexibility to find an appropriate tradeoff between model complexity and the amount of training data available; second, its capability to learn specific probability distributions that better fit subsets of the training dataset. The latter advantage is even more important in SMT, since it is widely accepted that most state-of-the-art translation models are limited in application to restricted semantic domains. In this work, we revisit the mixture extension of the well-known IBM 2 (M2) translation model. The M2 mixture model is evaluated on a large-scale word alignment task, obtaining encouraging results that prove the applicability of finite mixture modelling in SMT.


Paper Citation


in Harvard Style

Civera J. and Juan A. (2008). Word Alignment Quality in the IBM 2 Mixture Model. In Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008) ISBN 978-989-8111-42-5, pages 93-102. DOI: 10.5220/0001739700930102


in Bibtex Style

@conference{pris08,
author={Jorge Civera and Alfons Juan},
title={Word Alignment Quality in the IBM 2 Mixture Model},
booktitle={Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008)},
year={2008},
pages={93-102},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001739700930102},
isbn={978-989-8111-42-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008)
TI - Word Alignment Quality in the IBM 2 Mixture Model
SN - 978-989-8111-42-5
AU - Civera J.
AU - Juan A.
PY - 2008
SP - 93
EP - 102
DO - 10.5220/0001739700930102