HMM INVERSION WITH FULL AND DIAGONAL COVARIANCE MATRICES FOR AUDIO-TO-VISUAL CONVERSION

Lucas D. Terissi; Juan C. Gómez

doi:10.5220/0001941001680173

2008

Abstract

A speech-driven, MPEG-4 compliant facial animation system is proposed in this paper. The main feature of the system is an audio-to-visual conversion based on the inversion of an Audio-Visual Hidden Markov Model. The Hidden Markov Model Inversion algorithm is derived for the general case of full covariance matrices for the audio-visual observations, and its performance is compared with the more common case of diagonal covariance matrices. Experimental results show that full covariance matrices are preferable: they lead to an accurate estimation of the visual parameters, matching the performance obtained with diagonal covariance matrices but with a less complex model.
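
As a rough illustration of why a full covariance matrix can do with a simpler model, the sketch below computes the conditional mean of the visual features given the audio features for a single joint Gaussian. This is not the paper's HMM Inversion algorithm, only the standard Gaussian conditioning formula; the function names (`solve`, `conditional_visual_mean`) and all numeric values are hypothetical. With a full covariance the audio-visual cross terms let one Gaussian shift the visual estimate with the audio input, whereas a diagonal covariance has zero cross terms and can only return the visual mean, so any audio-visual dependence would have to come from additional states or mixture components.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def conditional_visual_mean(mu_a, mu_v, cov_aa, cov_va, audio):
    """E[v | a] = mu_v + cov_va * cov_aa^{-1} * (a - mu_a) for a joint Gaussian."""
    w = solve(cov_aa, [a - m for a, m in zip(audio, mu_a)])
    return [m + sum(c * wi for c, wi in zip(row, w))
            for m, row in zip(mu_v, cov_va)]


# Hypothetical toy model: 2 audio features, 1 visual feature.
mu_a = [0.0, 0.0]
mu_v = [1.0]
cov_aa = [[2.0, 0.0], [0.0, 1.0]]   # audio-audio block
cov_va_full = [[1.0, 0.5]]          # audio-visual cross terms (full covariance)
cov_va_diag = [[0.0, 0.0]]          # cross terms forced to zero (diagonal covariance)

audio = [2.0, 2.0]
full_estimate = conditional_visual_mean(mu_a, mu_v, cov_aa, cov_va_full, audio)
diag_estimate = conditional_visual_mean(mu_a, mu_v, cov_aa, cov_va_diag, audio)
print(full_estimate, diag_estimate)
```

The visual-visual variance does not enter the conditional mean, so it is omitted here; it would only affect the uncertainty of the estimate.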

Paper Citation

in Harvard Style

Terissi, L. D. and Gómez, J. C. (2008). HMM INVERSION WITH FULL AND DIAGONAL COVARIANCE MATRICES FOR AUDIO-TO-VISUAL CONVERSION. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008), ISBN 978-989-8111-60-9, pages 168-173. DOI: 10.5220/0001941001680173

in Bibtex Style

@conference{sigmap08,
author={Lucas D. Terissi and Juan C. Gómez},
title={HMM INVERSION WITH FULL AND DIAGONAL COVARIANCE MATRICES FOR AUDIO-TO-VISUAL CONVERSION},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)},
year={2008},
pages={168-173},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001941001680173},
isbn={978-989-8111-60-9},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)
TI - HMM INVERSION WITH FULL AND DIAGONAL COVARIANCE MATRICES FOR AUDIO-TO-VISUAL CONVERSION
SN - 978-989-8111-60-9
AU - Terissi, Lucas D.
AU - Gómez, Juan C.
PY - 2008
SP - 168
EP - 173
DO - 10.5220/0001941001680173