INTERVENANT CLASSIFICATION IN AN AUDIOVISUAL DOCUMENT

Jeremy Philippeau, Julien Pinquier, Philippe Joly

2006

Abstract

This document deals with the definition of a new descriptor for audiovisual document indexing: the intervenant. We focus on its audiovisual localization, that is to say, its place in an audiovisual sequence, and on its classification into three categories: IN, OUT, or OFF. Based on a comparison of different analysis tools for both the audio and video modes, we define a set of descriptors that can be filled in automatically and are potentially relevant for classifying the intervenant's localization. The classification decision is made on the basis of a model of the transitions between classes.


Paper Citation


in Harvard Style

Philippeau J., Pinquier J. and Joly P. (2006). INTERVENANT CLASSIFICATION IN AN AUDIOVISUAL DOCUMENT. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2006) ISBN 978-972-8865-64-1, pages 185-188. DOI: 10.5220/0001570801850188


in Bibtex Style

@conference{sigmap06,
author={Jeremy Philippeau and Julien Pinquier and Philippe Joly},
title={INTERVENANT CLASSIFICATION IN AN AUDIOVISUAL DOCUMENT},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2006)},
year={2006},
pages={185-188},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001570801850188},
isbn={978-972-8865-64-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2006)
TI - INTERVENANT CLASSIFICATION IN AN AUDIOVISUAL DOCUMENT
SN - 978-972-8865-64-1
AU - Philippeau J.
AU - Pinquier J.
AU - Joly P.
PY - 2006
SP - 185
EP - 188
DO - 10.5220/0001570801850188