Smart Video Orchestration for Immersive Communication

Alaeddine Mihoub, Emmanuel Marilly


In immersive communication, enriching the attentional immersion of remote attendees in videoconferences raises the problem of camera orchestration, which consists of selecting and displaying the most relevant view or camera. Hidden Markov models (HMMs) have been chosen to model the different video events and the video orchestration models. A specific algorithm has been developed that takes high-level observations as input and enables non-expert users to train the videoconferencing system.
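As an illustration of how an HMM can map high-level observations to a camera choice, the following minimal sketch decodes the most likely sequence of views with the standard Viterbi algorithm. It is not taken from the paper: the state names, observation names, and all probabilities are hypothetical values chosen for the example, and the abstract does not state which decoding procedure the authors use.

```python
import math

# Hypothetical hidden states: the camera view that is displayed.
STATES = ["speaker_closeup", "room_overview", "whiteboard"]

# Hypothetical model parameters (illustrative values, not from the paper).
START = {"speaker_closeup": 0.3, "room_overview": 0.6, "whiteboard": 0.1}
TRANS = {
    "speaker_closeup": {"speaker_closeup": 0.7, "room_overview": 0.2, "whiteboard": 0.1},
    "room_overview": {"speaker_closeup": 0.3, "room_overview": 0.6, "whiteboard": 0.1},
    "whiteboard": {"speaker_closeup": 0.2, "room_overview": 0.2, "whiteboard": 0.6},
}
# Emission probabilities over hypothetical high-level observations.
EMIT = {
    "speaker_closeup": {"speech": 0.6, "gesture": 0.25, "silence": 0.1, "writing": 0.05},
    "room_overview": {"speech": 0.2, "gesture": 0.2, "silence": 0.5, "writing": 0.1},
    "whiteboard": {"speech": 0.1, "gesture": 0.1, "silence": 0.1, "writing": 0.7},
}


def viterbi(observations, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most likely hidden state sequence (log-space Viterbi)."""
    if not observations:
        return []
    # Log-score of the best path ending in each state after the first observation.
    score = {s: math.log(start[s]) + math.log(emit[s][observations[0]]) for s in states}
    back = []  # one dict of back-pointers per later time step
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this step.
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            new_score[s] = score[prev] + math.log(trans[prev][s]) + math.log(emit[s][obs])
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi(["silence", "speech", "speech", "writing", "writing"]))
```

In this sketch the transition probabilities favor staying on the current view, which is one simple way to discourage rapid camera switching. In a trainable system these parameters would be estimated from example sessions rather than written by hand.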


Paper Citation

in Harvard Style

Mihoub A. and Marilly E. (2013). Smart Video Orchestration for Immersive Communication. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 752-756. DOI: 10.5220/0004230307520756

in Bibtex Style

author={Alaeddine Mihoub and Emmanuel Marilly},
title={Smart Video Orchestration for Immersive Communication},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},

in EndNote Style

JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Smart Video Orchestration for Immersive Communication
SN - 978-989-8565-47-1
AU - Mihoub A.
AU - Marilly E.
PY - 2013
SP - 752
EP - 756
DO - 10.5220/0004230307520756