Audio/Visual Recurrences and Decision Trees for Unsupervised TV Program Structuring

Alina Elma Abduraman, Sid-Ahmed Berrani, Bernard Merialdo

Abstract

This paper addresses the problem of unsupervised TV program structuring. Program structuring allows direct and non-linear access to the desired parts of a program. Our work addresses the structuring of programs such as news, entertainment, shows, and magazines. It is based on the detection of audio and visual recurrences, and it proposes an effective classification and selection system, based on decision trees, that detects "separators" among these recurrences. Separators are short audio/visual sequences that delimit the different parts of a program. The decision trees are built from attributes derived from techniques such as applause detection, scene segmentation, face/speaker detection, and clustering. The approach has been evaluated on a 112-hour dataset corresponding to 169 episodes of TV programs.
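As an illustration of how separators delimit the parts of a program, the following is a minimal sketch, not the authors' implementation. It assumes the separators have already been selected and are given as (start, end) times in seconds; the function name `split_by_separators`, the interval representation, and the example timings are all hypothetical.

```python
from typing import List, Tuple

# Assumption: an interval is a (start, end) pair in seconds.
Interval = Tuple[float, float]


def split_by_separators(duration: float, separators: List[Interval]) -> List[Interval]:
    """Return the program parts that lie between separator occurrences."""
    parts: List[Interval] = []
    cursor = 0.0  # end of the last separator seen so far
    for start, end in sorted(separators):
        if start > cursor:
            # Content between the previous separator and this one is a part.
            parts.append((cursor, start))
        # max() handles overlapping or nested separator intervals.
        cursor = max(cursor, end)
    if cursor < duration:
        # Content after the last separator up to the end of the program.
        parts.append((cursor, duration))
    return parts


# Hypothetical 30-minute episode with three separator occurrences.
episode_parts = split_by_separators(
    1800.0, [(600.0, 605.0), (0.0, 8.0), (1200.0, 1204.0)]
)
print(episode_parts)
```

The sketch covers only the final step. Deciding which recurrences are separators is the job of the decision-tree classification described in the abstract, and it is not reproduced here.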



Paper Citation


in Harvard Style

Abduraman A., Berrani S. and Merialdo B. (2013). Audio/Visual Recurrences and Decision Trees for Unsupervised TV Program Structuring. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 701-708. DOI: 10.5220/0004300307010708


in Bibtex Style

@conference{visapp13,
author={Alina Elma Abduraman and Sid-Ahmed Berrani and Bernard Merialdo},
title={Audio/Visual Recurrences and Decision Trees for Unsupervised TV Program Structuring},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={701-708},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004300307010708},
isbn={978-989-8565-47-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Audio/Visual Recurrences and Decision Trees for Unsupervised TV Program Structuring
SN - 978-989-8565-47-1
AU - Abduraman A.
AU - Berrani S.
AU - Merialdo B.
PY - 2013
SP - 701
EP - 708
DO - 10.5220/0004300307010708