A Complete Framework for Fully-automatic People Indexing in Generic Videos

Dario Cazzato; Marco Leo; Cosimo Distante

doi:10.5220/0004653502480255

2014

Abstract

Face indexing is a popular research topic that has been investigated over the last 10 years. It supports a wide range of applications, such as automatic video content analysis, data mining, and video annotation and labeling. This work presents a fully automated framework that detects how many people are present in a generic video (even one with low resolution and/or taken from a mobile camera) and extracts the intervals of frames in which each person appears. The main contribution of the proposed work is that neither initialization nor a priori knowledge about the scene contents is required. Moreover, the approach introduces a generalized version of the k-means method that, through different statistical indices, automatically determines the number of people in the scene.
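The idea of letting a statistical index pick the number of clusters can be illustrated with a minimal sketch. It is not the authors' generalized k-means: the abstract does not say which indices are used or how they are combined, so the sketch assumes a single index (Calinski-Harabasz), plain Lloyd iterations with a deterministic farthest-first seeding, and synthetic 2-D points standing in for face descriptors. All function names (`kmeans`, `calinski_harabasz`, `choose_k`, `demo_points`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def farthest_first(points, k):
    """Assumed seeding: start at points[0], then repeatedly add the point
    farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return labels


def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion, each
    divided by its degrees of freedom. Higher means better separation."""
    n = len(points)
    overall = centroid(points)
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    k = len(clusters)
    if k < 2:
        return 0.0
    between = within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, candidates):
    """Run k-means for each candidate k and keep the best-scoring one."""
    scores = {k: calinski_harabasz(points, kmeans(points, k))
              for k in candidates}
    return max(scores, key=scores.get), scores


def demo_points(seed=0, per_cluster=20):
    """Synthetic stand-in for face descriptors: three 2-D Gaussian groups."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (10, 0), (0, 10)]
            for _ in range(per_cluster)]


if __name__ == "__main__":
    best_k, scores = choose_k(demo_points(), range(2, 7))
    print("selected number of clusters:", best_k)
```

In a real pipeline the points would be descriptors of detected faces, and the selected number of clusters would play the role of the estimated number of people. A single index can be unreliable on noisy data, which is presumably why the abstract mentions combining several.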

Paper Citation

in Harvard Style

Cazzato D., Leo M. and Distante C. (2014). A Complete Framework for Fully-automatic People Indexing in Generic Videos. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 248-255. DOI: 10.5220/0004653502480255

in Bibtex Style

@conference{visapp14,
author={Dario Cazzato and Marco Leo and Cosimo Distante},
title={A Complete Framework for Fully-automatic People Indexing in Generic Videos},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={248-255},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004653502480255},
isbn={978-989-758-004-8},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - A Complete Framework for Fully-automatic People Indexing in Generic Videos
SN - 978-989-758-004-8
AU - Cazzato D.
AU - Leo M.
AU - Distante C.
PY - 2014
SP - 248
EP - 255
DO - 10.5220/0004653502480255