Real-time Accurate Pedestrian Detection and Tracking in Challenging Surveillance Videos

Kristof Van Beeck, Toon Goedemé


This paper proposes a novel approach for real-time, robust pedestrian tracking in surveillance images. Such images are challenging to analyse since the overall image quality is low (e.g. low resolution and high compression). Furthermore, a bird's-eye viewpoint and wide-angle lenses are often used to achieve maximum coverage with a minimal number of cameras. These specific viewpoints make it difficult, or even infeasible, to directly apply existing pedestrian detection techniques. Moreover, real-time processing speeds are required. To overcome these problems, we introduce a pedestrian detection and tracking framework which exploits and integrates these scene constraints to achieve excellent accuracy. We performed extensive experiments on challenging real-life video sequences, evaluating both speed and accuracy. We show that our approach achieves excellent accuracy while still meeting the stringent real-time demands of these surveillance applications, using only a single-core CPU implementation.



Paper Citation

in Harvard Style

Van Beeck K. and Goedemé T. (2015). Real-time Accurate Pedestrian Detection and Tracking in Challenging Surveillance Videos. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-091-8, pages 325-334. DOI: 10.5220/0005308703250334

in Bibtex Style

author={Kristof Van Beeck and Toon Goedemé},
title={Real-time Accurate Pedestrian Detection and Tracking in Challenging Surveillance Videos},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)},

in EndNote Style

JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)
TI - Real-time Accurate Pedestrian Detection and Tracking in Challenging Surveillance Videos
SN - 978-989-758-091-8
AU - Van Beeck K.
AU - Goedemé T.
PY - 2015
SP - 325
EP - 334
DO - 10.5220/0005308703250334