A Novel Framework for Computing Unique People Count from Monocular Videos

Satarupa Mukherjee, Nilanjan Ray


Counting the unique number of people in a video (i.e., counting a person only once while that person remains within the field of view) is required in many significant video analytics applications, such as counting transit passenger and pedestrian volumes in railway stations, malls, and road intersections. The principal roadblock in this application is occlusion. In this PhD work, we engineer a novel and straightforward solution to the problem by combining machine learning techniques with simple pixel motion tracking. We estimate the influx and/or outflux rate of unique people in a region of interest within a monocular video. The unique count is then obtained by summing the influx and/or outflux rates over time. Our proposed framework avoids people detection and people tracking, both of which are plagued by occlusions. It is also online in nature, without error accumulation, so that a unique people count can be obtained between any two time points in a streaming video. We validate the framework on 19 publicly available monocular videos. Occlusions are abundant in these videos, yet we obtain more than 95% accuracy on most of them. We also extend our proposed framework beyond monocular videos and apply it to multiple views of a publicly available dataset with about 99% accuracy.
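The abstract's central idea, that summing per-frame influx rates yields a unique count between any two time points without error accumulation, can be sketched in a few lines. The sketch below is illustrative only: the class name, the rate values, and the rounding step are assumptions, and the abstract does not specify how the per-frame rates are produced (that is the machine-learning component of the framework).

```python
class UniquePeopleCounter:
    """Accumulate estimated per-frame influx rates into a running unique count.

    Hypothetical sketch of the online counting idea: an upstream regressor is
    assumed to supply, for each frame, the estimated number of unique people
    entering the region of interest during that frame.
    """

    def __init__(self):
        # cumulative[t] = estimated unique count over frames 0..t-1
        self.cumulative = [0.0]

    def update(self, influx_rate):
        # influx_rate: estimated people entering the ROI this frame
        # (fractional values are fine; rounding happens only on readout).
        self.cumulative.append(self.cumulative[-1] + influx_rate)

    def count_between(self, t0, t1):
        # Unique people entering between frame indices t0 and t1 (t0 <= t1).
        # A difference of cumulative sums, so queries over any interval are
        # O(1) and no error accumulates across readouts.
        return round(self.cumulative[t1] - self.cumulative[t0])


if __name__ == "__main__":
    counter = UniquePeopleCounter()
    for rate in [0.2, 0.5, 0.1, 0.8, 0.4]:  # made-up per-frame influx rates
        counter.update(rate)
    print(counter.count_between(0, 5))  # total over all five frames -> 2
```

Because the counter stores only a cumulative sum, it is naturally online: each streamed frame costs one addition, and any interval count is a single subtraction.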



Paper Citation

in Harvard Style

Mukherjee S. and Ray N. (2014). A Novel Framework for Computing Unique People Count from Monocular Videos. In Doctoral Consortium - DCVISIGRAPP, (VISIGRAPP 2014), ISBN Not Available, pages 3-14.

in Bibtex Style

@conference{visigrapp14,
author={Satarupa Mukherjee and Nilanjan Ray},
title={A Novel Framework for Computing Unique People Count from Monocular Videos},
booktitle={Doctoral Consortium - DCVISIGRAPP, (VISIGRAPP 2014)},
year={2014},
pages={3-14},
isbn={Not Available},
}

in EndNote Style

TY - CONF
JO - Doctoral Consortium - DCVISIGRAPP, (VISIGRAPP 2014)
TI - A Novel Framework for Computing Unique People Count from Monocular Videos
SN - Not Available
AU - Mukherjee S.
AU - Ray N.
PY - 2014
SP - 3
EP - 14
DO -
ER -