6 CONCLUSIONS
The counting accuracy of a standard YOLOv2 detection
pipeline depends on a pre-fixed NMS threshold and is
governed by a precision and recall trade-off. A higher
NMS threshold retains more true positive detections,
yielding a higher recall rate; however, it also leaves
more redundant detections unfiltered, which lowers
precision and overall accuracy.
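As a point of reference for this trade-off, the following is a minimal sketch of standard greedy NMS (not necessarily the exact implementation used in this work); raising iou_threshold suppresses fewer overlapping boxes, which increases recall but leaves more duplicates.

import numpy as np

def nms(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    A higher iou_threshold suppresses fewer overlapping boxes, raising
    recall at the cost of more redundant (duplicate) detections.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # highest-confidence box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the kept box with all remaining candidates.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Keep only candidates whose overlap falls below the threshold.
        order = rest[iou <= iou_threshold]
    return keep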
In this paper, we have explored a new detection
pipeline to mitigate this limitation. A PSE algorithm
can be appended as a final stage to an existing
detection pipeline to filter out the remaining
redundant detections. The resulting three-stage
detection pipeline is flexible and adaptable to
different scenarios: a higher NMS filtering threshold
may be set to keep all true detections, resulting in a
higher recall rate, while the PSE algorithm removes
the redundant detections, ultimately yielding higher
precision and accuracy rates.
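The sketch below illustrates how such a three-stage pipeline could be wired together; it is a hypothetical outline, not the exact method of this work. The names detector, pse_model, and both thresholds are placeholders, the PSE model is assumed to return a pairwise similarity score in [0, 1] (high when two boxes cover the same pedestrian), and the nms helper is the one sketched above.

def three_stage_count(image, detector, pse_model, nms_iou=0.6, sim_threshold=0.5):
    """Hypothetical three-stage counting pipeline:
    detection -> permissive NMS -> PSE-based duplicate removal."""
    # Stage 1: YOLOv2-style detector (placeholder callable).
    boxes, scores = detector(image)
    # Stage 2: permissive NMS with a high IoU threshold to preserve recall.
    keep = nms(boxes, scores, iou_threshold=nms_iou)
    survivors = sorted(keep, key=lambda i: scores[i], reverse=True)
    # Stage 3: drop detections the PSE model judges redundant with one already kept.
    final = []
    for i in survivors:
        if all(pse_model(image, boxes[i], boxes[j]) < sim_threshold for j in final):
            final.append(i)
    return len(final), [boxes[i] for i in final]

Processing detections in descending confidence order keeps the most confident box of each group of duplicates, mirroring the role the NMS stage plays for heavily overlapping boxes.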
The three-stage detection pipeline substantially
reduces the accuracy variance, allowing it to perform
well across multiple different scenarios. In addition,
the low accuracy variance makes the NMS threshold
easier to pre-define, since its value has only a limited
impact on the pipeline's performance.
Finally, the PSE algorithm can be suitably trained
and added to any detection pipeline to remove
redundant detections, beyond the pedestrian
detection application described in this work.
ACKNOWLEDGMENT
This work was supported by the Macao Science and
Technology Development Fund (Fundo para o
Desenvolvimento das Ciências e da Tecnologia) of
Macao SAR under grant number 138/2016/A3.