evaluation on part A are shown in Table 5. Our MUD-ikNN network slightly outperforms the state-of-the-art approaches on this part. The results of our evaluation
on part B are shown in Table 6. Here our network per-
forms on par or slightly worse than the best-performing
methods. Notably, our method appears to perform better
on denser crowd images, and ShanghaiTech Part B is
by far the least dense dataset we tested.
The third dataset we evaluated our approach on
is the UCF-CC-50 dataset (Idrees et al., 2013). We
followed the standard evaluation protocol for this dataset, five-fold cross-validation. The results of our evaluation on this dataset can be seen in Table 7.
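The five-fold protocol can be sketched as follows. This is a minimal illustration of the evaluation procedure only; the toy mean-count predictor stands in for a trained network and is not the paper's actual pipeline:

```python
# Sketch of five-fold cross-evaluation as used for UCF-CC-50:
# split the images into 5 folds, evaluate on each held-out fold
# in turn, and average the mean absolute error across folds.
import numpy as np

def five_fold_mae(true_counts, n_folds=5):
    """Average the held-out MAE over n_folds splits."""
    indices = np.arange(len(true_counts))
    folds = np.array_split(indices, n_folds)
    fold_maes = []
    for test_idx in folds:
        train_idx = np.setdiff1d(indices, test_idx)
        # A real evaluation trains a network on train_idx here; this
        # toy predictor just outputs the mean training-set count.
        mean_count = np.mean(true_counts[train_idx])
        preds = np.full(len(test_idx), mean_count)
        fold_maes.append(np.mean(np.abs(preds - true_counts[test_idx])))
    return float(np.mean(fold_maes))

counts = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
print(five_fold_mae(counts))  # → 150.0
```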
Overall, our network performed favorably com-
pared with existing approaches. An advantage of our approach is that our modifications can be applied to the architectures we compare against. The most relevant comparison is between the ikNN version of the MUD network and the density map version. Here, the ikNN approach always outperformed the density map version. We speculate that the state-of-the-art methods we have compared with, along with other general-purpose CNNs, could be improved through the use of ikNN labels and upsampling map modules.
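As a rough illustration of such a labeling, the following sketch builds an ikNN-style map from head point annotations. It assumes each pixel stores the inverse of one plus its mean distance to the k nearest annotated heads; the exact formula and normalization used in the paper may differ, and the function name and shapes are illustrative:

```python
# Sketch of an inverse k-nearest-neighbor (ikNN) label map built
# from point annotations. Each pixel's value is 1 / (1 + d_k),
# where d_k is the mean distance from that pixel to the k nearest
# annotated head positions, so values peak at 1.0 on the heads.
import numpy as np

def iknn_map(head_points, height, width, k=3):
    """Build an ikNN label map from (row, col) point annotations."""
    pts = np.asarray(head_points, dtype=np.float64)        # (N, 2)
    rows, cols = np.mgrid[0:height, 0:width]
    pixels = np.stack([rows, cols], axis=-1).reshape(-1, 1, 2)
    # Distance from every pixel to every head point: (H*W, N).
    dists = np.linalg.norm(pixels - pts[None, :, :], axis=-1)
    # Mean of the k smallest distances per pixel.
    k = min(k, dists.shape[1])
    d_k = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    return (1.0 / (1.0 + d_k)).reshape(height, width)

label = iknn_map([(2, 2), (5, 5)], height=8, width=8, k=1)
print(label[2, 2])  # → 1.0, a peak at an annotated head position
```

Unlike a Gaussian-smoothed density map, such a map is nonzero far from annotations, which gives the network a gradient signal over the whole image.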
6 CONCLUSIONS
We have presented a new form of labeling for crowd counting data, the ikNN map. We have compared this labeling scheme to the commonly accepted labeling approach for crowd counting, the density map. We show that using the ikNN map with an existing state-of-the-art network improves the accuracy of the network compared to density map labelings. We have
demonstrated the improvements gained by using in-
creased label resolutions, and provide an upsampling
map module which can be generally used by other
crowd counting architectures. These approaches can be used as drop-in replacements in other crowd counting architectures, as we have done for DenseNet, which
resulted in a network which performs favorably com-
pared with the state-of-the-art.
ACKNOWLEDGMENTS
This research is supported by the National Science Foun-
dation through Awards PFI #1827505 and SCC-
Planning #1737533, and Bentley Systems, Incorpo-
rated, through a CUNY-Bentley Collaborative Re-
search Agreement (CRA). Additional support is pro-
vided by the Intelligence Community Center of Aca-
demic Excellence (IC CAE) at Rutgers University.
REFERENCES
Arteta, C., Lempitsky, V., and Zisserman, A. (2016). Count-
ing in the wild. In European conference on computer
vision, pages 483–498. Springer.
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architec-
ture for image segmentation. IEEE transactions on pat-
tern analysis and machine intelligence, 39(12):2481–
2495.
Cao, X., Wang, Z., Zhao, Y., and Su, F. (2018). Scale
aggregation network for accurate and efficient crowd
counting. In Proceedings of the European Conference
on Computer Vision (ECCV), pages 734–750.
Chan, A. B., Liang, Z.-S. J., and Vasconcelos, N. (2008). Pri-
vacy preserving crowd monitoring: Counting people
without people models or tracking. In Computer Vi-
sion and Pattern Recognition, 2008. CVPR 2008. IEEE
Conference on, pages 1–7. IEEE.
Chen, K., Gong, S., Xiang, T., and Change Loy, C. (2013).
Cumulative attribute space for age and crowd density
estimation. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 2467–
2474.
Chen, K., Loy, C. C., Gong, S., and Xiang, T. (2012). Fea-
ture mining for localised crowd counting. In BMVC,
volume 1, page 3.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger,
K. Q. (2017). Densely connected convolutional net-
works. In CVPR, volume 1, page 3.
Idrees, H., Saleemi, I., Seibert, C., and Shah, M. (2013).
Multi-source multi-scale counting in extremely dense
crowd images. In Proceedings of the IEEE confer-
ence on computer vision and pattern recognition, pages
2547–2554.
Idrees, H., Tayyab, M., Athrey, K., Zhang, D., Al-Maadeed,
S., Rajpoot, N., and Shah, M. (2018). Composition loss
for counting, density map estimation and localization
in dense crowds. arXiv preprint arXiv:1808.01050.
Laradji, I. H., Rostamzadeh, N., Pinheiro, P. O., Vazquez,
D., and Schmidt, M. (2018). Where are the blobs:
Counting by localization with point supervision. In
Proceedings of the European Conference on Computer
Vision (ECCV), pages 547–562.
Lempitsky, V. and Zisserman, A. (2010). Learning to count
objects in images. In Advances in neural information
processing systems, pages 1324–1332.
Li, Y., Zhang, X., and Chen, D. (2018). CSRNet: Dilated
convolutional neural networks for understanding the
highly congested scenes. In Proceedings of the IEEE
conference on computer vision and pattern recognition,
pages 1091–1100.
Lin, Z. and Davis, L. S. (2010). Shape-based human detec-
tion and segmentation via hierarchical part-template
matching. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 32(4):604–618.
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications