Counting People in Crowds Using Multiple Column Neural Networks
Christian Massao Konishi and Helio Pedrini
Institute of Computing, University of Campinas, Campinas, Brazil
Keywords:
Crowd Counting, Generative Adversarial Networks, Deep Learning, Activation Maps.
Abstract:
Crowd counting through images is a research field of great interest due to its many applications, such as
monitoring surveillance camera footage and urban planning. In this work, a model (MCNN-U) based on Generative
Adversarial Networks (GANs) with Wasserstein cost and Multiple Column Neural Networks (MCNNs) is
proposed to obtain better estimates of the number of people. The model was evaluated on two crowd counting
databases, UCF-CC-50 and ShanghaiTech. On the first database, the reduction in mean absolute error exceeded
30%, whereas the gains on the second database were smaller. An adaptation of the LayerCAM method was also
proposed for visualizing the crowd counting network.
1 INTRODUCTION
Obtaining an accurate estimate of the number of people
present in an image has several practical applications.
Counting a few tens of individuals is simple enough to
be done manually; however, for large crowds, such as
public demonstrations, musical events and sporting
events, a crowd counting model is often the only viable
option, enabling better urban planning, event planning,
and crowd surveillance.
An intuitive way to build an object counter is to
train a detector and count its detections in the
image (Li et al., 2008). However, such models
cannot adequately handle high densities of
people (Gao et al., 2020), because they rely on recognizing
some body part, such as the head or shoulders,
which may be partially occluded in a crowd. Other models
(Zhang et al., 2016; Lempitsky and Zisserman,
2010) do not attempt to detect and localize
each person; instead, they count the objects
in an image by estimating the density over each
region of the image.
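The density-based formulation above can be illustrated with a short sketch: each annotated head position contributes a small normalized Gaussian to a density map, so that summing the map recovers the person count. This is a minimal illustration in NumPy; the kernel radius and sigma are assumed values, not the settings used in this work.

```python
import numpy as np

def density_map(points, height, width, sigma=4.0, radius=12):
    """Build a crowd density map by placing a normalized 2D Gaussian at
    each annotated head position (x, y). Away from the image borders,
    each person contributes exactly 1 to the map's total sum, so
    dmap.sum() approximates the person count."""
    dmap = np.zeros((height, width), dtype=np.float64)
    # Precompute one normalized Gaussian kernel of side 2*radius + 1.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    kernel /= kernel.sum()
    side = 2 * radius + 1
    for (x, y) in points:
        x, y = int(round(x)), int(round(y))
        # Clip the kernel window at the image borders; near borders a
        # small part of the unit mass is lost, making counts approximate.
        x0, x1 = max(0, x - radius), min(width, x + radius + 1)
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        kx0, kx1 = x0 - (x - radius), side - ((x + radius + 1) - x1)
        ky0, ky1 = y0 - (y - radius), side - ((y + radius + 1) - y1)
        dmap[y0:y1, x0:x1] += kernel[ky0:ky1, kx0:kx1]
    return dmap
```

Summing this map over any region gives the estimated number of people in that region, which is exactly the quantity the density-based models regress.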
One difficulty faced by these models is dealing with variations
in image conditions, such as lighting, crowd density,
and the apparent size of people. A convolutional
neural network with filters at different scales, such
as a Multi-Column Neural Network (MCNN) (Zhang
et al., 2016), is an alternative for these scenarios,
since it can handle variations in the size of people
within a single image as well as variations caused by
different image dimensions. On the other hand, a limitation
of the MCNN is that its output is a density map
of smaller height and width than the original image,
which causes an information loss inherent to the model
itself.
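Because the MCNN's pooling stages shrink the output resolution (in the original MCNN, by a factor of 4 in each dimension), the ground-truth density maps must be downsampled for training in a way that preserves the total count. A minimal NumPy sketch of count-preserving sum pooling (the factor of 4 here is assumed from the original MCNN's two pooling stages):

```python
import numpy as np

def downsample_preserving_count(dmap, factor=4):
    """Downsample a density map by summing non-overlapping
    factor x factor blocks (sum pooling), so the total count
    dmap.sum() is preserved in the smaller map."""
    h, w = dmap.shape
    h2, w2 = h // factor, w // factor
    # Trim edges so the dimensions divide evenly by the factor.
    trimmed = dmap[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor).sum(axis=(1, 3))
```

Averaging instead of summing would shrink the count by factor², which is the kind of inconsistency this pooling avoids.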
In this work, modifications to the MCNN were
proposed, both in terms of architecture and training,
aiming to obtain density maps that are more faithful
to the reference maps (ground truth). To this end, in addition
to the neural network that estimates the density
of people in the image, a second network was
added, whose role is to evaluate the output of the first
against the real densities. This approach
is an application of Generative Adversarial Networks
(GANs), more precisely, the Wasserstein-GAN (Ar-
jovsky et al., 2017), in the context of counting people
in crowds by density maps. The proposed model for
the estimator is based on an MCNN, but introduces a
series of modifications to improve the quality of the
output (Section 4), recovering the original image dimensions
and adding more possible connections between
the various levels of the network.
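The Wasserstein-GAN objective behind this setup can be stated compactly: the critic is trained to widen the gap between its scores on real and generated density maps, while the generator (here, the counting network) is trained to raise the critic's score on its outputs. A toy sketch of the two losses, where `critic_real` and `critic_fake` are hypothetical arrays of critic scores:

```python
import numpy as np

def wasserstein_losses(critic_real, critic_fake):
    """WGAN losses (Arjovsky et al., 2017), both written as quantities
    to be minimized. The critic's loss approximates the negative
    Wasserstein distance between real and generated distributions."""
    critic_loss = np.mean(critic_fake) - np.mean(critic_real)
    generator_loss = -np.mean(critic_fake)
    return critic_loss, generator_loss
```

In practice the critic must also be kept approximately 1-Lipschitz (e.g., by weight clipping in the original WGAN) for the distance estimate to be meaningful.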
2 RELATED CONCEPTS
The crowd counting problem consists of estimating
the number of people present in an image or video.
Although other approaches exist (object detection, regression),
the most modern models are based
on Fully Convolutional Networks (FCNs) (Gao et al.,
2020), a class of Convolutional Neural Networks
(CNNs) without densely connected layers.
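The absence of densely connected layers is what lets an FCN accept images of arbitrary size: a convolution's output dimensions follow its input dimensions, whereas a dense layer fixes them. A bare-bones "valid" convolution in NumPy makes this concrete (illustrative only, not an efficient implementation):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation. The output size is
    (h - kh + 1, w - kw + 1): it scales with the input, so a network
    built only from such layers handles any image dimensions."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Feeding the same 3x3 kernel a 10x10 or a 20x15 image simply yields an 8x8 or an 18x13 output, with no architectural change, which is why FCNs suit density-map estimation on images of varying resolution.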
DOI: 10.5220/0011704000003417
In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP, pages 363-370
ISBN: 978-989-758-634-7; ISSN: 2184-4321
Copyright © 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)