this time without using manual annotations in either
the source domain or the target one.
6 CONCLUSIONS
In this article, we tackle the problem of estimating
the density and the number of objects present in
large sets of images. Building on a CNN-based
density estimator, the proposed methodology can
generalize to new data sources for which no
annotations are available. We achieve this
generalization through an Unsupervised Domain
Adaptation (UDA) strategy, whereby a discriminator
attached to the output of the estimator forces the
density distributions in the target and source domains
to be similar. Experiments show a significant
improvement over the same model without domain
adaptation. To the best of our knowledge, we are the
first to introduce a UDA scheme for counting that
reduces the gap between the source and the target
domain without using additional labels. Given the
conventional structure of the estimator, the
improvement obtained by acting on the output alone
indicates a strong capacity to generalize learned
knowledge, and suggests applying similar principles
to the inner layers of the network.
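As an illustration of the output-level adversarial scheme summarized above, the following minimal sketch shows how the loss terms could be combined. It is not the implementation used in this work: it assumes a standard binary cross-entropy formulation, a pixel-wise mean squared error on the source domain, and a discriminator that returns the probability that a density map comes from the source domain. The function names and the weight `lambda_adv` are hypothetical.

```python
import math


def bce(p, label, eps=1e-7):
    """Binary cross-entropy for one probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def count_from_density(density_map):
    """Estimated number of objects: the sum of all density values."""
    return sum(sum(row) for row in density_map)


def discriminator_loss(p_source, p_target):
    """Discriminator objective (assumed): label source maps 1, target maps 0."""
    return bce(p_source, 1) + bce(p_target, 0)


def adversarial_loss(p_target):
    """Estimator objective (assumed): make target maps look like source maps."""
    return bce(p_target, 1)


def estimator_loss(pred_source, gt_source, p_target, lambda_adv=0.001):
    """Supervised MSE on the source domain plus a weighted adversarial term.

    lambda_adv is a hypothetical weighting factor, not a value from the paper.
    """
    n = sum(len(row) for row in gt_source)
    mse = sum(
        (p - g) ** 2
        for pred_row, gt_row in zip(pred_source, gt_source)
        for p, g in zip(pred_row, gt_row)
    ) / n
    return mse + lambda_adv * adversarial_loss(p_target)
```

In this sketch, only the source domain contributes a supervised term; the target domain influences the estimator solely through the discriminator's judgment of its predicted density maps, so no target labels are required.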
A further contribution is the creation of two new
datasets with per-pixel annotations, which we make
available to the scientific community. One of them is
a synthetic dataset created from a photo-realistic
video game, in which the labels are assigned
automatically by interacting with the API of the
graphical engine. Using this synthetic dataset, we
demonstrated that it is possible to train a model on
precisely annotated, automatically generated synthetic
data and perform UDA toward a real-world scenario,
obtaining very good performance without using
additional manual annotations.
In our view, the outcome of this work opens new
perspectives for scaling learning methods to large
physical systems with scarce supervisory resources.
ACKNOWLEDGEMENTS
This work was partially supported by H2020 project
AI4EU under GA 825619 and by H2020 project
AI4media under GA 951911.
REFERENCES
Aich, S. and Stavness, I. (2018). Improving object
counting with heatmap regulation. arXiv preprint
arXiv:1803.05494.
Amato, G., Bolettieri, P., Moroni, D., Carrara, F., Ciampi,
L., Pieri, G., Gennaro, C., Leone, G. R., and Vairo, C.
(2018). A wireless smart camera network for parking
monitoring. In 2018 IEEE Globecom Workshops (GC
Wkshps), pages 1–6.
Amato, G., Ciampi, L., Falchi, F., and Gennaro, C. (2019).
Counting vehicles with deep learning in onboard UAV
imagery. In 2019 IEEE Symposium on Computers and
Communications (ISCC), pages 1–6.
Amato, G., Ciampi, L., Falchi, F., Gennaro, C., and
Messina, N. (2019). Learning pedestrian detection
from virtual worlds. In Ricci, E., Rota Bulò, S.,
Snoek, C., Lanz, O., Messelodi, S., and Sebe, N., ed-
itors, Image Analysis and Processing – ICIAP 2019,
pages 302–312, Cham. Springer International Pub-
lishing.
Boominathan, L., Kruthiventi, S. S. S., and Babu, R. V.
(2016). Crowdnet: A deep convolutional network for
dense crowd counting. In Proceedings of the 24th
ACM International Conference on Multimedia, MM
’16, pages 640–644, New York, NY, USA. Association
for Computing Machinery.
Chen, Y., Li, W., Chen, X., and Gool, L. V. (2019). Learn-
ing semantic segmentation from synthetic data: A ge-
ometrically guided input-output adaptation approach.
In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 1841–1850.
Ciampi, L., Amato, G., Falchi, F., Gennaro, C., and Rabitti,
F. (2018). Counting vehicles with cameras. In SEBD.
Ciampi, L., Messina, N., Falchi, F., Gennaro, C., and Am-
ato, G. (2020a). Virtual to real adaptation of pedes-
trian detectors. Sensors, 20(18):5250.
Ciampi, L., Santiago, C., Costeira, J. P., Gennaro, C., and
Amato, G. (2020b). Unsupervised vehicle counting
via multiple camera domain adaptation. In Saffiotti,
A., Serafini, L., and Lukowicz, P., editors, Proceed-
ings of the First International Workshop on New Foun-
dations for Human-Centered AI (NeHuAI) co-located
with 24th European Conference on Artificial Intelli-
gence (ECAI 2020), Santiago de Compostella, Spain,
September 4, 2020, volume 2659 of CEUR Workshop
Proceedings, pages 82–85. CEUR-WS.org.
Ganin, Y. and Lempitsky, V. (2015). Unsupervised domain
adaptation by backpropagation. volume 37 of Pro-
ceedings of Machine Learning Research, pages 1180–
1189, Lille, France. PMLR.
Guerrero-Gómez-Olmedo, R., Torre-Jiménez, B.,
López-Sastre, R., Maldonado-Bascón, S., and Oñoro-Rubio,
D. (2015). Extremely overlapping vehicle counting.
In Paredes, R., Cardoso, J. S., and Pardo, X. M., ed-
itors, Pattern Recognition and Image Analysis, pages
423–431, Cham. Springer International Publishing.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask R-CNN. In 2017 IEEE International Conference
on Computer Vision (ICCV), pages 2980–2988.
VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications