Table 3: Accuracy comparison of crowd counting models on several datasets, trained on RGB-only, TIR-only, or RGB+TIR images. All TIR images are generated using a Pix2Pix GAN. GAME(0) is equivalent to MAE; lower values are better.

Model     Input     DroneRGBT              ShanghaiTech Part-B    CARPK
                    GAME0  GAME1  GAME2    GAME0  GAME1  GAME2    GAME0  GAME1  GAME2
MCNN      RGB       17.9   24.2   42.0     26.4   34.4   55.2     10.1   21.2   43.4
MCNN      TIR       22.5   27.5   51.3     35.2   38.2   66.7     16.8   28.0   49.0
MCNN      RGB+TIR   16.2   21.0   35.1     23.2   31.5   48.5      8.9   19.6   36.1
DroneNet  RGB       11.3   22.1   32.7     22.4   30.2   41.9      9.0   20.5   40.1
DroneNet  TIR       18.6   25.2   40.3     29.2   33.4   52.8     15.3   27.6   46.2
DroneNet  RGB+TIR   10.1   18.8   28.4     20.0   29.7   38.3      8.1   18.5   26.8
MMCount   RGB       10.8   21.1   32.4     21.6   28.2   40.1      8.2   18.3   37.5
MMCount   TIR       16.0   23.3   40.6     27.7   33.6   49.9     15.0   25.6   42.8
MMCount   RGB+TIR    9.2   18.0   26.0     18.2   28.0   36.4      7.8   15.2   25.0
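
The GAME columns can be read more easily with the metric written out. The following is a minimal sketch of the commonly used definition, in which GAME(L) splits each image into 4^L non-overlapping cells and sums the absolute count error over the cells, then averages over images. It assumes predictions and ground truth are 2-D density maps whose sum is the object count; the function names `game` and `mean_game` are illustrative and do not come from the paper.

```python
import numpy as np


def game(pred_density, gt_density, level):
    """GAME(level) for one image: sum of per-cell absolute count errors.

    Assumes both inputs are 2-D density maps of the same shape.
    The image is split into a 2**level by 2**level grid (4**level cells).
    """
    pred = np.asarray(pred_density, dtype=np.float64)
    gt = np.asarray(gt_density, dtype=np.float64)
    if pred.shape != gt.shape or pred.ndim != 2:
        raise ValueError("density maps must be 2-D and of equal shape")

    cells = 2 ** level
    error = 0.0
    # Split along rows, then along columns; array_split tolerates
    # image sizes that are not divisible by the grid size.
    row_blocks = zip(np.array_split(pred, cells, axis=0),
                     np.array_split(gt, cells, axis=0))
    for pred_rows, gt_rows in row_blocks:
        col_blocks = zip(np.array_split(pred_rows, cells, axis=1),
                         np.array_split(gt_rows, cells, axis=1))
        for pred_cell, gt_cell in col_blocks:
            error += abs(pred_cell.sum() - gt_cell.sum())
    return error


def mean_game(pred_maps, gt_maps, level):
    """Average GAME(level) over a collection of images."""
    return float(np.mean([game(p, g, level)
                          for p, g in zip(pred_maps, gt_maps)]))
```

With `level=0` the grid is a single cell, so the result is the absolute error of the total count, which is why GAME(0) coincides with MAE. Because finer grids also penalize counts placed in the wrong region, GAME(L) cannot decrease as L grows, which matches the GAME0 ≤ GAME1 ≤ GAME2 pattern in every row of Table 3.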
VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications