
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.
Khan, S. H., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., and Shah, M. (2022). Transformers in vision: A survey. ACM Comput. Surv., 54(10s):200:1–200:41.
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W., Dollár, P., and Girshick, R. B. (2023). Segment Anything. CoRR, abs/2304.02643.
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., and Dollár, P. (2015). Microsoft COCO: Common objects in context. CoRR, abs/1405.0312.
Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., Wei, F., and Guo, B. (2022a). Swin Transformer V2: Scaling up capacity and resolution. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 11999–12009. IEEE.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 9992–10002. IEEE.
Liu, Z., Mao, H., Wu, C., Feichtenhofer, C., Darrell, T., and Xie, S. (2022b). A ConvNet for the 2020s. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 11966–11976. IEEE.
Loddo, A., Di Ruberto, C., and Kocher, M. (2018). Recent advances of malaria parasites detection systems based on mathematical morphology. Sensors, 18(2):513.
Loh, D. R., Yong, W. X., Yapeter, J., Subburaj, K., and Chandramohanadas, R. (2021). A deep learning approach to the screening of malaria infection: Automated and rapid cell counting, object detection and instance segmentation using Mask R-CNN. Comput. Medical Imaging Graph., 88:101845.
Mukherjee, S., Chatterjee, S., Bandyopadhyay, O., and Biswas, A. (2021). Detection of malaria parasites in thin blood smears using CNN-based approach. In Mandal, J. K., Mukherjee, I., Bakshi, S., Chatterji, S., and Sa, P. K., editors, Computational Intelligence and Machine Learning, pages 19–27, Singapore. Springer Singapore.
Narejo, S., Pandey, B., Esenarro Vargas, D., Rodriguez, C., and Anjum, M. (2021). Weapon detection using YOLO v3 for smart surveillance system. Mathematical Problems in Engineering, 2021:1–9.
Sengar, N., Burget, R., and Dutta, M. (2022). A vision transformer based approach for analysis of Plasmodium vivax life cycle for malaria prediction using thin blood smear microscopic images. Computer Methods and Programs in Biomedicine, 224:106996.
Sultani, W., Nawaz, W., Javed, S., Danish, M. S., Saadia, A., and Ali, M. (2022). Towards low-cost and efficient malaria detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 20655–20664. IEEE.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008.
Wang, C., Bochkovskiy, A., and Liao, H. M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. CoRR, abs/2207.02696.
World Health Organization (2022). World Malaria Report 2022. WHO.
Woo, S., Park, J., Lee, J., and Kweon, I. S. (2018). CBAM: Convolutional block attention module. In Ferrari, V., Hebert, M., Sminchisescu, C., and Weiss, Y., editors, Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VII, volume 11211 of Lecture Notes in Computer Science, pages 3–19. Springer.
Zedda, L., Loddo, A., and Di Ruberto, C. (2022). A deep learning based framework for malaria diagnosis on high variation data set. In Image Analysis and Processing - ICIAP 2022 - 21st International Conference, Lecce, Italy, May 23-27, 2022, Proceedings, Part II, volume 13232 of Lecture Notes in Computer Science, pages 358–370. Springer.
Zhao, X., Ding, W., An, Y., Du, Y., Yu, T., Li, M., Tang, M., and Wang, J. (2023). Fast Segment Anything. CoRR, abs/2306.12156.
Zou, Z., Chen, K., Shi, Z., Guo, Y., and Ye, J. (2023). Object detection in 20 years: A survey. Proc. IEEE, 111(3):257–276.