Krizhevsky, A. and Hinton, G. (2009). Learning multiple
layers of features from tiny images. Technical Re-
port 0, University of Toronto, Toronto, Ontario.
Lin, T., Goyal, P., Girshick, R. B., He, K., and Dollár, P.
(2017a). Focal loss for dense object detection. CoRR,
abs/1708.02002.
Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick,
R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P.,
and Zitnick, C. L. (2014). Microsoft COCO: common
objects in context. CoRR, abs/1405.0312.
Lin, T.-Y., Dollár, P., Girshick, R. B., He, K., Hariharan,
B., and Belongie, S. J. (2017b). Feature pyramid net-
works for object detection. 2017 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 936–944.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.
(2017c). Focal loss for dense object detection. In
Proceedings of the IEEE International Conference on
Computer Vision (ICCV).
Liu, G., Han, J., and Rong, W. (2021a). Feedback-driven
loss function for small object detection. Image and
Vision Computing, 111:104197.
Liu, J., Gu, Y., Han, S., Zhang, Z., Guo, J., and Cheng, X.
(2021b). Feature rescaling and fusion for tiny object
detection. IEEE Access, 9:62946–62955.
Liu, Q., Tan, Z., Chen, D., Chu, Q., Dai, X., Chen, Y., Liu,
M., Yuan, L., and Yu, N. (2022). Reduce information
loss in transformers for pluralistic image inpainting.
ArXiv, abs/2205.05076.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S. E.,
Fu, C., and Berg, A. C. (2015). SSD: single shot multi-
box detector. CoRR, abs/1512.02325.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin,
S., and Guo, B. (2021c). Swin transformer: Hierar-
chical vision transformer using shifted windows. In
Proceedings of the IEEE/CVF International Confer-
ence on Computer Vision (ICCV).
King's College London (2022). King's Computational Research,
Engineering and Technology Environment (CREATE).
https://doi.org/10.18742/rnvf-m076/.
Luo, S., Li, X., Zhu, R., and Zhang, X. (2019). SFA: Small
faces attention face detector. IEEE Access, 7:171609–
171620.
Pan, S. J. and Yang, Q. (2010). A survey on transfer learn-
ing. IEEE Transactions on Knowledge and Data En-
gineering, 22(10):1345–1359.
Redmon, J., Divvala, S. K., Girshick, R. B., and Farhadi, A.
(2015). You only look once: Unified, real-time object
detection. CoRR, abs/1506.02640.
Redmon, J. and Farhadi, A. (2018). YOLOv3: An incremental
improvement. CoRR, abs/1804.02767.
Ren, S., He, K., Girshick, R. B., and Sun, J. (2015). Faster
R-CNN: towards real-time object detection with re-
gion proposal networks. CoRR, abs/1506.01497.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L.,
van den Driessche, G., Schrittwieser, J., Antonoglou,
I., Panneershelvam, V., Lanctot, M., Dieleman, S.,
Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I.,
Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel,
T., and Hassabis, D. (2016). Mastering the game of
Go with deep neural networks and tree search. Na-
ture, 529(7587):484–489.
Singh, B., Najibi, M., and Davis, L. S. (2018). SNIPER:
Efficient multi-scale training. NeurIPS.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Ried-
miller, M. A. (2015). Striving for simplicity: The all
convolutional net. CoRR, abs/1412.6806.
Tan, M., Pang, R., and Le, Q. V. (2020). EfficientDet: Scalable
and efficient object detection. 2020 IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion (CVPR), pages 10778–10787.
Tang, X., Du, D. K., He, Z., and Liu, J. (2018). PyramidBox:
A context-assisted single shot face detector.
In Proceedings of the European Conference on Com-
puter Vision (ECCV).
Tian, Z., Shen, C., Chen, H., and He, T. (2019). FCOS:
Fully convolutional one-stage object detection. 2019
IEEE/CVF International Conference on Computer Vi-
sion (ICCV), pages 9626–9635.
Tong, K. and Wu, Y. (2022). Deep learning-based detec-
tion from the perspective of small or tiny objects: A
survey. Image and Vision Computing, 123:104471.
ucas-vg (2020). TOV mmdetection.
Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y. M. (2021a).
Scaled-YOLOv4: Scaling cross stage partial network.
In Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages
13029–13038.
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao,
Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and
Xiao, B. (2021b). Deep high-resolution representation
learning for visual recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 43:3349–
3364.
Xu, C., Wang, J., Yang, W., Yu, H., Yu, L., and Xia, G.-S.
(2022). RFLA: Gaussian receptive field based label
assignment for tiny object detection. In European
Conference on Computer Vision (ECCV).
Yang, S., Luo, P., Loy, C. C., and Tang, X. (2016). WIDER
FACE: A face detection benchmark. In IEEE Conference
on Computer Vision and Pattern Recognition
(CVPR).
Yang, Z., Chai, X., Wang, R., Guo, W., Wang, W., Pu, L.,
and Chen, X. (2019). Prior knowledge guided small
object detection on high-resolution images. In 2019
IEEE International Conference on Image Processing
(ICIP), pages 86–90.
Yu, X., Gong, Y., Jiang, N., Ye, Q., and Han, Z. (2020).
Scale match for tiny person detection. 2020 IEEE
Winter Conference on Applications of Computer Vi-
sion (WACV), pages 1246–1254.
Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., and Yoo,
Y. (2019). CutMix: Regularization strategy to train
strong classifiers with localizable features. In Pro-
ceedings of the IEEE/CVF International Conference
on Computer Vision (ICCV).
Rethinking the Backbone Architecture for Tiny Object Detection