
part model. In 2008 IEEE conference on computer
vision and pattern recognition, pages 1–8. IEEE.
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE
international conference on computer vision, pages
1440–1448.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detec-
tion and semantic segmentation. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 580–587.
Gochoo, M., Otgonbold, M.-E., Ganbold, E., Hsieh, J.-W.,
Chang, M.-C., Chen, P.-Y., Dorj, B., Al Jassmi, H.,
Batnasan, G., Alnajjar, F., Abduljabbar, M., and Lin,
F.-P. (2023). Fisheye8k: A benchmark and dataset for
fisheye camera object detection. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR) Workshops.
Gou, J., Yu, B., Maybank, S. J., and Tao, D. (2021). Knowl-
edge distillation: A survey. International Journal of
Computer Vision, 129:1789–1819.
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009.
Jia, X., Tong, Y., Qiao, H., Li, M., Tong, J., and Liang,
B. (2023). Fast and accurate object detector for au-
tonomous driving based on improved yolov5. Scien-
tific reports, 13(1):1–13.
Jocher, G., Chaurasia, A., Qiu, J., and Ultralytics (2023).
Ultralytics yolov8: State-of-the-art model for real-
time object detection, segmentation, and classifica-
tion. https://github.com/ultralytics/ultralytics. Ac-
cessed: 2023-08-28.
Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., Kwon,
Y., Michael, K., Fang, J., Yifu, Z., Wong, C., Montes,
D., et al. (2022). ultralytics/yolov5: v7.0 - YOLOv5 SOTA realtime instance segmentation. Zenodo.
Ju, R.-Y. and Cai, W. (2023). Fracture detection in pe-
diatric wrist trauma x-ray images using yolov8 algo-
rithm. arXiv preprint arXiv:2304.05071.
Kannala, J. and Brandt, S. S. (2006). A generic camera
model and calibration method for conventional, wide-
angle, and fish-eye lenses. IEEE transactions on pat-
tern analysis and machine intelligence, 28(8):1335–
1340.
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C.,
Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C.,
Lo, W.-Y., et al. (2023). Segment anything. arXiv
preprint arXiv:2304.02643.
Kolbeinsson, B. and Mikolajczyk, K. (2023). DDOS: The
drone depth and obstacle segmentation dataset. arXiv
preprint arXiv:2312.12494.
Kong, T., Sun, F., Liu, H., Jiang, Y., Li, L., and Shi,
J. (2020). Foveabox: Beyound anchor-based object
detection. IEEE Transactions on Image Processing,
29:7389–7398.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. Advances in neural information processing
systems, 25.
Law, H. and Deng, J. (2018). Cornernet: Detecting objects
as paired keypoints. In Proceedings of the European
conference on computer vision (ECCV), pages 734–
750.
Li, T., Tong, G., Tang, H., Li, B., and Chen, B. (2020).
Fisheyedet: A self-study and contour-based object
detector in fisheye images. IEEE Access, 8:71739–
71751.
Li, Y., Mao, H., Girshick, R., and He, K. (2022). Exploring plain vision transformer backbones for object detection. In Computer Vision–ECCV 2022, pages 280–296. Springer.
Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., and Sun,
J. (2017). Light-head r-cnn: In defense of two-stage
object detector. arXiv preprint arXiv:1711.07264.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017a). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer.
Liu, H., Duan, X., Lou, H., Gu, J., Chen, H., and Bi,
L. (2023). Improved gbs-yolov5 algorithm based on
yolov5 applied to uav intelligent traffic. Scientific Re-
ports, 13(1):9577.
Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018). Path ag-
gregation network for instance segmentation. In Pro-
ceedings of the IEEE conference on computer vision
and pattern recognition, pages 8759–8768.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.-Y., and Berg, A. C. (2016). Ssd: Single shot
multibox detector. In Computer Vision–ECCV 2016:
14th European Conference, Amsterdam, The Nether-
lands, October 11–14, 2016, Proceedings, Part I 14,
pages 21–37. Springer.
Lyu, Y., Vosselman, G., Xia, G.-S., Yilmaz, A., and Yang,
M. Y. (2020). Uavid: A semantic segmentation dataset
for uav imagery. ISPRS Journal of Photogrammetry
and Remote Sensing, 165:108 – 119.
Purkait, P., Zhao, C., and Zach, C. (2017). Spp-net: Deep
absolute pose regression with synthetic views. arXiv
preprint arXiv:1712.03452.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time object
detection. In Proceedings of the IEEE conference on
ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods