In essence, the experimental findings presented in
this section follow directly from the C2f module and
the distribution focal loss (DFL) integrated into the
YOLOv5 architecture. The results not only validate
the proposed methods but also highlight the
significance of these enhancements for achieving high
accuracy in complex object detection scenarios, a
crucial factor in the applicability of such models to
real-world settings.
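To make the role of the C2f block concrete, the listing below gives a minimal PyTorch sketch of a C2f-style module: the input is split after a 1x1 projection, one half passes through a chain of bottleneck blocks, and every intermediate output is concatenated before a final 1x1 convolution, which is what shortens the gradient paths. The Conv and Bottleneck helpers, the channel widths, and the number of bottlenecks n are illustrative assumptions, not the exact configuration used in this work.

```python
# Minimal sketch of a C2f-style block (assumed Conv/Bottleneck helpers,
# illustrative channel widths), not the exact configuration of this work.
import torch
import torch.nn as nn


class Conv(nn.Module):
    """Convolution followed by batch norm and SiLU activation."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class Bottleneck(nn.Module):
    """Two 3x3 convolutions with an optional residual connection."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c, 3)
        self.cv2 = Conv(c, c, 3)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


class C2f(nn.Module):
    """Split the features, run one branch through n bottlenecks, and
    concatenate every intermediate output so gradients reach each stage
    through a short path."""
    def __init__(self, c_in, c_out, n=2, shortcut=True, e=0.5):
        super().__init__()
        self.c = int(c_out * e)                      # hidden channels per branch
        self.cv1 = Conv(c_in, 2 * self.c, 1)         # project, then split in two
        self.cv2 = Conv((2 + n) * self.c, c_out, 1)  # fuse all branches
        self.m = nn.ModuleList(Bottleneck(self.c, shortcut) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))        # two equal halves
        y.extend(m(y[-1]) for m in self.m)           # chained bottlenecks
        return self.cv2(torch.cat(y, dim=1))         # dense concatenation


if __name__ == "__main__":
    block = C2f(64, 128, n=2)
    print(block(torch.randn(1, 64, 80, 80)).shape)   # torch.Size([1, 128, 80, 80])
```

The dense concatenation of all intermediate branches is what provides the richer gradient flow at a modest parameter cost.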
4 CONCLUSIONS
This study enhances object detection through targeted
modifications of the YOLOv5 architecture. It
introduces the C2f module to improve gradient flow
and feature learning, and combines the distribution
focal loss (DFL) with the CIoU loss for more precise
localization and classification. This combination
addresses the class imbalance and the complex spatial
relationships inherent in dense object scenes. The
enhanced model is evaluated on the COCO128 dataset
and shows substantial improvements over the original
YOLOv5 on key indicators such as mAP, recall, and
bounding-box prediction accuracy. The results
underline the role of the C2f module in keeping the
architecture lightweight yet accurate, and the
contribution of the combined DFL and CIoU losses in
handling scale variation and complex object
relationships. Future work will aim to scale the model
to broader real-world applications, focusing on
robustness across varying object sizes and
environmental conditions. Performance will be further
optimized through advanced data augmentation and
greater dataset diversity, ensuring the model's
adaptability and effectiveness in real-world
deployments. This research lays the groundwork for
future advances in object detection, paving the way for
more nuanced and robust detection models.
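For completeness, the two loss terms combined in this work take their standard forms; the notation below is introduced only for this recap. The CIoU loss augments the IoU term with a centre-distance penalty and an aspect-ratio consistency term,

$$\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}(\mathbf{b},\mathbf{b}^{gt})}{c^{2}} + \alpha v, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1-\mathrm{IoU}) + v},$$

where $\rho$ is the Euclidean distance between the predicted and ground-truth box centres $\mathbf{b}$ and $\mathbf{b}^{gt}$, and $c$ is the diagonal length of the smallest box enclosing both. The distribution focal loss treats each box offset as a discrete distribution over bin values $y_{0} < \dots < y_{n}$; for a continuous target $y$ lying between adjacent bins $y_{i}$ and $y_{i+1}$ with softmax probabilities $S_{i}$ and $S_{i+1}$,

$$\mathcal{L}_{\mathrm{DFL}}(S_{i}, S_{i+1}) = -\big((y_{i+1}-y)\log S_{i} + (y-y_{i})\log S_{i+1}\big),$$

which concentrates the learned distribution around the true offset. The weighting between the two terms is a training hyperparameter.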