continuously optimize algorithms by improving network architectures, enhancing the training data, and refining loss functions, which has yielded significant gains in both accuracy and speed. As deep learning continues to advance, object detection is being applied across an increasingly wide range of domains.
As deep learning-based object detection algorithms continue to develop and find new applications, the field has made significant progress. Nevertheless, many challenges remain unresolved, including the detection of small objects, insufficient model robustness, and the optimization of model architectures.
Small object detection is a critical aspect of object detection, because real-world scenes contain objects at many different scales, including many small ones. Small objects occupy few pixels, have indistinct features, and often show low contrast against the background, so detecting them accurately is difficult. One key future direction is therefore to further improve small object detection with attention mechanisms, multi-scale detection methods, and feature enhancement techniques.
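The difficulty with small objects can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses COCO's size convention (small: area below 32 x 32 pixels; medium: below 96 x 96; large: otherwise), applied here to box area for simplicity, and assumes feature-map strides of 8, 16, and 32, which are typical of pyramid-style detectors but are not prescribed by the text. The function names are hypothetical.

```python
# COCO size convention: small < 32*32 px, medium < 96*96 px, large otherwise.
# (COCO applies these thresholds to annotated object area; box area is used
# here as a simplifying assumption.)
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def coco_size_category(width: float, height: float) -> str:
    """Classify a box as small / medium / large by its pixel area."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def cells_covered(width: float, height: float, stride: int) -> float:
    """Approximate number of feature-map cells a box spans at a given stride.

    Each cell of a stride-s feature map summarizes an s x s pixel patch,
    so a w x h box covers roughly (w / s) * (h / s) cells.
    """
    return (width / stride) * (height / stride)


if __name__ == "__main__":
    # Assumed strides, typical of multi-scale (pyramid) detectors.
    for w, h in [(12, 12), (64, 48), (200, 150)]:
        coverage = {s: round(cells_covered(w, h, s), 2) for s in (8, 16, 32)}
        print(f"{w}x{h}: {coco_size_category(w, h)}, cells per stride: {coverage}")
```

A 12 x 12 object covers only about 0.14 cells of a stride-32 map, so its features are largely blended with the background; this is one motivation for detecting small objects on higher-resolution (lower-stride) levels in multi-scale designs.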
In real-world scenarios, images are often affected by occlusion, blurring, lighting changes, noise, and other external variations that hinder effective object detection. Making models adapt better to specific real-world scenarios is therefore a significant challenge. Continually improving model performance through methods such as incorporating contextual information, selective parameter sharing, and complementary feature fusion is crucial for meeting the requirements of scene-specific object detection.
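One common way to train or evaluate a detector against the disturbances listed above is to synthesize them. The following NumPy sketch is an assumption-laden illustration, not a method from the text: it models sensor noise as additive Gaussian noise, lighting change as intensity scaling, blur as a box (mean) filter, and occlusion as a zeroed square patch. All function names are hypothetical, and images are assumed to be H x W x C `uint8` arrays.

```python
import numpy as np


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise (a simple sensor-noise model), clip to [0, 255]."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def adjust_brightness(img, factor):
    """Scale all intensities by `factor` to mimic a lighting change."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def box_blur(img, k):
    """Mean filter over a k x k window (k odd) as a crude blur model."""
    pad = k // 2
    padded = np.pad(img.astype(np.float32), ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=np.float32)
    for dy in range(k):          # accumulate every shifted copy of the image
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return (out / (k * k)).astype(np.uint8)


def random_occlusion(img, size, rng):
    """Zero out a randomly placed size x size square to simulate partial occlusion."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = 0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
    corrupted = add_gaussian_noise(image, sigma=10.0, rng=rng)
    corrupted = adjust_brightness(corrupted, factor=0.6)
    corrupted = box_blur(corrupted, k=3)
    corrupted = random_occlusion(corrupted, size=16, rng=rng)
    print(corrupted.shape, corrupted.dtype)
```

Such synthetic corruptions only approximate real occlusion, blur, and lighting; they complement, rather than replace, the model-side techniques (context, parameter sharing, feature fusion) named above.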
The underlying network architecture is the foundation of object detection algorithms, and optimizing it has long been an important area of study. At present, the choice of network architecture is somewhat ad hoc, and the same architecture can perform quite differently across detection tasks. Improving the processing efficiency of network architectures is therefore an important future direction.
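Architectural efficiency is often reasoned about by counting weights and multiply-accumulate operations (MACs). As one illustration, chosen here rather than taken from the text, the sketch compares a standard k x k convolution with a depthwise separable convolution (a depthwise k x k convolution followed by a 1 x 1 pointwise convolution, the building block popularized by MobileNet). It assumes stride 1, "same" padding, and no bias; the function names are hypothetical.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Weights and MACs of a k x k convolution producing an h x w x c_out output."""
    params = k * k * c_in * c_out
    macs = params * h * w          # every weight is applied once per output position
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Weights and MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in       # one k x k filter per input channel
    pointwise = c_in * c_out       # 1 x 1 conv that mixes channels
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


if __name__ == "__main__":
    k, c_in, c_out, h, w = 3, 256, 256, 40, 40
    std_p, std_m = standard_conv_cost(k, c_in, c_out, h, w)
    ds_p, ds_m = depthwise_separable_cost(k, c_in, c_out, h, w)
    print(f"standard:  {std_p:,} params, {std_m:,} MACs")
    print(f"separable: {ds_p:,} params, {ds_m:,} MACs")
    # The cost ratio equals 1/c_out + 1/k**2 under these assumptions.
    print(f"ratio: {ds_m / std_m:.4f} vs 1/c_out + 1/k^2 = {1 / c_out + 1 / k**2:.4f}")
```

For a 3 x 3 layer with 256 input and output channels, the separable version needs 67,840 weights instead of 589,824, roughly 11.5% of the cost. Raw operation counts do not fully predict latency on real hardware, so such estimates are a starting point rather than a substitute for measurement.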
There has been considerable research on 3D object detection, but most algorithms are not yet mature. Precise 3D object detection from high-precision LiDAR point clouds is expensive and sensitive to weather conditions. As a result, lifting 2D images to 3D for detection has become an active research direction. One approach uses methods such as inverse perspective mapping (IPM) and the orthographic feature transform (OFT) to convert perspective images into bird's-eye views (BEV). Another approach infers 3D information from the geometric relationships between an object's overall size and the distances between its keypoints.
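Both approaches rest on pinhole-camera geometry. The sketch below makes strong simplifying assumptions that the text does not state: a flat ground plane, a camera with zero pitch and roll mounted `cam_height` metres above the ground, known intrinsics (`fx`, `fy`, `cx`, `cy`), and nearest-neighbour sampling. `ipm_warp` builds a BEV image by projecting each ground cell into the image (the inverse mapping used in IPM); `depth_from_height` shows the simplest size-based relation, Z = fy * H / h, as an example of the kind of metric-to-pixel relationship that size- and keypoint-based methods exploit. OFT is not sketched, and all names are hypothetical.

```python
import numpy as np


def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Back-project pixel (u, v) onto a flat ground plane.

    Camera axes: x right, y down, z forward; zero pitch and roll assumed.
    Returns (X, Z): lateral offset and forward distance in metres.
    """
    if v <= cy:
        raise ValueError("pixel is at or above the horizon; its ray never hits the ground")
    z = fy * cam_height / (v - cy)   # ray meets the plane y = cam_height
    x = (u - cx) * z / fx
    return x, z


def ipm_warp(image, fx, fy, cx, cy, cam_height, x_range, z_range, res):
    """Bird's-eye-view image via inverse perspective mapping (nearest neighbour).

    Each BEV cell (X, Z) on the ground is projected into the image and the
    nearest pixel is copied. Rows run from far (top) to near (bottom).
    Requires z_range[0] > 0 so that no cell lies at the camera.
    """
    xs = np.arange(x_range[0], x_range[1], res)
    zs = np.arange(z_range[1], z_range[0], -res)   # far to near
    X, Z = np.meshgrid(xs, zs)
    u = np.round(fx * X / Z + cx).astype(int)
    v = np.round(fy * cam_height / Z + cy).astype(int)
    h, w = image.shape[:2]
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    bev = np.zeros((len(zs), len(xs)) + image.shape[2:], dtype=image.dtype)
    bev[valid] = image[v[valid], u[valid]]         # cells outside the view stay zero
    return bev


def depth_from_height(pixel_height, real_height, fy):
    """Pinhole relation Z = fy * H / h between metric and pixel heights."""
    return fy * real_height / pixel_height


if __name__ == "__main__":
    fx = fy = 700.0
    cx, cy, cam_h = 640.0, 360.0, 1.5
    img = np.random.default_rng(0).integers(0, 256, size=(720, 1280, 3)).astype(np.uint8)
    bev = ipm_warp(img, fx, fy, cx, cy, cam_h, x_range=(-10, 10), z_range=(5, 30), res=0.5)
    print(bev.shape)
    print(pixel_to_ground(640, 430, fx, fy, cx, cy, cam_h))
    print(depth_from_height(pixel_height=70, real_height=1.5, fy=fy))
```

The flat-ground, fixed-pitch assumption is what makes IPM cheap, and also why it distorts anything that rises above the road surface; learned transforms such as OFT are one response to that limitation.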
REFERENCES
L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing (IJIP), vol. 3, no. 4, pp. 143-152, 2009.
X. Y. Wang, T. X. Han, and S. C. Yan, "An HOG-LBP human detector with partial occlusion handling," in Proceedings of the 12th IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan: IEEE, 2009, pp. 32-39.
R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA: IEEE, 2014, pp. 580-587.
T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, et al., "Microsoft COCO: Common Objects in Context," in Computer Vision - ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, 2014, pp. 740-755.
K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
G. Papandreou, I. Kokkinos, and P.-A. Savalle, "Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 390-399.
J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, "Selective Search for Object Recognition," International Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, 2013.
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016, pp. 779-788.
J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA: IEEE, 2017, pp. 6517-6525.
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," in Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam: Springer, 2016, pp. 21-37.
Evolution of Object Detection Algorithms Utilizing Deep Learning