Table 1: Comparison with State-of-the-Art Methods.
Model            mAP Score
SSD              0.634
Faster-RCNN      0.627
YOLOv5           0.573
YOLOv6           0.666
Proposed Model   0.732
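The margins in Table 1 can be read off with a little arithmetic. The following minimal sketch uses only the mAP values listed in the table; the dictionary and function names are illustrative and do not come from the original work.

```python
# mAP scores copied from Table 1; the variable names are illustrative only.
MAP_SCORES = {
    "SSD": 0.634,
    "Faster-RCNN": 0.627,
    "YOLOv5": 0.573,
    "YOLOv6": 0.666,
    "Proposed Model": 0.732,
}


def gains_over_baselines(scores, proposed="Proposed Model"):
    """Return {baseline: (absolute_gain, relative_gain_percent)}."""
    ours = scores[proposed]
    gains = {}
    for name, score in scores.items():
        if name == proposed:
            continue
        absolute = ours - score
        relative = 100.0 * absolute / score
        gains[name] = (round(absolute, 3), round(relative, 1))
    return gains


if __name__ == "__main__":
    for name, (absolute, relative) in gains_over_baselines(MAP_SCORES).items():
        print(f"{name}: +{absolute:.3f} mAP ({relative:.1f}% relative)")
```

On the table's values, the closest baseline is YOLOv6, which sits 0.066 mAP (about 9.9% in relative terms) below the proposed model.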
tomated quality control processes in metal manufac-
turing industries.
However, our approach also has limitations. For
instance, the current model may struggle to detect
highly irregular defects because of limitations in the
training data. In addition, the localization and
classification processes are not fast enough. The
multi-head attention mechanism and the numerous
layers of the transformer architecture add significant
computational overhead, which slows down processing.
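To make this overhead concrete, the sketch below counts approximate multiply-accumulate operations (MACs) for a standard transformer encoder layer and shows how the attention term grows quadratically with the number of patch tokens. The configuration (16x16 patches, embedding width 768, 12 layers, a 4x MLP expansion, one class token) is the widely used ViT-Base setup and is an assumption here, since the exact variant is not restated in this section. Biases, layer normalization, softmax, and the patch embedding are ignored.

```python
def num_tokens(image_size, patch_size=16, class_token=True):
    """Number of tokens a ViT sees for a square image (assumed setup)."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if class_token else 0)


def encoder_layer_macs(n_tokens, dim=768, mlp_ratio=4):
    """Approximate MACs of one encoder layer, split into two parts.

    linear:    Q/K/V projections (3*n*d^2), output projection (n*d^2),
               and the two MLP matmuls (2*mlp_ratio*n*d^2)
    attention: score matrix QK^T (n^2*d) and weighted sum over V (n^2*d),
               summed over all heads
    """
    linear = (4 + 2 * mlp_ratio) * n_tokens * dim ** 2
    attention = 2 * n_tokens ** 2 * dim
    return linear, attention


if __name__ == "__main__":
    depth = 12  # assumed number of encoder layers (ViT-Base)
    for size in (224, 448, 896):
        n = num_tokens(size)
        linear, attention = encoder_layer_macs(n)
        total = depth * (linear + attention)
        share = 100.0 * attention / (linear + attention)
        print(f"{size}x{size}: {n} tokens, ~{total / 1e9:.1f} GMACs, "
              f"attention share {share:.1f}%")
```

Under these assumptions, doubling the input side length roughly quadruples the token count, so the linear term grows about 4x while the attention term grows about 16x. This is one reason high-resolution inspection images are costly for ViT-style models.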
Addressing these limitations could involve adding
more samples to the Multi-DET dataset to cover these
variations and further developing the vision
transformer to optimize the detection and
classification processes.
6 CONCLUSIONS
Automated defect detection on metal surfaces is a
crucial research area because it serves industries such
as automotive and construction. Manual inspection
methods are slow and subjective, which calls for
automated systems. This study proposes using Vision
Transformers (ViTs) to overcome the limitations of
traditional methods. With their attention mechanisms,
ViTs can capture complex defect patterns effectively.
The research focuses on defect classification and
localization using pre-trained ViTs and transfer
learning. By automating defect detection, the approach
aims to improve product quality and reduce errors in
metal manufacturing. The study addresses a research
gap in applying ViTs to metal surface defect detection.
The results demonstrate accurate defect classification
and precise defect localization: the proposed model
achieved 93.5% accuracy in defect detection and a mean
average precision of 0.732, outperforming YOLO-based
methods. These results indicate the model’s potential
impact across multiple industries.
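For readers less familiar with the metric, mean average precision rests on two ingredients: the intersection over union (IoU) between a predicted and a ground-truth box, and the area under the precision-recall curve for each class. The sketch below illustrates both for a single class, assuming boxes in (x_min, y_min, x_max, y_max) format, an IoU threshold of 0.5, greedy matching to the best unmatched ground-truth box, and all-point interpolation. These are common conventions and a simplification, not necessarily the exact evaluation protocol behind the reported 0.732.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truths, iou_threshold=0.5):
    """AP for one class (simplified sketch).

    predictions:   list of (confidence, box)
    ground_truths: list of boxes
    """
    if not ground_truths:
        return 0.0
    matched = set()
    hits = []  # 1 = true positive, 0 = false positive, in confidence order
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(1)
        else:
            hits.append(0)

    # Precision and recall after each prediction, in ranked order.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truths))

    # Replace each precision by the maximum precision at any later rank,
    # then integrate precision over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

The mAP is then the mean of these per-class AP values over all defect classes.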
Our methodology offers a promising approach for
addressing the challenges posed by metal defects in
manufacturing. However, there is still room for
improvement, particularly in the model’s ability to
detect heavily overlapping and irregularly shaped
defects. This can be done by adding degrees of freedom
to the model while augmenting the training dataset. In
addition, optimizing the model to run in real time
would improve its practical performance. The current
speed limitation is due to the complexity of the ViT:
despite their effectiveness, ViTs are known for their
high computational demands in terms of memory and
processing power, which can be challenging when
deploying the model in real-time industrial settings.
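As one illustration of what augmenting a detection dataset involves, the sketch below flips an image horizontally and transforms its bounding boxes to match. It is a hypothetical example of a single augmentation, not the augmentation pipeline used in this work; the function name, the nested-list image representation, and the (x_min, y_min, x_max, y_max) pixel box format are assumptions.

```python
def hflip_sample(image, boxes):
    """Horizontally flip an image and its bounding boxes (illustrative only).

    image: list of rows, each a list of pixel values (H x W)
    boxes: list of (x_min, y_min, x_max, y_max) in pixel coordinates,
           where the box width is x_max - x_min
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring maps x to width - x, which swaps the roles of x_min and x_max.
    flipped_boxes = [(width - x_max, y_min, width - x_min, y_max)
                     for (x_min, y_min, x_max, y_max) in boxes]
    return flipped_image, flipped_boxes
```

The key point for localization tasks is that every geometric change applied to the image must also be applied to the box annotations, otherwise the augmented labels no longer describe the defects.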
Ultimately, this research paves the way for more
effective defect detection, supporting the production
of high-quality metal products and reducing operational
challenges in various industries.
ACKNOWLEDGEMENTS
We would like to extend our sincere gratitude to Eng.
Fatma Youssef for her invaluable help and guidance
throughout this project. Her expertise and thoughtful
advice have played a crucial role in shaping the path
and achievements of this research.
ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics