
adjusting the color and thickness of the augmentation,
can impact the model’s performance. Additionally,
model scaling shows that larger pre-trained models
can improve accuracy, especially in low-shot scenar-
ios. We hope that the ideas and empirical findings
presented in this paper provide valuable insights for
unlocking industrial applications that have access to
only limited data. In future work, we plan to enable
the model to learn the visual augmentation policy from
the data itself, rather than relying on manual design,
to simplify the overall process.
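To make the augmentation parameters concrete, the sketch below shows a minimal, self-contained way to overlay a circular ring of configurable color and thickness onto an RGB image, in the spirit of red-circle visual prompting. This is an illustrative assumption, not the paper's implementation: the function `draw_ring` and its parameters (`cx`, `cy`, `radius`, `color`, `thickness`) are hypothetical names, and the image is represented as plain nested lists to keep the example dependency-free.

```python
import math

def draw_ring(img, cx, cy, radius, color, thickness):
    """Overlay a circular ring (visual-prompt-style augmentation) on an RGB image.

    img: H x W x 3 nested lists (modified in place).
    color: (r, g, b) tuple -- the augmentation color.
    thickness: ring width in pixels -- the augmentation thickness.
    """
    half = thickness / 2.0
    for y, row in enumerate(img):
        for x in range(len(row)):
            # Distance from this pixel to the ring center.
            d = math.hypot(x - cx, y - cy)
            # Color the pixel if it lies within the ring band.
            if abs(d - radius) <= half:
                row[x] = list(color)
    return img

# Hypothetical usage: draw a red ring on a 32x32 black canvas; sweeping
# `color` and `thickness` here is the kind of adjustment discussed above.
canvas = [[[0, 0, 0] for _ in range(32)] for _ in range(32)]
draw_ring(canvas, cx=16, cy=16, radius=10, color=(255, 0, 0), thickness=3)
```

Varying `color` and `thickness` in such a routine, and re-evaluating the model on the augmented images, is one simple way to probe the sensitivity described in the text.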
FFAD: Fixed-Position Few-Shot Anomaly Detection for Wire Harness Utilizing Vision-Language Models