
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications