clearly outperformed the rest of the techniques. This leads to the conclusion that our network was barely affected by overfitting. When all lighting conditions are taken into account, our method achieved the best overall performance.
All in all, although the methods are not directly
comparable, the results demonstrate that employing a
triplet architecture during the training of a CNN im-
proves its performance in the localization task.
7 CONCLUSIONS
Throughout this work, we have proposed a framework to perform visual localization with a triplet architecture, and we have analyzed the main factors that influence the training process: the choice of the triplet loss function, the triplet sample selection criteria and the batch size. The experiments reveal that, although triplet architectures have been shown to substantially improve the performance of the network, the right selection of the studied parameters is a key factor to fully exploit their potential.
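As an illustration of how these three factors interact in practice, the sketch below combines a margin-based triplet loss with batch-hard sample selection over a mini-batch. It is a minimal sketch assuming a PyTorch embedding network; the margin value and the mining rule are illustrative assumptions, not the exact configuration evaluated in this work.

    import torch
    import torch.nn.functional as F

    def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
        # Pairwise Euclidean distances between all embeddings in the batch.
        dists = torch.cdist(embeddings, embeddings, p=2)
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        # Hardest positive per anchor: farthest sample with the same label.
        hardest_pos = dists.masked_fill(~same, 0.0).max(dim=1).values
        # Hardest negative per anchor: closest sample with a different label.
        hardest_neg = dists.masked_fill(same, float("inf")).min(dim=1).values
        # Margin-based triplet loss, averaged over the batch.
        return F.relu(hardest_pos - hardest_neg + margin).mean()

    # Illustrative usage ('model' and 'loader' are hypothetical placeholders).
    # The batch size matters because the mining step can only select triplets
    # from within the current mini-batch:
    #   images, labels = next(iter(loader))
    #   loss = batch_hard_triplet_loss(model(images), labels)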
In future work, this study could be extended to outdoor environments, which are much more unstructured and challenging. Furthermore, we will explore the use of quadruplet architectures, which are composed of four CNN branches and are able to learn similarities and differences among four images, to tackle visual localization; a common formulation of the associated loss is sketched below. Finally, we will address the visual compass problem in order to fully localize the robot on the floor plane.
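One commonly used quadruplet loss, shown here only for reference (the exact loss to be adopted in future work remains open), extends the triplet margin loss with a second term that separates two negatives drawn from different classes:

    L_{quad} = \max(0,\, d(a,p) - d(a,n_1) + \alpha_1) + \max(0,\, d(a,p) - d(n_1,n_2) + \alpha_2)

where a is the anchor image, p a positive sample, n_1 and n_2 are negatives from two different classes, d(\cdot,\cdot) is a distance in the embedding space and \alpha_1, \alpha_2 are margins. The second term is what enables the network to reason about four images at a time.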
ACKNOWLEDGEMENTS
This work is part of the project TED2021-130901B-
I00 funded by MCIN/AEI/10.13039/501100011033
and by the European Union “NextGenera-
tionEU”/PRTR. The work is also part of the
project PROMETEO/2021/075 funded by Generalitat
Valenciana.