tors, called LF, is the encoder output whilst the other,
called LGID, is the decoder output. Both descriptors
can be compared among different images to decide
if they are likely to close a loop. The experimental
results show that the presented approach is able to
detect loop candidates and compares favourably to a
previously existing method.
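As an illustration of how global descriptors can be compared to propose loop candidates, the following is a minimal sketch and not the method evaluated in this paper. It assumes one fixed-length descriptor per image, uses the Euclidean distance as the comparison, and relies on a hypothetical `threshold` that would have to be tuned on real data; the function name `loop_candidates` and the `min_gap` parameter are likewise illustrative.

```python
import numpy as np


def loop_candidates(descriptors, threshold, min_gap=1):
    """Return index pairs (i, j), with j - i >= min_gap, whose descriptors are close.

    descriptors: (n_images, dim) array, one global descriptor per image.
    threshold:   hypothetical distance cutoff (assumption, needs tuning).
    min_gap:     minimum index separation, used to skip neighbouring frames.
    """
    d = np.asarray(descriptors, dtype=float)
    # Pairwise Euclidean distances via broadcasting: result has shape (n, n).
    dist = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=-1)
    # Only look at the upper triangle so each pair is reported once.
    i, j = np.triu_indices(len(d), k=min_gap)
    keep = dist[i, j] < threshold
    return list(zip(i[keep].tolist(), j[keep].tolist()))


if __name__ == "__main__":
    # Toy descriptors: images 0 and 2 are nearly identical, image 1 is far away.
    toy = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0]]
    print(loop_candidates(toy, threshold=1.0))
```

Only the pairs returned by such a filter would need to be passed on to a later verification stage.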
Since our proposal has been shown to be trainable with
a small set of images and the architecture is simple
enough to provide on-line sets of loop candidates,
it constitutes a promising approach to be embedded
into a full underwater visual SLAM system. Even
though it cannot replace a full SLAM loop closing
layer, since it does not compute the relative motion
between images, it can strongly reduce the computational
load of such a module by feeding it only with
images that will most likely close a loop.
Accordingly, our lines of future research are as
follows. First, we are now working on a strategy
to confirm or reject the loop candidates as well as to
compute the relative motion between the confirmed
loops. In this way, our proposal could be embedded
into a full SLAM system. Second, even though small
datasets have been shown to be sufficient to reach good
results, larger datasets would probably lead to better
candidate sets. For this reason, we are also working
on a method to synthetically generate loops from un-
derwater imagery. This would constitute a weakly su-
pervised approach and would allow training the NN
with arbitrarily large datasets. Our final goal is
to integrate the whole loop detection not only into
a SLAM system but also into a Multi-Session SLAM
system, which would demonstrate the ability of our
proposal to detect loops in a truly challenging scenario.
ACKNOWLEDGEMENTS
This work is partially supported by the Spanish Min-
istry of Economy and Competitiveness under contract
DPI2017-86372-C3-3-R (AEI, FEDER, UE).
REFERENCES
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and
Sivic, J. (2018). NetVLAD: CNN Architecture for
Weakly Supervised Place Recognition. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
40(6):1437–1451.
Bonin-Font, F., Burguera, A., and Oliver, G. (2013). New
solutions in underwater imaging and vision systems.
In Imaging Marine Life: Macrophotography and Mi-
croscopy Approaches for Marine Biology, pages 23–
47.
Burguera, A. and Bonin-Font, F. (2019). A trajectory-based
approach to multi-session underwater visual SLAM using
global image signatures. MDPI Journal of Marine
Science and Engineering, 7(8).
Burguera, A., Bonin-Font, F., and Oliver, G. (2015).
Trajectory-based visual localization in underwa-
ter surveying missions. Sensors (Switzerland),
15(1):1708–1735.
Ciarfuglia, T. A., Costante, G., Valigi, P., and Ricci, E.
(2012). A discriminative approach for appearance
based loop closing. In IEEE International Conference
on Intelligent Robots and Systems, pages 3837–3843.
Durrant-Whyte, H. and Bailey, T. (2006). Simultaneous localization
and mapping (SLAM): Part I The Essential
Algorithms. IEEE Robotics & Automation Magazine, 13(2):99–110.
Jaakkola, T. S. and Haussler, D. (1999). Exploiting generative
models in discriminative classifiers. In Proceedings
of the Conference on Advances in Neural Information
Processing Systems, pages 487–493.
Jégou, H., Douze, M., Schmid, C., and Pérez, P. (2010).
Aggregating local descriptors into a compact image
representation. In Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition, pages 3304–3311.
Kingma, D. P. and Welling, M. (2014). Auto-Encoding
Variational Bayes. In International Conference on
Learning Representations (ICLR).
Merrill, N. and Huang, G. (2018). Lightweight Unsupervised
Deep Loop Closure. In Robotics: Science and
Systems.
Mur-Artal, R. and Tardós, J. D. (2017). ORB-SLAM2:
An Open-Source SLAM System for Monocular,
Stereo, and RGB-D Cameras. IEEE Transactions on
Robotics, 33(5):1255–1262.
Negre-Carrasco, P. L., Bonin-Font, F., and Oliver-Codina,
G. (2016). Global image signature for visual loop-
closure detection. Autonomous Robots, 40(8):1403–
1417.
Perronnin, F. and Dance, C. (2007). Fisher kernels on visual
vocabularies for image categorization. In Proceedings
of the IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition.
Rezende, D. J., Mohamed, S., and Wierstra, D. (2014).
Stochastic Back-propagation and Variational Infer-
ence in Deep Latent Gaussian Models. Proceedings
of The 31st . . . , 32:1278–1286.
Taketomi, T., Uchiyama, H., and Ikeda, S. (2017). Visual
SLAM algorithms: a survey from 2010 to 2016. IPSJ
Transactions on Computer Vision and Applications,
9(1).
Towards Visual Loop Detection in Underwater Robotics using a Deep Neural Network