ber of matches to bound the complexity of the VIO
optimization while maintaining maximum accuracy.
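As an illustration, limiting the match count can be as simple as keeping only the k most confident correspondences before they enter the optimizer. The sketch below is a minimal example with hypothetical names and a plain top-k criterion; the actual selection rule in our system may differ.

```python
import numpy as np

def cap_matches(matches: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Keep the k most confident matches to bound the optimizer's cost.

    matches: (N, 2) array of (query_idx, train_idx) index pairs.
    scores:  (N,) matcher confidences, higher is better.
    """
    if len(matches) <= k:
        return matches
    top = np.argpartition(scores, -k)[-k:]  # indices of the k largest scores
    return matches[top]
```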
Our extensive tests on the EuRoC datasets show that our system achieves state-of-the-art odometry performance in terms of relative translation and rotation errors, at the cost of a slight increase in computational complexity. These tests also highlight several ways in which this work could be improved:
1. A dedicated, deep-feature-based loop-closure system could be added to provide full SLAM capability.
2. As the tests show, adopting the local-map-based matching used in HFNet-SLAM and ORB-SLAM could improve the global consistency of the trajectories returned by our system.
3. The current implementation of the feature tracker is in Python and PyTorch; significant speed improvements could be obtained by porting it to a C++ / TensorRT implementation (see the sketch after this list).
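As a first step toward such a port, the PyTorch feature network could be exported to ONNX and then compiled into a TensorRT engine. The sketch below uses a stand-in module, since the export call only depends on the model being traceable; all names and shapes are illustrative, not our actual network.

```python
import torch

class DummyExtractor(torch.nn.Module):
    """Stand-in for the feature network (hypothetical); replace with
    the trained extractor used by the tracker."""
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 64, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.conv(image)

model = DummyExtractor().eval()
dummy = torch.randn(1, 1, 480, 752)  # grayscale frame at EuRoC resolution

# Export to ONNX; the resulting file can be compiled offline with
# TensorRT, e.g.: trtexec --onnx=extractor.onnx --saveEngine=extractor.trt
torch.onnx.export(
    model, dummy, "extractor.onnx",
    input_names=["image"], output_names=["features"],
    dynamic_axes={"image": {2: "height", 3: "width"}},
    opset_version=17,
)
```

The compiled engine can then be loaded from C++ via the TensorRT runtime API, removing the Python interpreter from the real-time loop.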
Following open-science principles, and to stimulate work in the directions mentioned above, we open-source the code of our system (upon acceptance).
REFERENCES
Bruno, H. M. S. and Colombini, E. (2020). LIFT-SLAM: A deep-learning feature-based monocular visual SLAM method. Neurocomputing, 455:97–110.
Burri, M., Nikolic, J., Gohl, P., Schneider, T., Rehder, J., Omari, S., Achtelik, M. W., and Siegwart, R. (2016). The EuRoC micro aerial vehicle datasets. The International Journal of Robotics Research.
Campos, C., Elvira, R., Gomez, J. J., Montiel, J. M. M.,
and Tardos, J. D. (2021). ORB-SLAM3: An accu-
rate open-source library for visual, visual-inertial and
multi-map SLAM. IEEE Transactions on Robotics,
37(6):1874–1890.
Davison, A. J., Reid, I. D., Molton, N., and Stasse, O. (2007). MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29:1052–1067.
DeTone, D., Malisiewicz, T., and Rabinovich, A. (2018). SuperPoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR).
Geneva, P., Eckenhoff, K., Lee, W., Yang, Y., and Huang,
G. (2020). OpenVINS: A research platform for visual-
inertial estimation. In Proc. of the IEEE Interna-
tional Conference on Robotics and Automation, Paris,
France.
Han, X., Tao, Y., Li, Z., Cen, R., and Xue, F. (2020). SuperPointVO: A lightweight visual odometry based on CNN feature extraction. In 2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE), pages 685–691.
Kang, R., Shi, J., Li, X., Liu, Y., and Liu, X. (2019). DF-SLAM: A deep-learning enhanced visual SLAM system based on deep local features. arXiv preprint arXiv:1901.07223.
Li, D., Shi, X., Long, Q., Liu, S., Yang, W., Wang, F., Wei,
Q., and Qiao, F. (2020). DXSLAM: A robust and ef-
ficient visual SLAM system with deep features. arXiv
preprint arXiv:2008.05416.
Lin, J., Zheng, C., Xu, W., and Zhang, F. (2021). R2LIVE: A robust, real-time, LiDAR-inertial-visual tightly-coupled state estimator and mapping. IEEE Robotics and Automation Letters, 6:7469–7476.
Lindenberger, P., Sarlin, P.-E., and Pollefeys, M. (2023). LightGlue: Local feature matching at light speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
Liu, L. and Aitken, J. M. (2023). HFNet-SLAM: An accurate and real-time monocular SLAM system with deep features. Sensors, 23(4).
Mourikis, A. I. and Roumeliotis, S. I. (2007). A multi-state constraint Kalman filter for vision-aided inertial navigation. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation (ICRA), pages 3565–3572.
Qin, T., Li, P., and Shen, S. (2018). VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020.
Qin, T., Pan, J., Cao, S., and Shen, S. (2019). A general
optimization-based framework for local odometry es-
timation with multiple sensors.
Sarlin, P.-E., Cadena, C., Siegwart, R., and Dymczyk, M.
(2019). From coarse to fine: Robust hierarchical local-
ization at large scale. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion, pages 12716–12725.
Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. (2012). A benchmark for the evaluation of RGB-D SLAM systems. In Proc. of the International Conference on Intelligent Robots and Systems (IROS).
Tang, J., Ericson, L., Folkesson, J., and Jensfelt, P. (2019). GCNv2: Efficient correspondence prediction for real-time SLAM. IEEE Robotics and Automation Letters, 4(4):3505–3512.
Yi, K., Trulls, E., Lepetit, V., and Fua, P. (2016). LIFT: Learned invariant feature transform. In European Conference on Computer Vision (ECCV), volume 9910 of Lecture Notes in Computer Science, pages 467–483.
Zhang, Z. and Scaramuzza, D. (2018). A tutorial on quanti-
tative trajectory evaluation for visual(-inertial) odom-
etry. In 2018 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), pages 7244–
7251.
Zhao, Z., Song, T., Xing, B., Lei, Y., and Wang, Z. (2022). PLI-VINS: Visual-inertial SLAM based on point-line feature fusion in indoor environment. Sensors, 22(14).