we use the templates for all locomotion speeds available for a given subject, keeping the head motion translation at the natural scale τ = 1 and the rotation rate at ρ = 2. The speeds used are 2, 3, 4, 6, 8, 10, and 12 km/h, in the same four maps.
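For concreteness, this sweep can be expressed as a simple configuration. The following Python sketch enumerates the speed, map, and scale parameters described above; all names (MAPS, run_sequence) are hypothetical placeholders standing in for our actual dataset generator and evaluation pipeline.

    # Hypothetical sketch of the evaluation sweep described in the text.
    # The dataset generator and VO evaluation are replaced by a placeholder.
    from itertools import product

    SPEEDS_KMH = [2, 3, 4, 6, 8, 10, 12]         # locomotion speeds per subject
    MAPS = ["map_1", "map_2", "map_3", "map_4"]  # placeholder names for the four maps
    TRANSLATION_SCALE = 1.0                      # natural head-translation scale (tau)
    ROTATION_RATE = 2.0                          # head-rotation rate (rho)

    def run_sequence(map_name, speed_kmh, tau, rho):
        """Placeholder: generate a sequence and run a VO method on it."""
        print(f"map={map_name} speed={speed_kmh} km/h tau={tau} rho={rho}")

    if __name__ == "__main__":
        for map_name, speed in product(MAPS, SPEEDS_KMH):
            run_sequence(map_name, speed, TRANSLATION_SCALE, ROTATION_RATE)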
We show the performance of DSO, ORB-SLAM3, and DeepV2D in Figure 12. DSO lost tracking in many cases at higher speeds, which further highlights its sensitivity to fast camera motion. Recall that all subjects started running (resulting in a greater amplitude of headset motion) at the 8 km/h limit, which is close to the average speed at which DSO starts to fail in our sequences (7.5 km/h). The performance of ORB-SLAM3 is consistent with our previous observations: larger viewpoint changes lead to a deterioration in performance.
6 CONCLUSION
With our headset motion model and synthetic dataset generator, we performed experiments on image sequences exhibiting various aspects of headset motion, parameterized to test the limits of monocular visual odometry methods. Our three main findings are:
1. The main challenge for monocular visual odometry on headsets is rotation. Even slight rotations can have substantial negative effects on the performance of the three classes of algorithms that we studied. In contrast, the head translation induced by walking does not significantly deteriorate the performance of monocular visual odometry.
2. Among the three methods that we evaluated, the feature-based ORB-SLAM3 is by far the most robust. Deep learning-based methods such as DeepV2D have strong potential, for example through their ability to estimate dense depth maps, but it is critical to design such systems so that the properties of conventional features, such as invariance to translation and rotation, high localization accuracy, and robustness to image degradations (noise, motion blur, etc.), are preserved.
3. Developing a robust measure of uncertainty for the predicted poses is critical for deep learning-based methods, since by default they always produce a prediction and never fall into a “tracking lost” state, thereby failing silently (see the illustrative sketch after this list).
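As a minimal illustration of the third finding, the sketch below shows one possible way to turn a per-frame uncertainty estimate into an explicit “tracking lost” signal. The uncertainty source (a predicted 6x6 pose covariance) and the threshold are assumptions made for illustration only and are not part of any of the evaluated methods.

    # Illustrative sketch only: gate deep VO predictions on an uncertainty
    # estimate so the method fails loudly instead of silently.
    import numpy as np

    UNCERTAINTY_THRESHOLD = 0.5  # hypothetical value; must be tuned per method/dataset

    def is_tracking_lost(pose_covariance, threshold=UNCERTAINTY_THRESHOLD):
        """Declare tracking lost when the predicted 6x6 pose covariance is too large."""
        return float(np.trace(pose_covariance)) > threshold

    if __name__ == "__main__":
        confident = 1e-3 * np.eye(6)   # small covariance: keep the prediction
        uncertain = 1.0 * np.eye(6)    # large covariance: report tracking lost
        print(is_tracking_lost(confident))  # False
        print(is_tracking_lost(uncertain))  # True

Such a gate would let a deep method report failure in the same way that DSO or ORB-SLAM3 report a lost track, instead of silently producing unreliable poses.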
REFERENCES
Agrawal, P., Carreira, J., and Malik, J. (2015). Learning to see by moving. CoRR, abs/1505.01596.
Burri, M., Nikolic, J., Gohl, P., Schneider, T., Rehder, J., Omari, S., Achtelik, M. W., and Siegwart, R. (2016). The EuRoC micro aerial vehicle datasets. The International Journal of Robotics Research.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017). CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pages 1–16.
Engel, J., Koltun, V., and Cremers, D. (2018). Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40:611–625.
Epic Games (2020). Unreal Engine.
Gaidon, A., López, A. M., and Perronnin, F. (2018). The reasonable effectiveness of synthetic visual data. International Journal of Computer Vision.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR).
Godard, C., Mac Aodha, O., Firman, M., and Brostow, G. J. (2019). Digging into self-supervised monocular depth prediction.
Hirasaki, E., Moore, S., Raphan, T., and Cohen, B. (1999). Effects of walking velocity on vertical head and eye movements during locomotion. Experimental Brain Research, 127:117–130.
Jamil, Z., Gulraiz, A., Qureshi, W. S., and Lin, C. (2019). Human head motion modeling using monocular cues for interactive robotic applications. In 2019 International Conference on Robotics and Automation in Industry (ICRAI), pages 1–5.
Kristyanto, B., Nugraha, B. B., Pamosoaji, A. K., and Nugroho, K. A. (2015). Head and neck movement: Simulation and kinematics analysis. Procedia Manufacturing, 4:359–372. Industrial Engineering and Service Science 2015, IESS 2015.
Mur-Artal, R. and Tardós, J. D. (2016). ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras. CoRR, abs/1610.06475.
Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T. S., Wang, Y., and Yuille, A. (2017). UnrealCV: Virtual worlds for computer vision. ACM Multimedia Open Source Software Competition.
Shah, S., Dey, D., Lovett, C., and Kapoor, A. (2017). AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics.
Stewart, C. V. (1999). Robust parameter estimation in computer vision. SIAM Review, 41(3):513–537.
Teed, Z. and Deng, J. (2020). DeepV2D: Video to depth with differentiable structure from motion. In International Conference on Learning Representations (ICLR).
Yang, N., von Stumberg, L., Wang, R., and Cremers, D. (2020). D3VO: Deep depth, deep pose and deep uncertainty for monocular visual odometry. In IEEE CVPR.
Zhan, H., Weerasekera, C. S., Bian, J., and Reid, I. (2020). Visual odometry revisited: What should be learnt? In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 4203–4210.
Zhou, T., Brown, M., Snavely, N., and Lowe, D. G. (2017). Unsupervised learning of depth and ego-motion from video. In CVPR.