Figure 6: Plot illustrating that the number of objects in the scene
does not affect the time elapsed in our optimization formulation
from Sec. 3.3.
single-view metrology cues do not hold, such as on
extremely graded/steep roads. Currently, BirdSLAM
accounts for the case where the ego-motion initialization
from off-the-shelf SLAM systems such as ORB-SLAM is
highly erroneous by constraining it with stationary
cues from the environment. Another potentially
interesting direction is to improve the fault tolerance
of BirdSLAM by handling the case where both the
ego-motion initialization from off-the-shelf SLAM
systems and the stationary points in the environment
are highly erroneous.
REFERENCES
Agarwal, S., Mierle, K., and Others. Ceres solver.
http://ceres-solver.org.
Ansari, J. A., Sharma, S., Majumdar, A., Murthy, J. K., and
Krishna, K. M. (2018). The earth ain’t flat: Monocular
reconstruction of vehicles on steep and graded roads
from a moving camera. In IROS.
Bailey, T. and Durrant-Whyte, H. (2006). Simultaneous lo-
calization and mapping (slam): Part ii. IEEE robotics
& automation magazine, 13(3):108–117.
Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., and
Urtasun, R. (2016). Monocular 3d object detection
for autonomous driving. In CVPR.
Costeira, J. and Kanade, T. (1995). A multi-body factoriza-
tion method for motion analysis. In ICCV.
Davison, A. J., Reid, I. D., Molton, N., and Stasse, O.
(2007). Monoslam: Real-time single camera slam.
IEEE Transactions on Pattern Analysis and Machine
Intelligence.
Dellaert, F. (2012). Factor graphs and gtsam: A hands-
on introduction. Technical report, Georgia Institute
of Technology.
Durrant-Whyte, H. and Bailey, T. (2006). Simultaneous lo-
calization and mapping: part i. IEEE robotics & au-
tomation magazine, 13(2):99–110.
Engel, J., Schöps, T., and Cremers, D. (2014). LSD-SLAM:
Large-scale direct monocular SLAM. In ECCV.
Fitzgibbon, A. W. and Zisserman, A. (2000). Multibody
structure and motion: 3-d reconstruction of indepen-
dently moving objects. In ECCV.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013).
Vision meets robotics: The kitti dataset. IJRR.
Godard, C., Mac Aodha, O., Firman, M., and Brostow, G.
(2018). Digging into self-supervised monocular depth
estimation. arXiv preprint.
Grisetti, G., Kümmerle, R., Strasdat, H., and Konolige, K.
(2011). g2o: a general framework for (hyper) graph
optimization. In ICRA.
Han, M. and Kanade, T. (2001). Multiple motion scene
reconstruction from uncalibrated views. In ICCV.
Klein, G. and Murray, D. (2009). Parallel tracking and map-
ping on a camera phone. In 2009 8th IEEE Interna-
tional Symposium on Mixed and Augmented Reality,
pages 83–86. IEEE.
Kundu, A., Krishna, K. M., and Jawahar, C. (2011). Real-
time multibody visual slam with a smoothly moving
monocular camera. In ICCV.
Li, P., Qin, T., et al. (2018). Stereo vision-based seman-
tic 3d object and ego-motion tracking for autonomous
driving. In ECCV.
Machline, M., Zelnik-Manor, L., and Irani, M. (2002).
Multi-body segmentation: Revisiting motion consis-
tency. In ECCV Workshop on Vision and Modeling of
Dynamic Scenes.
Mur-Artal, R., Montiel, J. M. M., and Tardós, J. D. (2015).
Orb-slam: a versatile and accurate monocular slam
system. IEEE transactions on robotics,
31(5):1147–1163.
Mur-Artal, R. and Tardós, J. D. (2017). Orb-slam2: An
open-source slam system for monocular, stereo, and
rgb-d cameras. IEEE Transactions on Robotics.
Murthy, J. K., Krishna, G. S., Chhaya, F., and Krishna,
K. M. (2017a). Reconstructing vehicles from a sin-
gle image: Shape priors for road scene understanding.
In ICRA.
Murthy, J. K., Sharma, S., and Krishna, K. M. (2017b).
Shape priors for real-time monocular object localiza-
tion in dynamic environments. In IROS.
Nair, G. B., Daga, S., Sajnani, R., Ramesh, A., Ansari, J. A.,
and Krishna, K. M. (2020). Multi-object monocular
slam for dynamic environments. arXiv preprint.
Namdev, R., Krishna, K. M., and Jawahar, C. V. (2013).
Multibody vslam with relative scale solution for curvi-
linear motion reconstruction. In ICRA.
Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E.
(2016). Enet: A deep neural network architecture for
real-time semantic segmentation. arXiv preprint.
Qi, C. R., Liu, W., Wu, C., Su, H., and Guibas, L. J. (2018).
Frustum pointnets for 3d object detection from rgb-
d data. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pages 918–
927.
Ranftl, R., Vineet, V., Chen, Q., and Koltun, V. (2016).
Dense monocular depth estimation in complex dy-
namic scenes. In CVPR.
Reddy, N. D., Abbasnejad, I., Reddy, S., Mondal, A. K.,
and Devalla, V. (2016). Incremental real-time multi-
body vslam with trajectory optimization using stereo
camera. In IROS.
Roddick, T., Kendall, A., and Cipolla, R. (2019). Ortho-
graphic feature transform for monocular 3d object de-
tection. British Machine Vision Conference.
VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications