Godard, C., Mac Aodha, O., and Brostow, G. J. (2017). Unsupervised monocular depth estimation with left-right consistency. In CVPR, pages 270–279.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In ICCV, pages 2961–2969.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In CVPR, pages 770–778.
Hejrati, M. and Ramanan, D. (2012). Analyzing 3D objects in cluttered images. In NIPS, pages 593–601.
Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S. L. (2018). Joint 3D proposal generation and object detection from view aggregation. In IROS, pages 1–8. IEEE.
Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019). PointPillars: Fast encoders for object detection from point clouds. In CVPR, pages 12697–12705.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017a). Feature pyramid networks for object detection. In CVPR, pages 2117–2125.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017b). Focal loss for dense object detection. In ICCV, pages 2980–2988.
Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018). Path aggregation network for instance segmentation. In CVPR, pages 8759–8768.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot MultiBox detector. In ECCV, pages 21–37. Springer.
Martinez, J., Hossain, R., Romero, J., and Little, J. J. (2017). A simple yet effective baseline for 3D human pose estimation. In ICCV, pages 2640–2649.
Mehta, D., Rhodin, H., Casas, D., Fua, P., Sotnychenko, O., Xu, W., and Theobalt, C. (2017a). Monocular 3D human pose estimation in the wild using improved CNN supervision. In 3DV, pages 506–516. IEEE.
Mehta, D., Sridhar, S., Sotnychenko, O., Rhodin, H., Shafiei, M., Seidel, H.-P., Xu, W., Casas, D., and Theobalt, C. (2017b). VNect: Real-time 3D human pose estimation with a single RGB camera. ACM ToG, 36(4):44.
Mousavian, A., Anguelov, D., Flynn, J., and Kosecka, J. (2017). 3D bounding box estimation using deep learning and geometry. In CVPR, pages 7074–7082.
Qin, Z., Wang, J., and Lu, Y. (2019). MonoGRNet: A geometric reasoning network for monocular 3D object localization. In AAAI, volume 33, pages 8851–8858.
Ramakrishna, V., Kanade, T., and Sheikh, Y. (2012). Reconstructing 3D human pose from 2D image landmarks. In ECCV, pages 573–586. Springer.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In CVPR, pages 779–788.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, pages 91–99.
Roddick, T., Kendall, A., and Cipolla, R. (2018). Orthographic feature transform for monocular 3D object detection. arXiv preprint arXiv:1811.08188.
Simonelli, A., Bulò, S. R. R., Porzi, L., López-Antequera, M., and Kontschieder, P. (2019). Disentangling monocular 3D object detection. arXiv preprint arXiv:1905.12365.
Tompson, J. J., Jain, A., LeCun, Y., and Bregler, C. (2014). Joint training of a convolutional network and a graphical model for human pose estimation. In NIPS, pages 1799–1807.
Wu, J., Xue, T., Lim, J. J., Tian, Y., Tenenbaum, J. B., Torralba, A., and Freeman, W. T. (2016). Single image 3D interpreter network. In ECCV, pages 365–382. Springer.
Xiang, Y., Choi, W., Lin, Y., and Savarese, S. (2015). Data-driven 3D voxel patterns for object category recognition. In CVPR.
Xu, B. and Chen, Z. (2018). Multi-level fusion based 3D object detection from monocular images. In CVPR, pages 2345–2353.
Yan, Y., Mao, Y., and Li, B. (2018). SECOND: Sparsely embedded convolutional detection. Sensors, 18(10):3337.
Yang, B., Luo, W., and Urtasun, R. (2018). PIXOR: Real-time 3D object detection from point clouds. In CVPR, pages 7652–7660.
Zhou, Y. and Tuzel, O. (2018). VoxelNet: End-to-end learning for point cloud based 3D object detection. In CVPR, pages 4490–4499.
Monocular 3D Object Detection via Geometric Reasoning on Keypoints