ACKNOWLEDGMENT
This work was funded by grant BE 2556/16-1 (Research
Unit FOR 2535 Anticipating Human Behavior)
of the German Research Foundation (DFG).
REFERENCES
Billings, G. and Johnson-Roberson, M. (2018). SilhoNet:
An RGB method for 3D object pose estimation and
grasp planning. arXiv preprint arXiv:1809.06893.
Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton,
J., and Rother, C. (2014). Learning 6D object pose
estimation using 3D object coordinates. In European
Conference on Computer Vision (ECCV), pages 536–
551. Springer.
Brachmann, E., Michel, F., Krull, A., Ying Yang, M.,
Gumhold, S., et al. (2016). Uncertainty-driven 6D
pose estimation of objects and scenes from a single
RGB image. In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 3364–3372.
Do, T., Cai, M., Pham, T., and Reid, I. D. (2018). Deep-
6DPose: Recovering 6D object pose from a single
RGB image. In European Conference on Computer
Vision (ECCV).
Girshick, R. (2015). Fast R-CNN. In IEEE International
Conference on Computer Vision (ICCV), pages 1440–
1448.
Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab,
N., Fua, P., and Lepetit, V. (2012a). Gradient re-
sponse maps for real-time detection of textureless ob-
jects. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, 34(5):876–888.
Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski,
G., Konolige, K., and Navab, N. (2012b). Model
based training, detection and pose estimation of
texture-less 3D objects in heavily cluttered scenes. In
Asian Conference on Computer Vision (ACCV), pages
548–562. Springer.
Iocchi, L., Holz, D., Ruiz-del-Solar, J., Sugiura, K., and
Van Der Zant, T. (2015). RoboCup@Home: Analysis
and results of evolving competitions for domestic and
service robots. Artificial Intelligence, 229:258–281.
Jafari, O. H., Mustikovela, S. K., Pertsch, K., Brachmann,
E., and Rother, C. (2017). iPose: instance-aware 6D
pose estimation of partly occluded objects. arXiv
preprint arXiv:1712.01924.
Kandel, E. R., Schwartz, J. H., Jessell, T. M., Siegelbaum,
S. A., and Hudspeth, A. J. (2000). Principles of Neural
Science, volume 4. McGraw-Hill, New York.
Krull, A., Brachmann, E., Michel, F., Ying Yang, M.,
Gumhold, S., and Rother, C. (2015). Learning
analysis-by-synthesis for 6D pose estimation in RGB-
D images. In International Conference on Computer
Vision (ICCV), pages 954–962.
Lam, S. K., Pitrou, A., and Seibert, S. (2015). Numba: A
LLVM-based Python JIT compiler. In Second Work-
shop on the LLVM Compiler Infrastructure in HPC.
ACM.
Li, Y., Wang, G., Ji, X., Xiang, Y., and Fox, D. (2018).
DeepIM: Deep iterative matching for 6D pose esti-
mation. In European Conference on Computer Vision
(ECCV).
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.
(2017). Focal loss for dense object detection. In IEEE
International Conference on Computer Vision (ICCV),
pages 2980–2988.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision (IJCV), 60(2):91–110.
Markley, F. L., Cheng, Y., Crassidis, J. L., and Oshman, Y.
(2007). Averaging quaternions. Journal of Guidance,
Control, and Dynamics, 30(4):1193–1197.
Oberweger, M., Rad, M., and Lepetit, V. (2018). Mak-
ing deep heatmaps robust to partial occlusions for 3D
object pose estimation. In European Conference on
Computer Vision (ECCV), pages 125–141.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E.,
DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and
Lerer, A. (2017). Automatic differentiation in PyTorch.
In NIPS-W.
Peng, S., Liu, Y., Huang, Q., Zhou, X., and Bao, H. (2019).
PVNet: Pixel-wise voting network for 6DoF pose es-
timation. In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 4561–4570.
Rad, M. and Lepetit, V. (2017). BB8: A scalable, accu-
rate, robust to partial occlusion method for predict-
ing the 3D poses of challenging objects without using
depth. In International Conference on Computer Vi-
sion (ICCV).
Rad, M., Oberweger, M., and Lepetit, V. (2018). Feature
mapping for learning fast and accurate 3D pose infer-
ence from synthetic images. In Conference on Com-
puter Vision and Pattern Recognition (CVPR).
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time ob-
ject detection. In Conference on Computer Vision and
Pattern Recognition (CVPR), pages 779–788.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Tekin, B., Sinha, S. N., and Fua, P. (2018). Real-time seam-
less single shot 6D object pose prediction. In IEEE
Conference on Computer Vision and Pattern Recogni-
tion (CVPR).
Tremblay, J., To, T., Sundaralingam, B., Xiang, Y., Fox, D.,
and Birchfield, S. (2018). Deep object pose estimation
for semantic robotic grasping of household objects. In
Conference on Robot Learning (CoRL).
Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., and
Schmalstieg, D. (2008). Pose tracking from natural
features on mobile phones. In International Sympo-
sium on Mixed and Augmented Reality (ISMAR),
pages 125–134.
ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation