observations (i.e. the number of queries). As Fig. 2
shows, increasing the number of queries decreases the
average pose error from 67.18° to 44.78°. As a
reference, we computed the average error of the
VGG16 network alone, shown in yellow in Fig. 2.
These values range from 67.18° to 63.59°,
significantly higher than those of the VGG+HMM
technique. To highlight the information added by the
orientation sensor, we also ran tests in which the
transition probabilities were set constant; this variant
is named VGG+mHMM and is shown by green
dotted lines in Fig. 2. As expected, there is no
significant difference between VGG16 and
VGG+mHMM.
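The fusion described above amounts to standard Viterbi decoding: the CNN softmax outputs serve as emission probabilities, while the per-step transition matrices are either derived from the orientation sensor (VGG+HMM) or held constant (VGG+mHMM). The following is an illustrative sketch of such a decoder, not the authors' implementation; the array shapes and the small smoothing constant are our assumptions.

```python
import numpy as np

def viterbi(emissions, transitions, prior):
    """Return the most probable pose-class sequence.

    emissions:   (T, N) CNN softmax scores for N pose classes over T queries
    transitions: (T-1, N, N) transition probabilities; sensor-derived in
                 VGG+HMM, constant in VGG+mHMM
    prior:       (N,) initial pose distribution
    """
    T, N = emissions.shape
    eps = 1e-12  # avoid log(0)
    logp = np.log(prior + eps) + np.log(emissions[0] + eps)
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        # scores[i, j]: best log-probability of being in pose i at t-1
        # and moving to pose j at t
        scores = logp[:, None] + np.log(transitions[t - 1] + eps)
        back[t] = scores.argmax(axis=0)
        logp = scores.max(axis=0) + np.log(emissions[t] + eps)
    # backtrack from the best final pose
    path = [int(logp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With strong sensor-derived self-transitions, the decoder can override a single noisy CNN observation, which is the effect exploited by VGG+HMM; with uniform transitions it reduces to per-query argmax, matching the VGG+mHMM behaviour reported above.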
Interestingly, regarding the average object-level
recognition rate based on four queries, VGG+HMM
achieved 99.7% while VGG16 reached 99.1%; the
difference is small thanks to the already strong
general recognition ability of VGG16.
Figure 3: Average orientation error for each object, for two
queries: VGG16 only (yellow), VGG+HMM (blue), and
VGG+mHMM with constant transition probabilities (green
dotted).
5 CONCLUSIONS
In this paper we presented a probabilistic approach to
enhance the pose estimation capabilities of simple
classification networks such as VGG16. We used the
orientation sensor to estimate the transition
probabilities between poses, so that HMMs could be
applied to find the most probable pose sequences.
Experiments on 40 randomly chosen objects of the
COIL-100 dataset show a significant improvement
over the underlying CNN. In future work we plan to
investigate how to fuse the model with more
sophisticated CNNs such as PoseCNN or SSD-6D.
ACKNOWLEDGEMENTS
We acknowledge the financial support of the
Széchenyi 2020 program under the project EFOP-
3.6.1-16-2016-00015, and the Hungarian Research Fund,
grant OTKA K 120367. We are grateful to NVIDIA
Corporation for supporting our research with GPUs
obtained through the NVIDIA GPU Grant Program.
REFERENCES
Collet, A., Martinez, M., & Srinivasa, S. S. (2011). The
MOPED framework: Object recognition and pose
estimation for manipulation. International Journal of
Robotics Research, 30(10), 1284–1306.
Correll, N., Bekris, K. E., Berenson, D., Brock, O., Causo,
A., Hauser, K., … Wurman, P. R. (2018). Analysis and
observations from the first Amazon picking challenge.
IEEE Transactions on Automation Science and
Engineering, 15(1), 172–188.
Czúni, L., & Rashad, M. (2017). The fusion of optical and
orientation information in a Markovian framework for
3D object retrieval. International Conference on Image
Analysis and Processing, 26–36.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. Proceedings of the IEEE
Computer Society Conference on Computer Vision and
Pattern Recognition, 580–587.
He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017).
Mask R-CNN. Proceedings of the IEEE International
Conference on Computer Vision, 2980–2988.
Hinterstoisser, S., Holzer, S., Cagniart, C., Ilic, S.,
Konolige, K., Navab, N., & Lepetit, V. (2011).
Multimodal templates for real-time detection of
texture-less objects in heavily cluttered scenes.
Proceedings of the IEEE International Conference on
Computer Vision, 858–865.
Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski,
G., Konolige, K., & Navab, N. (2013). Model based
training, detection and pose estimation of texture-less
3D objects in heavily cluttered scenes. Lecture Notes in
Computer Science, 7724 LNCS(PART 1), 548–562.
Hodaň, T., Haluza, P., Obdrzalek, Š., Matas, J., Lourakis,
M., & Zabulis, X. (2017). T-LESS: An RGB-D dataset
for 6D pose estimation of texture-less objects.
Proceedings - 2017 IEEE Winter Conference on
Applications of Computer Vision, WACV 2017, 880–
888.
Kehl, W., Manhardt, F., Tombari, F., Ilic, S., & Navab, N.
(2017). SSD-6D: Making RGB-Based 3D Detection
and 6D Pose Estimation Great Again. Proceedings of
the IEEE International Conference on Computer
Vision, 1530–1538.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot
multibox detector. Lecture Notes in Computer Science
(Including Subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics), 9905
LNCS, 21–37.
Nöll, T., Pagani, A., & Stricker, D. (2011). Markerless
camera pose estimation - An overview. OpenAccess