Installation Procedure: With knowledge of the different parts of the system, a program could guide the user through an installation. The user should be able to select the various aspects of the pose estimation in an intuitive way, for example by marking on the screen where the object is to be detected and confirming whether a detection is correct. The system should also be flexible enough to adapt easily to small changes in object variants instead of restarting the process from scratch each time. Ideally, a database would store previously found solutions and intelligently match a new task against them, giving a good starting point for a given problem.
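One way to realize such a solution database is to describe each task with a few coarse features and retrieve the stored setup with the most similar description. The sketch below is illustrative only: the descriptor fields, the distance weights, and the algorithm names are hypothetical assumptions, not part of the proposed system.

```python
from dataclasses import dataclass, field

@dataclass
class TaskDescriptor:
    # Hypothetical features characterizing a pose estimation task.
    object_diameter_mm: float
    textured: bool
    clutter_level: float  # 0 = isolated part, 1 = full bin

@dataclass
class SolutionDatabase:
    """Stores previously tuned setups and retrieves the closest match."""
    entries: list = field(default_factory=list)  # (descriptor, solution) pairs

    def store(self, desc, solution):
        self.entries.append((desc, solution))

    def closest(self, desc):
        # Simple weighted distance between task descriptors; a real
        # system would use a richer, learned similarity measure.
        def dist(a, b):
            return (abs(a.object_diameter_mm - b.object_diameter_mm) / 100.0
                    + (0.0 if a.textured == b.textured else 1.0)
                    + abs(a.clutter_level - b.clutter_level))
        return min(self.entries, key=lambda e: dist(e[0], desc))[1]

db = SolutionDatabase()
db.store(TaskDescriptor(40.0, True, 0.8), {"algorithm": "feature matching"})
db.store(TaskDescriptor(120.0, False, 0.2), {"algorithm": "PPF matching"})
# A new, similar task starts from the closest previously found solution.
match = db.closest(TaskDescriptor(110.0, False, 0.3))
```

The retrieved solution serves only as an initialization; the guided installation would still refine it for the new task.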
5 CONCLUSION
We have presented an analysis of the current state of pose estimation applications in industry and academia. Our results show that the average setup time for pose estimation systems is 1–2 weeks in industry and 2–3 weeks in academia. Despite many improvements, numerous obstacles remain before pose estimation can be installed within a reasonable setup time. This currently limits the application of vision systems to large production volumes.
The results indicate that in addition to the algorithmic problems of pose estimation, the actual implementation of the system remains a challenge. In Section 4, we proposed a scale for the usability of vision algorithms. Its levels indicate how much time and expertise it takes to set up a pose estimation system. At the top level, the system can be set up quickly by a non-expert, which is currently not possible.
To achieve this level of usability, the parameters of vision algorithms need to be condensed, both to reduce their number and to make them more intuitive. There is also a need for a more efficient framework for setting up pose estimation systems, one that guides the user through the complete setup from image acquisition to the final pose. Here, simulating sensors and systems before investing in costly hardware could play an important role.
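Such a framework can be thought of as a sequence of setup steps, each exposing only a few intuitive choices and validating its outcome before the user proceeds. The following is a minimal sketch of this idea; the step names, prompts, and validation keys are hypothetical, not a specification of the proposed framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SetupStep:
    name: str
    prompt: str                       # what the user is asked to do
    validate: Callable[[dict], bool]  # checks the step's outcome

def run_setup(steps, answers):
    """Walk through the steps in order; stop at the first failing one."""
    completed = []
    for step in steps:
        if not step.validate(answers):
            return completed, step.name  # resume point for the user
        completed.append(step.name)
    return completed, None

# Hypothetical steps from image acquisition to pose verification.
steps = [
    SetupStep("acquisition", "Place the sensor and capture a test image",
              lambda a: a.get("image_ok", False)),
    SetupStep("region", "Mark on screen where the object should appear",
              lambda a: "roi" in a),
    SetupStep("verification", "Confirm that the detected pose looks correct",
              lambda a: a.get("pose_confirmed", False)),
]

done, pending = run_setup(steps, {"image_ok": True, "roi": (10, 10, 200, 200)})
```

Because each step records what has been completed, the same structure could let the user resume an interrupted installation or revisit a single step after a small change to the setup.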
This paper shows that when evaluating the quality of pose estimation systems, setup time can be an important criterion besides the percentage of recognized objects and the pose accuracy.
ACKNOWLEDGEMENTS
This work has been supported by the H2020 project
ReconCell (H2020-FoF-680431).
VISAPP 2018 - International Conference on Computer Vision Theory and Applications