performance. Note that, although the overall performance
of the evolutionary reinforcement algorithm is superior
to that of the random agent, it shows fewer successful
first picks. This indicates that further improvement
can be achieved by allowing for more agents and
iterations (which requires compensating for the
increased run-time by parallelization). The simple agent
shows a better successful-pick performance than the
random agent over all noise levels. The best performing
method in this metric is our policy gradient agent.
We additionally evaluated all three metrics on our
industrial dataset, which showed comparable results in
all categories.
7 CONCLUSIONS
Industrial production processes have a continuously
increasing need for flexible and dynamic robotic
solutions. Today, this often requires long and tedious
configuration by well-trained engineers. Such processes
are both costly and time-consuming. We target this
problem by learning optimized solutions that are
applicable to a wide range of industrial tasks and
environments. As an applicable precedent for such
industrial robotic engineering tasks, we utilize the
field of robotic picking.
We demonstrated our systematic approach of formulating
procedural knowledge in building blocks and creating
standardized interfaces. We evaluated specific
configuration algorithms which were tasked to choose
such building blocks in an optimal order. This was
enabled by another standardized interface for learning
algorithms, namely the OpenAI Gym interface.
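To illustrate how choosing building blocks in sequence can be exposed through a Gym-style interface, the following is a minimal sketch and not our actual implementation. It follows the classic reset/step convention of the OpenAI Gym interface without importing the library. The block names, their effect on a scalar "pose error", and the reward definition are hypothetical placeholders chosen only for illustration.

```python
import random

# Hypothetical building blocks (assumption): each one maps a scalar
# "pose error" to a new, possibly smaller, error.
BLOCKS = {
    "denoise": lambda err: err * 0.5,
    "segment": lambda err: err * 0.8,
    "refine_icp": lambda err: max(err - 0.2, 0.0),
}


class PipelineConfigEnv:
    """Gym-style environment: every action appends one building block."""

    def __init__(self, max_steps=3, initial_error=1.0):
        self.block_names = sorted(BLOCKS)  # action index -> block name
        self.max_steps = max_steps
        self.initial_error = initial_error
        self.error = initial_error
        self.chosen = []

    def _observation(self):
        # Observation: number of blocks chosen so far and the current error.
        return (len(self.chosen), self.error)

    def reset(self):
        self.error = self.initial_error
        self.chosen = []
        return self._observation()

    def step(self, action):
        name = self.block_names[action]
        previous = self.error
        self.error = BLOCKS[name](self.error)
        self.chosen.append(name)
        reward = previous - self.error  # reward = error reduction of this step
        done = len(self.chosen) >= self.max_steps
        return self._observation(), reward, done, {"pipeline": list(self.chosen)}


def run_random_episode(env, seed=0):
    """Roll out one episode with uniformly random block choices."""
    rng = random.Random(seed)
    env.reset()
    total, done, info = 0.0, False, {}
    while not done:
        action = rng.randrange(len(env.block_names))
        _, reward, done, info = env.step(action)
        total += reward
    return total, info["pipeline"]
```

Because the environment only exposes `reset` and `step`, any configuration algorithm written against this convention (random, simple, evolutionary, or policy gradient) could in principle be exchanged without touching the building blocks.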
We showed that an improvement in performance over a
random or simple approach (as could be performed by an
engineered pipeline) can be achieved for the task of 6D
pose estimation for industrial robotic picking in
various scenarios (datasets). Of all evaluated
configuration algorithms, the policy gradient approach
achieved the best performance. We successfully
demonstrated the general feasibility of our approach on
the public BOP benchmark.
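As a rough illustration of the policy gradient idea, the sketch below applies REINFORCE with a running-mean baseline to a softmax policy over a small, fixed set of candidate configurations. This is a simplified stand-in under stated assumptions and not the agent evaluated in this work: the function names are hypothetical, each configuration is treated as a single action, and the reward values passed in are made-up success scores rather than measured results.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy_gradient(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-mean baseline.

    rewards[i] is the (hypothetical, deterministic) score obtained when
    candidate configuration i is selected.
    Returns the final selection probabilities.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)  # policy parameters, one per configuration
    baseline = 0.0                # running mean of observed rewards
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        for a in range(len(prefs)):
            # Gradient of log softmax: indicator(a == action) - probs[a]
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad_log
        baseline += (reward - baseline) / t
    return softmax(prefs)


# Usage with made-up scores for three candidate configurations.
final_probs = train_policy_gradient([0.2, 0.9, 0.5])
best = max(range(len(final_probs)), key=final_probs.__getitem__)
```

The baseline does not change the expected gradient but reduces its variance, which is why it is commonly included even in small examples like this one.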
We demonstrated the setup and use of configuration
algorithms and formally modeled building blocks, both
utilizing standardized interfaces (the OpenAI Gym
interface and a framework for hierarchical modeling,
respectively). This standardization enables the dynamic
connection of a wide range of formally modeled building
blocks and configuration algorithms. The more elements
are available through these frameworks, the more
powerful our solution becomes. This will allow for a
dynamic adaptation to a vast range of environments and
objects with complex shapes, surface reflection
behaviors and textures.
Hence, one aspect of future work will lie in
dimensional scaling, such that our system holds numerous
elements (building blocks and configuration
algorithms). Other aspects of future work will cover
reducing the computation time of different algorithmic
components or increasing the available computational
power (e.g. by parallelization, using services such as
server clusters), which would enable the evaluation of a
wider range of learning algorithms, and transferring our
algorithmic ideas to different areas of industrial
robotic applications.
ACKNOWLEDGEMENTS
This work was carried out within the Siemens AI Lab
Residency Program.
REFERENCES
Arun, K. S., Huang, T. S., and Blostein, S. D. (1987).
Least-squares fitting of two 3-D point sets. IEEE
Transactions on Pattern Analysis and Machine
Intelligence, PAMI-9(5):698–700.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J.,
Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI
Gym.
Chen, G., Han, K., Shi, B., Matsushita, Y., and Wong,
K.-Y. K. (2020). Deep photometric stereo for
non-Lambertian surfaces.
Choi, C. and Christensen, H. I. (2016). RGB-D object pose
estimation in unstructured environments. Robotics
Auton. Syst., 75:595–613.
Dietrich, V., Kast, B., Fiegert, M., Albrecht, S., and Beetz,
M. (2019). Automatic configuration of the structure
and parameterization of perception pipelines. In 2019
19th International Conference on Advanced Robotics
(ICAR), pages 312–319.
El-Shamouty, M., Kleeberger, K., Laemmle, A., and
Huber, M. (2019). Simulation-driven machine learning for
robotics and automation. tm - Technisches Messen,
86(11):673–684.
Jin, H., Soatto, S., and Yezzi, A. J. (2003). Multi-view
stereo beyond Lambert. In 2003 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition,
2003. Proceedings., volume 1, pages I–I.
Hodaň, T., Sundermeyer, M., Drost, B., Labbé, Y.,
Brachmann, E., Michel, F., Rother, C., and Matas, J.
(2020). BOP challenge 2020 on 6D object localization.
European Conference on Computer Vision Workshops
(ECCVW).
ICINCO 2021 - 18th International Conference on Informatics in Control, Automation and Robotics