3D Pose Estimation of Bin Picking Object using Deep Learning and
3D Matching
Junesuk Lee¹, Sangseung Kang² and Soon-Yong Park¹
¹School of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea
²Intelligent Robotics Research Division, Electronics and Telecommunications Research Institute, Daejeon, South Korea
Keywords: Bin Picking, Pose Estimation, Object Detection, Deep Learning, 3D Matching.
Abstract: In this paper, we propose a method to estimate the 3D pose of an object in a randomly piled-up
environment by using image data obtained from an RGB-D camera. The proposed method consists of two
modules: object detection by deep learning, and pose estimation by the Iterative Closest Point (ICP) algorithm.
In the first module, we propose an image encoding method that generates three-channel images by integrating
the depth and infrared images captured by the camera. We use these encoded images as both the input data and
the training data set in a deep learning-based object detection step. We also propose a depth-based filtering
method that pre-processes the input data to improve the precision of object detection and to reduce the
number of false positives. ICP-based 3D pose estimation is done in the second module, where we apply a
plane-fitting method to increase the accuracy of the estimated pose.
1 INTRODUCTION
With the rapid development of modern visual
recognition technology, many advanced systems
have been introduced to automate the work of
assembly lines in large industries. Such automation
is achieved by deploying high-tech robots,
mainly for the sake of increasing productivity and
efficiency. Consequently, the topic of bin picking has
started to attract the attention of many researchers.
In the computer vision community, this topic is defined as
“the method of estimating the pose of randomly
piled-up objects, and sending pose data to robots to
act accordingly.”
A large number of bin-picking studies have been
conducted over the years. Kuo et al. proposed an
automatic system for object detection and pose
estimation using a single depth map (Kuo et al.,
2014). Object detection is based on matching
key-points extracted from the depth image (using the
RANSAC algorithm (Schnabel et al., 2007)), while
pose estimation is achieved by applying the ICP
algorithm (Besl and McKay, 1992).
Wu et al. introduced a method to estimate object
pose by using a CAD model, where they applied a
voxel grid filter (Skotheim et al., 2012) to reduce the
total computation time (Wu et al., 2015). Wada et al.
proposed a Convolutional Neural Network
(CNN)-based object recognition and splitting method for
objects that are stacked in narrow spaces (Wada et
al., 2016). Radhakrishnamurthy et al. studied
an automated stereo bin-picking system and
proposed the ATOT (Acclimatized Top Object
Threshold) algorithm to identify the top-most object
in a pile of occluded objects (Radhakrishnamurthy et
al., 2017). Instead of choosing a threshold value for
binarization (Otsu, 1979) by trial and error,
their algorithm finds the correct
threshold value automatically. He et al. proposed a
pipeline to reduce the number of false positives in
object detection (He et al., 2017). They used template
matching and clustering algorithms to detect objects,
and their own point cloud processing algorithm to
estimate the object pose.
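Two of the building blocks mentioned above, voxel grid filtering and ICP, can be illustrated with a minimal NumPy sketch. This is not the implementation used in any of the cited works; the function names (voxel_downsample, best_rigid_transform, icp), the brute-force nearest-neighbour search, and the parameters (voxel_size, iterations, tol) are assumptions made for illustration only. The sketch uses the point-to-point variant of ICP with an SVD-based rigid alignment at each iteration.

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same voxel by their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)  # accumulate the points of each voxel
    return sums / counts[:, None]


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t is close to dst[i] (SVD-based)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source, target, iterations=30, tol=1e-8):
    """Point-to-point ICP: returns R, t that align `source` (N x 3) to `target` (M x 3)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # Brute-force nearest neighbours, O(N * M); fine for small clouds only.
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = d2[np.arange(len(current)), nn].mean()
        if abs(prev_err - err) < tol:  # stop once the mean squared error stalls
            break
        prev_err = err
        R, t = best_rigid_transform(current, target[nn])
        current = current @ R.T + t
        # Compose the incremental transform with the accumulated one.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Because the nearest-neighbour step above grows with the product of the two cloud sizes, downsampling both clouds with voxel_downsample before calling icp reduces the computation time, at the cost of some geometric detail. ICP of this kind also needs a reasonable initial alignment and enough overlapping 3D points to converge to the correct pose.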
Even though these existing methods are capable
of obtaining promising results, most of them share
two drawbacks: unstable matching of corresponding
points in object detection, and insufficient
3D point data acquisition in ICP-based pose
estimation. In this paper, we address these
drawbacks and introduce an effective bin-picking
system that uses computer vision and deep
learning techniques. We divide our approach into
two modules: an object detection module and a pose
estimation module. In the first module, we propose