some gaps and absences in the depth map, which prevented object boundaries from being determined properly. To solve this problem, they developed a three-way (trilateral) filter that combines distance, RGB values, and boundary information. They apply warping, error cleaning, and affine mapping to eliminate the holes caused by depth-sensor data acquisition. After this step, they use a probabilistic boundary detector (Pb). Pb (probability of boundary) is a method that originally uses only color information and extracts a boundary-priority map of the object from color differences; by adding depth as an additional parameter, it computes the probability that a pixel lies on a boundary, together with the boundary orientation. In the last step, they segment the object using graph-cut and separate it from the scene. In their evaluation, they measured the contribution of the depth information added to the baseline methods and showed that it improves segmentation performance.
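The trilateral weighting described above can be illustrated as a bilateral filter with an extra boundary term. The sketch below is a simplification assuming single-channel color and a precomputed boundary-strength map; the function names, parameters, and sigma values are illustrative, not the authors' implementation.

```python
import numpy as np

def trilateral_weights(depth, color, boundary, cx, cy, radius=2,
                       sigma_s=1.0, sigma_c=0.1, sigma_b=0.5):
    """Illustrative trilateral kernel around pixel (cy, cx):
    spatial closeness * color similarity * boundary penalty."""
    h, w = depth.shape
    ys = np.arange(max(cy - radius, 0), min(cy + radius + 1, h))
    xs = np.arange(max(cx - radius, 0), min(cx + radius + 1, w))
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    w_s = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma_s ** 2))
    w_c = np.exp(-((color[yy, xx] - color[cy, cx]) ** 2) / (2 * sigma_c ** 2))
    # damp the weight of neighbours that lie across a strong boundary
    w_b = np.exp(-boundary[yy, xx] ** 2 / (2 * sigma_b ** 2))
    return w_s * w_c * w_b

def fill_hole(depth, color, boundary, cy, cx):
    """Estimate a missing depth value (depth == 0 marks a hole) as the
    trilateral-weighted average of valid neighbours; assumes an interior
    pixel so the 5x5 patch and kernel align."""
    w = trilateral_weights(depth, color, boundary, cx, cy)
    patch = depth[cy - 2:cy + 3, cx - 2:cx + 3]
    valid = patch > 0
    return float((w[valid] * patch[valid]).sum() / w[valid].sum())
```

In a uniform region the weighted average simply reproduces the surrounding depth, while near a boundary the `w_b` term keeps values from leaking across object edges.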
Mishra et al. (Mishra and Aloimonos, 2012; Mishra et al., 2012) developed a segmentation strategy that separates simple objects from the scene using color, texture, and depth cues. They define a simple object as a compact region surrounded by a depth and contact boundary. They build on the fixation-based segmentation method (Mishra et al., 2009), which finds the most suitable closed contour around a given point in the scene. In their recent work, they proposed a fixation strategy that selects points from within an object, together with a method that restricts the selection of closed curves to objects only. The visual cues (color, depth) around each edge pixel indicate whether that pixel lies on an object boundary; this information is kept in a probabilistic boundary edge map, computed with an edge detector (Martin et al., 2004) that uses local brightness, texture, and color cues.
Examined step by step, the segmentation process first obtains the probabilistic boundary edge map. By selecting the most probable edge pixels in this map, the object side of each boundary is determined, and fixation points are then selected on that side. Closed curves are obtained from the selected points by the fixation-based segmentation method, yielding the resulting object segments. They evaluated their experimental results quantitatively and qualitatively on a comprehensive dataset (Lai et al., 2011). In the quantitative analysis, they measured the accuracy with which each segmented object is recovered as a single closed region and achieved a success rate of over 90%.
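As a rough illustration of the fixation idea (a point selected from within an object), one naive stand-in is to pick the pixel farthest from any strong boundary in the probabilistic boundary edge map. The brute-force sketch below assumes a single object per map and is not the authors' actual strategy.

```python
import numpy as np

def select_fixation_point(pb_map, thresh=0.5):
    """Naive stand-in for a fixation strategy: return the pixel farthest
    from any strong boundary pixel, so it is likely to lie well inside
    an object. pb_map holds per-pixel boundary probabilities."""
    by, bx = np.nonzero(pb_map >= thresh)          # strong boundary pixels
    if len(by) == 0:
        raise ValueError("no boundary pixels above threshold")
    h, w = pb_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # brute-force distance from every pixel to its nearest boundary pixel
    d = np.sqrt((yy[..., None] - by) ** 2 + (xx[..., None] - bx) ** 2).min(axis=-1)
    return np.unravel_index(np.argmax(d), d.shape)  # (row, col)
```

A real implementation would use a distance transform and handle multiple objects, but the principle of favouring boundary-distant interior points is the same.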
Richtsfeld and colleagues have performed numerous studies (Richtsfeld et al., 2012a; Richtsfeld et al., 2014) on object segmentation in RGB-D data. In their study applying Gestalt principles to object segmentation (Richtsfeld et al., 2012b), they defined relationships between surface patches of a 3D image based on Gestalt principles in order to build a learning-based structure. The scene structure is rapidly abstracted by fitting planes to the 3D point cloud with the fast and widely used RANSAC method. For curved objects, however, the surfaces must be smoothed and bent; for this they use NURBS (non-uniform rational B-splines), a mathematical construct widely used in computer graphics that can represent all kinds of conic sections (spheres, cylinders, ellipsoids, etc.). A surface fitted to the point cloud is matched to it by minimizing the nearest-point distance. Finally, model selection is applied to the resulting geometric structure to determine the surface patches to be used in object segmentation.
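The RANSAC plane fitting mentioned above can be sketched minimally as follows; the hypothesis-and-verify loop and the `tol`/`iters` parameters are generic RANSAC, not the authors' exact configuration.

```python
import numpy as np

def ransac_plane(points, iters=200, tol=0.02, rng=None):
    """Minimal RANSAC plane fit on an (N, 3) point cloud: repeatedly fit
    a plane through 3 random points and keep the one with most inliers."""
    rng = np.random.default_rng(rng)
    best_inliers, best_model = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)          # plane normal
        if np.linalg.norm(n) < 1e-12:           # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        d = -n @ p0                             # plane: n . x + d = 0
        dist = np.abs(points @ n + d)           # point-to-plane distances
        inliers = dist < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model, best_inliers
```

The surviving inlier sets correspond to the planar patches that are later refined (e.g. with NURBS for curved regions).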
In their article describing perceptual grouping for object segmentation on RGB-D data (Richtsfeld et al., 2014), they presented a comprehensive study combining their previous work. Once the surfaces have been determined, the relationships between adjacent surfaces, based on the Gestalt principles mentioned above, are computed in order to group them. These features are surface color; similarity of relative size and amount of texture; color similarity at 3D surface boundaries; mean curvature and curvature variance at 3D surface boundaries; and mean depth and depth variance at 2D surface boundaries. Two feature vectors are defined from these relations, one for neighbouring and one for non-neighbouring surface pairs, and are used to train an SVM on hand-labeled RGB-D image sets. Surface pairs belonging to the same object are selected as positive samples, while pairs from two different objects or object-background pairs are negative samples. For decision making, the SVM produces a probability value for each vector in addition to a binary result. They define a graph in which surface patches are the nodes and the probabilities obtained from the SVM are the edge weights; object segmentation is then performed using graph-cut. In a detailed analysis of segmented objects in complex scenes across different categories, they achieved a success rate of over 90%, and report that Mishra et al. remain at around 65% on the same datasets.
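Their graph of surface patches with SVM edge probabilities is partitioned with graph-cut. As a simplified stand-in, the sketch below merely merges surfaces whose pairwise same-object probability exceeds a threshold using union-find (connected components); this conveys the grouping idea without implementing an actual minimum cut.

```python
def group_surfaces(num_surfaces, pair_probs, thresh=0.5):
    """Simplified stand-in for the graph-cut step: surfaces are nodes,
    pairwise same-object probabilities (e.g. SVM outputs) are edge
    weights, and connected components over edges with probability
    >= thresh become objects. pair_probs maps (i, j) to a probability."""
    parent = list(range(num_surfaces))

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), p in pair_probs.items():
        if p >= thresh:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for s in range(num_surfaces):
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())
```

A real graph-cut additionally balances cut cost against the probabilities, but thresholded merging already shows how per-pair SVM scores turn surface patches into object hypotheses.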
VISAPP 2018 - International Conference on Computer Vision Theory and Applications