2 RELATED WORKS
In order to upsample the depth data captured by a
ToF depth camera, several approaches have been pro-
posed which can be divided into two groups. The first
one deals with the instability of depth data provided
by the RGB-D camera by using several depth images
for reducing variations over each pixel depth value
(Camplani and Salgado, 2012) (Dolson et al., 2010).
However, these methods either cannot cope with many
moving objects in the captured scene or require the
camera to be stationary.
The second group applies upsampling methods on
only one pair of depth and color images for inter-
polating depth data while reducing structural noise.
Among these methods, Joint Bilateral Upsampling
(Kopf et al., 2007) and the interpolation method
based on the optimization of a Markov Random Field
(Diebel and Thrun, 2005) are the most popular ap-
proaches. They exploit information from RGB im-
ages to improve the resolution of depth data under the
assumption that depth discontinuities are often related
to color changes in the corresponding regions in the
color image. However, the depth data captured around
object boundaries is unreliable and heavily
contaminated with noise.
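The color-guided weighting described above can be sketched as follows. This is a minimal, unoptimized illustration of plain Joint Bilateral Upsampling in the spirit of (Kopf et al., 2007), not the depth variance based variant proposed in this paper. The function name, the parameters `radius`, `sigma_s` and `sigma_r`, and the convention that non-positive depth values are invalid are all assumptions made for the example.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, color_hi, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Upsample depth_lo (h, w) to the size of color_hi (H, W) or (H, W, C).

    color_hi is assumed to hold floats in [0, 1]; depth values <= 0 are
    treated as invalid and skipped (an assumption of this sketch).
    """
    H, W = color_hi.shape[:2]
    h, w = depth_lo.shape
    guide = color_hi.reshape(H, W, -1).astype(np.float64)
    sy, sx = H / h, W / w  # upsampling factors
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            yl, xl = y / sy, x / sx  # position in low-resolution coordinates
            cy, cx = int(round(yl)), int(round(xl))
            num, den = 0.0, 0.0
            for qy in range(max(cy - radius, 0), min(cy + radius, h - 1) + 1):
                for qx in range(max(cx - radius, 0), min(cx + radius, w - 1) + 1):
                    d = depth_lo[qy, qx]
                    if d <= 0:
                        continue
                    # Spatial weight, measured in the low-resolution grid.
                    ws = np.exp(-((qy - yl) ** 2 + (qx - xl) ** 2) / (2 * sigma_s ** 2))
                    # Range weight, measured on the high-resolution color guide.
                    gy = min(int(qy * sy), H - 1)
                    gx = min(int(qx * sx), W - 1)
                    diff = guide[y, x] - guide[gy, gx]
                    wr = np.exp(-(diff @ diff) / (2 * sigma_r ** 2))
                    num += ws * wr * d
                    den += ws * wr
            out[y, x] = num / den if den > 0 else 0.0
    return out
```

Because the range weight is taken from the color image, a low-resolution depth sample contributes little to a high-resolution pixel whose color differs strongly from it, which is how a depth edge is kept aligned with a color edge.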
(Chan et al., 2008) solved this problem by introducing
a noise-aware bilateral filter, which blends the
results of standard upsampling and joint bilateral
filtering depending on the regional structure of the
depth map. The drawback of this method is that it can
sometimes smooth away fine details of the depth map.
(Park et al., 2011) proposed a high-quality depth map
upsampling method. Because it extends nonlocal means
filtering with an additional edge weighting scheme, it
requires considerable computation time.
(Matsuo and Aoki, 2013) presented a depth im-
age interpolation method by estimating tangent planes
based on superpixel segmentation. In this method,
depth interpolation is achieved within each region by
using Joint Bilateral Upsampling. (Soh et al., 2012)
also use superpixel segmentation for detecting piece-
wise planar surfaces. In order to upsample the low-
resolution depth data, they apply plane based interpo-
lation and Markov Random Field based optimization
to locally detected planar areas. These approaches can
adapt the processing to local object shapes based on
the information from each segmented region.
Inspired by these approaches, we also use superpixel
segmentation to detect locally planar surfaces and
exploit the structure of the detected areas. Compared
with other superpixel-based methods, our method can
produce a relatively smooth depth map in real time.
3 PROPOSED METHOD
Figure 1: Left: SoftKinetic DepthSense DS311. Center:
captured color image. Right: captured depth image.
As Figure 1 shows, our system uses a SoftKinetic
DepthSense DS311, which captures 640 × 480 color
images and 160 × 120 depth maps at 25-60 fps.
Before applying our method, we project each 3D point
from the depth map onto the corresponding color image
by using the rigid transformation obtained from the
calibration between the color camera and the depth
sensor. In our experiments, we use the extrinsic
parameters provided by the DepthSense DS311. After this
process, we obtain RGB-D data in the color image
coordinate frame.
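The registration step above can be sketched as follows, assuming a pinhole model for both cameras. The function name, the intrinsic matrices `K_d` and `K_c`, and the extrinsics `R` and `t` are hypothetical placeholders for the calibration values supplied with the sensor; the convention that a depth of 0 marks an invalid pixel is also an assumption.

```python
import numpy as np

def register_depth_to_color(depth, K_d, K_c, R, t, color_shape):
    """Reproject a depth map into the color image frame.

    depth: (h, w) depth values, 0 = invalid (assumed convention).
    K_d, K_c: 3x3 pinhole intrinsics of the depth and color cameras.
    R (3x3), t (3,): rigid transform from depth to color camera coordinates.
    Returns an (H, W) depth map in the color frame, 0 where nothing projects.
    """
    H, W = color_shape
    v, u = np.nonzero(depth > 0)
    z = depth[v, u].astype(np.float64)

    # Back-project valid depth pixels to 3D points in the depth camera frame.
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    pts_d = (np.linalg.inv(K_d) @ pix) * z

    # Apply the rigid transformation, then project with the color intrinsics.
    pts_c = R @ pts_d + np.asarray(t, dtype=np.float64).reshape(3, 1)
    zc = pts_c[2]
    front = zc > 0
    proj = K_c @ pts_c[:, front]
    zc = zc[front]
    uc = np.round(proj[0] / zc).astype(int)
    vc = np.round(proj[1] / zc).astype(int)

    inside = (uc >= 0) & (uc < W) & (vc >= 0) & (vc < H)
    uc, vc, zc = uc[inside], vc[inside], zc[inside]

    # Keep the nearest point when several land on the same color pixel.
    out = np.full((H, W), np.inf)
    np.minimum.at(out, (vc, uc), zc)
    out[np.isinf(out)] = 0.0
    return out
```

Since the depth map has a lower resolution than the color image, the result is sparse in the color frame; the holes are what the subsequent upsampling has to fill.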
However, the resulting depth data is still low
resolution and contains considerable noise, as well as
occluded depth values around object boundaries caused
by the slight difference between the positions of the
depth camera and the color camera. Therefore, we first
apply depth variance based joint bilateral upsampling
to the RGB-D data and generate a highly smoothed and
interpolated depth map. Next, we calculate the normal
map by applying the method proposed by (Holzer
et al., 2012). By using this normal map, we apply
normal-adaptive superpixel segmentation to divide
the 3D depth map into clusters so that the 3D points in
each cluster make up a planar structure. To merge
clusters that lie on the same plane, graph component
labeling is used to segment the image by comparing
the normals of the clusters. The plane equation of each
cluster is computed from the normal and center point
associated with the cluster. After that, we evaluate
the reliability of each plane, discriminate between
planar and curved clusters, and apply plane fitting and
optimization to the depth map. As a result, our method
can generate smooth depth maps which still contain
complex shape information.
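As an illustration of how a plane equation follows from a cluster's normal and center point, the sketch below builds the plane n · X + d = 0 and evaluates the depth at which the viewing ray of a pixel meets that plane. The function names and the pinhole intrinsic matrix `K` are assumptions made for this example; the reliability test and the optimization used in the proposed method are not reproduced here.

```python
import numpy as np

def plane_from_normal_and_center(normal, center):
    """Return (n, d) such that n . X + d = 0 for every point X on the plane."""
    n = np.asarray(normal, dtype=np.float64)
    n = n / np.linalg.norm(n)  # unit normal
    d = -float(n @ np.asarray(center, dtype=np.float64))
    return n, d

def depth_on_plane(n, d, u, v, K):
    """Depth Z at which the viewing ray through pixel (u, v) meets the plane.

    K is an assumed 3x3 pinhole intrinsic matrix. Returns None when the
    ray is (numerically) parallel to the plane.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction with Z = 1
    denom = float(n @ ray)
    if abs(denom) < 1e-9:
        return None
    # Points on the ray are X = Z * ray, so n . (Z * ray) + d = 0.
    return -d / denom
```

For a cluster judged to be planar, replacing each pixel's noisy depth with the value returned by `depth_on_plane` is one simple way to realize plane fitting on the depth map.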
3.1 Depth Variance Based Joint Bilateral Upsampling
Joint Bilateral Upsampling (JBU) is a modification of
the bilateral filter, an edge-preserving smoothing filter
for intensity images. The smoothed depth value $D^f_p$
at the pixel p is computed from its neighboring pixels