a depth sensor like the Microsoft Kinect (Kerl et al., 2013; Fioraio and Stefano, 2014) are used to refine camera poses. While the scene can be reconstructed completely with dense approaches, the accuracy of the reconstructed model is insufficient for 3D body scanning and the run time is very high.
Using the KinectFusion method (Newcombe et al., 2011), it is possible to determine the camera motion from the depth images of a Microsoft Kinect sensor in real time and to simultaneously create a 3D model of the scene by integrating the depth information into a truncated signed distance function (TSDF) volume. Using the Iterative Closest Point (ICP) method, correspondences between 3D point clouds are found and used for camera motion tracking. In contrast to other 3D reconstruction methods that track the camera movement from frame to frame, KinectFusion tracks the camera motion from frame to model, which increases the reliability of tracking, since depth noise is reduced while reconstructing the model by averaging all previous depth frames. In KinectFusion, finding correspondences, estimating the camera motion, and generating the 3D model can be parallelized efficiently on GPU hardware, which makes the method capable of real-time operation.
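As an illustration of the tracking step, the following minimal Python/NumPy sketch shows a single linearized point-to-plane ICP update under the small-motion assumption. It is not the original GPU implementation: the projective data association that produces the correspondences and the coarse-to-fine pyramid are omitted, and all names are illustrative.

    import numpy as np

    def point_to_plane_icp_step(src, dst, nrm):
        """One linearized point-to-plane ICP update (small-motion assumption).

        src: (N, 3) matched points from the current depth frame,
        dst: (N, 3) corresponding points predicted from the model (raycast),
        nrm: (N, 3) model surface normals at dst.
        Returns a 4x4 incremental rigid transform.
        """
        # Linearized residual: (R s + t - d) . n with R ~ I + [w]x gives
        # w . (s x n) + t . n = (d - s) . n per correspondence.
        A = np.hstack([np.cross(src, nrm), nrm])      # (N, 6) design matrix
        b = np.einsum('ij,ij->i', dst - src, nrm)     # (N,) residuals
        x, *_ = np.linalg.lstsq(A, b, rcond=None)     # twist [w | t]
        wx, wy, wz, tx, ty, tz = x
        T = np.eye(4)
        T[:3, :3] = [[1.0, -wz,  wy],
                     [ wz, 1.0, -wx],
                     [-wy,  wx, 1.0]]                 # small-angle rotation
        T[:3, 3] = [tx, ty, tz]
        return T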
Generally, KinectFusion has two main drawbacks. The first is that the reconstruction fails if the scene has no distinctive shapes, for example, when the camera moves parallel to a plane or rotates around a cylindrical or spherical surface. In such cases, KinectFusion cannot track the camera motion correctly. This problem arises in human body scanning applications for medical purposes, where many parts of the naked human body, such as the legs and the torso, are approximately cylindrical. The second drawback is that the depth data provided by the Kinect sensor involve errors in a range of ±5 mm. In camera motion tracking, these errors cause small local drifts that accumulate over time. This in turn leads to unacceptable deformations in the resulting 3D model.
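The effect of drift accumulation can be illustrated with a toy simulation: composing many incremental rotations, each carrying a small random error, leaves a clearly measurable residual after a closed loop. The per-frame noise level below is an arbitrary assumption for illustration, not the Kinect's actual error model.

    import numpy as np

    rng = np.random.default_rng(0)

    def rot_z(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])

    step = np.deg2rad(1.0)                 # ideal motion: 1 degree per frame
    R_est = np.eye(3)
    for _ in range(360):                   # one full turn around the object
        err = rng.normal(0.0, np.deg2rad(0.05))  # assumed per-frame error
        R_est = R_est @ rot_z(step + err)
    # after a closed loop the pose should be the identity again;
    # the remaining rotation is the accumulated drift
    cos_res = np.clip((np.trace(R_est) - 1.0) / 2.0, -1.0, 1.0)
    print(f"drift after one loop: {np.rad2deg(np.arccos(cos_res)):.3f} deg")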
Recently, many improvements of the KinectFusion method have been proposed. (Jia et al., 2016) improved KinectFusion by adding graph-based optimization to rapidly correct the accumulated error. (Afzal et al., 2014) proposed a modification of KinectFusion to enhance the 3D reconstruction of non-rigidly deforming objects. (Kainz et al., 2012) improved the KinectFusion algorithm to allow for 3D reconstruction from multiple sensors simultaneously. (Whelan et al., 2012) extended the KinectFusion method by visual odometry to avoid camera motion tracking failures in regions with few geometric features. (Pagliari et al., 2014) proposed an improvement of KinectFusion that executes the scanning process twice: in the first run, an average circular trajectory of the camera is estimated; in the second run, this trajectory is used for depth data correction.
In this paper, we present a new method to optimize KinectFusion for a 3D body scanner. The idea is based on the assumption that, for most 3D scanning applications, the camera rotates around the object to be scanned or the object rotates in front of the camera. In both cases, the rotation axis and the rotation center remain unchanged during scanning. Therefore, the camera motion tracking can be simplified by estimating a single rotation angle instead of a full 6-DoF transformation. The rotation axis and center are determined accurately by averaging depth errors. Performing camera motion tracking with our method improves the quality of the reconstructed 3D model for two reasons. First, only angle errors are accumulated, instead of full transformation drifts. Second, reducing the correspondence search to a single dimension removes many outliers.
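As a minimal sketch of this reduced parameterization (hypothetical Python/NumPy with illustrative names; how the axis, the center, and the angle are actually estimated is described in Section 3), the full camera pose can be recovered from a single angle via the Rodrigues formula:

    import numpy as np

    def pose_from_angle(theta, axis, center):
        """Camera pose for a turntable-like scan: a rotation by theta about
        a fixed axis through a fixed center, replacing the 6-DoF estimate.
        axis and center are assumed to be known (calibrated beforehand)."""
        a = np.asarray(axis, float)
        a /= np.linalg.norm(a)
        c = np.asarray(center, float)
        K = np.array([[0.0, -a[2], a[1]],
                      [a[2], 0.0, -a[0]],
                      [-a[1], a[0], 0.0]])
        # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
        R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = c - R @ c    # rotate about 'center', not the origin
        return T

With axis and center fixed, tracking reduces to estimating the single scalar theta per frame, which is what confines the correspondence search to one dimension.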
The rest of the paper is organized as follows. In Section 2, the KinectFusion method is described briefly. In Section 3, our method is presented in detail. Experimental results are evaluated in Section 4. Finally, the paper is concluded in Section 5.
2 KINECT FUSION ALGORITHM
KinectFusion (Newcombe et al., 2011; Izadi et al., 2011) is a powerful 3D reconstruction technique based on the Microsoft Kinect sensor. It allows the 3D reconstruction of an indoor scene in real time by moving a Kinect camera around, using commodity graphics hardware. It integrates and merges consecutive depth images provided by the Kinect depth sensor into a 3D volumetric data structure, assuming that the relative position between the sensor and the object changes only slightly over time. As described in (Newcombe et al., 2011), KinectFusion consists of four major stages: surface measurement, surface reconstruction update, surface prediction, and sensor pose
estimation. In the surface measurement stage, the vertex and normal maps are computed from the depth image and the camera's intrinsic parameters. Before computing the vertex and normal maps, the depth images are filtered with a bilateral filter to reduce depth noise while keeping depth edges as sharp as possible.
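A minimal Python/NumPy sketch of this stage, assuming the depth image has already been bilateral-filtered (e.g., with OpenCV's cv2.bilateralFilter) and using illustrative names:

    import numpy as np

    def vertex_normal_maps(depth, fx, fy, cx, cy):
        """Back-project a (filtered) depth image into a vertex map and
        derive a normal map from neighboring vertices."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
        # vertex map: per-pixel 3D point in the camera frame
        V = np.dstack([(u - cx) * depth / fx,
                       (v - cy) * depth / fy,
                       depth])
        # normal map: cross product of finite differences between neighbors
        # (border pixels wrap around here and should be masked in practice)
        dx = np.roll(V, -1, axis=1) - V
        dy = np.roll(V, -1, axis=0) - V
        N = np.cross(dx, dy)
        N /= np.maximum(np.linalg.norm(N, axis=2, keepdims=True), 1e-12)
        return V, N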
In the surface reconstruction update, the surface measurements computed in the previous stage are transformed into a global coordinate frame and integrated into a 3D volumetric data structure, the truncated signed distance function (TSDF) volume.
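The integration can be sketched as a per-voxel weighted running average. The following simplified CPU version uses assumed names and omits details of the original GPU implementation:

    import numpy as np

    def integrate_frame(tsdf, weight, depth, T_cam, fx, fy, cx, cy,
                        voxel_size, origin, trunc=0.03):
        """Fuse one depth frame into the TSDF volume (updated in place).
        tsdf, weight: (X, Y, Z) float arrays; T_cam: 4x4 camera-to-world
        pose; origin: world position of voxel (0, 0, 0)."""
        X, Y, Z = tsdf.shape
        ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                                 indexing='ij')
        pts = np.stack([ii, jj, kk], axis=-1) * voxel_size + origin
        # transform voxel centers from world into camera coordinates
        T_inv = np.linalg.inv(T_cam)
        pc = pts @ T_inv[:3, :3].T + T_inv[:3, 3]
        z = pc[..., 2]
        u = np.round(pc[..., 0] * fx / np.maximum(z, 1e-9) + cx).astype(int)
        v = np.round(pc[..., 1] * fy / np.maximum(z, 1e-9) + cy).astype(int)
        h, w = depth.shape
        valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.where(valid,
                     depth[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)], 0.0)
        sdf = d - z                          # signed distance along the ray
        valid &= (d > 0) & (sdf > -trunc)    # drop voxels far behind surface
        f = np.clip(sdf / trunc, -1.0, 1.0)  # truncated, normalized distance
        # weighted running average over all frames integrated so far
        w_new = weight + valid
        tsdf[valid] = (tsdf[valid] * weight[valid] + f[valid]) / w_new[valid]
        weight[:] = w_new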
For sensor pose estimation, it is assumed that only a small camera motion occurs from one frame to the next.