et al., 2013; Chajdas et al., 2014; Izadi et al., 2011).
However, this technique does not consider the sample sparsity and the scale of the measurement errors. When a wall is far away from the camera, its sample distribution is very sparse: two neighbouring pixels in a range camera then represent samples which are far apart in space (see point cloud in Figure 7c). For a pinhole camera, the spacing between neighbouring samples grows linearly with depth (approximately z/f per pixel), so at large distances it easily exceeds the voxel size. This leads to holes in the volume grid and in the reconstructed surface, which are not recovered by the algorithm (Figure 7e). Thus, consistent 3D modelling of surfaces is not possible with the standard technique.
In order to prevent this, the technique of Curless and Levoy is extended from rays to cones. The width of the cone is small close to the camera and grows with increasing distance (Figure 7d). Furthermore, the
weights $w_i$ are extended to
$$w_i^{\mathrm{full}} = w_i^{\mathrm{sample}} \cdot w_i^{\mathrm{cone}}, \qquad w_i^{\mathrm{cone}} = e^{-\lambda_c \, r_c} \tag{3}$$
with $r_c$ denoting the orthogonal distance of the i-th voxel to the ray. Similar to (2), $\lambda_c$ is set such that the weight equals 0.1 at the boundary of the cone; this is, however, not a critical parameter.
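To make the weighting concrete, the following is a minimal sketch of how the cone weight of (3) could be evaluated. The helper names and the choice of the cone radius as half the sample spacing z/f are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def cone_weight(r_c, cone_radius):
    """Exponential falloff orthogonal to the ray, eq. (3).

    lambda_c is chosen so that the weight equals 0.1 at the cone
    boundary (r_c == cone_radius), as stated in the text.
    """
    lambda_c = -np.log(0.1) / cone_radius
    return np.exp(-lambda_c * r_c)

def cone_radius(depth, focal_px):
    """Cone radius growing linearly with distance (assumption:
    half the spacing depth / focal_px between neighbouring samples)."""
    return 0.5 * depth / focal_px

def full_weight(w_sample, r_c, radius):
    """Combined per-voxel weight w_i^full = w_i^sample * w_i^cone."""
    return w_sample * cone_weight(r_c, radius)
```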
This extension is the main difference between the presented 3D modelling approach and state-of-the-art methods. Figures 7e-f show the effect of cone fusion on 3D samples acquired from a road surface. While the standard fusion technique is likely to produce holes caused by sparse samples and noise, the presented approach still achieves consistent surfaces.
Note that the recursive nature of the update process yields linear time complexity and is not affected by the size of the 3D model. Moreover, the voxel updates (1) inferred by each sample can be performed in parallel, which further increases the computational efficiency of the technique.
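As a rough sketch, the recursive update of a voxel's signed distance D and accumulated weight W can be written as a running weighted average, assuming (1) follows the usual form from (Curless and Levoy, 1996):

```python
def update_voxel(D, W, d_new, w_new):
    """Recursive weighted TSDF update, assumed to match (1):
    a running weighted average of signed distances.

    Cost is constant per sample and independent of model size.
    """
    D = (W * D + w_new * d_new) / (W + w_new)
    W = W + w_new
    return D, W
```

Initialising a voxel with W = 0 makes the first observation dominate (the update then returns D = d_new exactly), and later observations are blended in proportionally to their weights.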
Section 6 discusses the application of the technique to realistic datasets from a multi-view high-resolution UAV set-up and a mobile stereo system mounted on a vehicle.
6 EXPERIMENTS
The cone-based 3D fusion technique with hashed octrees is demonstrated on two different applications. In the first application a UAV with a high-resolution camera flew around a chapel. The images have been processed by a multi-view software similar to (Wu, 2011). In principle, each image is compared with all other images and similar point features (SIFT (Lowe, 2004)) are matched. After estimating the trajectory with a bundle block adjustment technique (Moulon et al., 2013), the images have been processed by the multi-view stereo matching algorithm from (Hirschmuller and Scharstein, 2009). Finally, the obtained depth images for each camera frame are integrated into a global 3D model via the proposed 3D fusion technique. Figure 8 shows one of the acquired camera images (a) and the resulting 3D model (b). The full model consists of 167 million voxels, acquired from 450 image frames. The resolution of the scene was set to 0.1 m. Note that the holes are caused by occlusions and by areas which have not been observed by the UAV camera during the flight; these often relate to the ground under the trees or the ground in the backyard of the chapel, which is occluded by the walls.
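The per-frame integration step can be summarized by the following sketch. The octree interface (`voxels_near_ray`, `update`) is hypothetical, and `full_weight`/`cone_radius` refer to the earlier weighting sketch; the unit sample weight is likewise an assumption.

```python
import numpy as np

def backproject(u, v, z, K):
    """Pinhole back-projection of pixel (u, v) at depth z to
    homogeneous camera coordinates; K is the 3x3 intrinsic matrix."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])

def integrate_depth_image(octree, depth, pose, K, trunc):
    """Illustrative per-frame fusion loop (interface names hypothetical).

    `octree.voxels_near_ray(point, trunc)` is assumed to yield
    (voxel, signed_distance, r_c) tuples within the truncation band,
    and `octree.update(voxel, d, w)` to apply the recursive update (1).
    """
    h, w = depth.shape
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:                                 # skip invalid pixels
                continue
            p_world = pose @ backproject(u, v, z, K)   # 4x4 pose, world frame
            for voxel, d, r_c in octree.voxels_near_ray(p_world, trunc):
                w_i = full_weight(1.0, r_c, cone_radius(z, K[0, 0]))
                octree.update(voxel, d, w_i)
```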
The second application uses a stereo camera system in combination with an inertial measurement unit (IMU). This makes it possible to obtain the six-degrees-of-freedom (6DoF) pose of the camera in real time. This set-up is of particular interest for a wide range of indoor applications such as inspection, autonomous transport or logistics. More details about the hardware and software of the real-time localization system can be found in (Baumbach and Zuev, 2014). Again, the stereo images are processed into dense disparity images (Hirschmuller and Scharstein, 2009). The trajectory provided by the IMU+stereo system and the disparity images are used directly for 3D fusion. Figure 9 shows a point cloud (a) and the resulting 3D model (b) obtained when the cone-based 3D fusion with the hashed octree is applied.
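For reference, dense disparity images are related to depth by the standard stereo relation z = f · b / d (focal length f in pixels, baseline b in metres); a minimal conversion sketch:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Standard pinhole-stereo relation z = f * b / d.

    Invalid (zero or negative) disparities are mapped to 0, which
    the fusion step can treat as missing measurements.
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```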
The presented results clearly show that the developed technique is capable of handling large data sets and of processing them into simplified 3D models in time linear in the number of 3D samples. It has been observed that the multi-view 3D reconstruction point clouds suffer from less noise and fewer errors than the real-time stereo depth images. The reason is that when multiple images of a single object are available, each pixel in each depth image contains multiple depth hypotheses. This enables the optimization of depth consistency and dramatically increases the overall 3D reconstruction quality. For the stereo data, the standard 3D fusion technique (Curless and Levoy, 1996) led to a high number of holes and artefacts in the final model; only the cone fusion approach achieved smooth and consistent surfaces.
When the standard 3D fusion technique from (Curless and Levoy, 1996) is applied, the algorithm achieves a runtime of 500 ms for a single VGA (640 × 480) depth image on a standard desktop PC with 16 cores. After extending the algorithm