Self-learning Voxel-based Multi-camera Occlusion Maps for 3D
Reconstruction
Maarten Slembrouck, Dimitri Van Cauwelaert, David Van Hamme, Dirk Van Haerenborgh,
Peter Van Hese, Peter Veelaert and Wilfried Philips
Ghent University, TELIN dept. IPI/iMinds, Ghent, Belgium
* This research was made possible through iMinds, an independent research institute founded by the Flemish government.
Keywords:
Multi-camera, Occlusion Detection, Self-learning, Visual Hull.
Abstract:
The quality of a shape-from-silhouettes 3D reconstruction technique strongly depends on the completeness
of the silhouettes from each of the cameras. Static occlusion, due to e.g. furniture, makes reconstruction
difficult, as we assume no prior knowledge concerning shape and size of occluding objects in the scene. In
this paper we present a self-learning algorithm that is able to build an occlusion map for each camera from
a voxel perspective. This information is then used to determine which cameras need to be evaluated when
reconstructing the 3D model at every voxel in the scene. We show promising results in a multi-camera setup with seven cameras, in which the object is reconstructed significantly better than with state-of-the-art methods, despite the occluding object in the center of the room.
1 INTRODUCTION
Occlusion is undesirable for computer vision applications such as 3D reconstruction based on shape-from-silhouettes (Laurentini, 1994; Corazza et al., 2006; Corazza et al., 2010; Grauman et al., 2003), because parts of the object disappear in the foreground-background segmentation. However, in real-world applications occlusion is unavoidable. To handle occlusion, we propose a self-learning algorithm that determines occlusion for every voxel in the scene. We focus on occlusion caused by static objects between the object of interest and the camera.
Algorithms to detect partial occlusion are presented in (Guan et al., 2006; Favaro et al., 2003; Apostoloff and Fitzgibbon, 2005; Brostow and Essa, 1999). However, in these papers occlusion is detected from the camera view itself by keeping an occlusion map that stores a binary decision for each pixel. An OR-operation between the foreground/background mask and the occlusion mask then yields the input masks for the visual hull algorithm. The major drawback of this approach is that occlusion is in fact voxel-related rather than pixel-related: the same pixel in an image is occluded when the occluding object is located between the object and the camera, but not when the object is placed between the camera and the occluding object. As a consequence, pixel-based occlusion detection produces a 3D model containing many voxels that do not belong to the object: the occluder is also reconstructed and, depending on the position of the person, parts of the occluder remain even after subtracting the visual hull of the occluder (we will show this in Section 5).
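To make the contrast concrete, the pixel-based strategy amounts to a single mask operation per view. The following is a minimal sketch, assuming binary NumPy masks; the function name is ours, not from the cited work:

    import numpy as np

    def pixel_based_input_mask(fg_mask: np.ndarray, occ_mask: np.ndarray) -> np.ndarray:
        """Pixel-based occlusion handling: pixels marked as occluded are
        forced to foreground with a logical OR, so the visual hull never
        carves along rays through known occluded pixels, regardless of
        whether the tracked object is in front of or behind the occluder."""
        return np.logical_or(fg_mask, occ_mask)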
Therefore, we propose an occlusion map (one for each camera) defined from a voxel perspective, so that each voxel can be evaluated separately. Once the occlusion maps are built, we use this information to evaluate only the camera views that are not occluded and therefore contribute to the 3D model.
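One way the learned maps could be consumed by a shape-from-silhouettes loop is sketched below. This is a sketch under our own assumptions: the projection callback and data layout are illustrative, not the paper's implementation.

    import numpy as np

    def carve_with_occlusion_maps(voxels, fg_masks, project, occluded):
        """Occupancy test that consults per-voxel occlusion maps.
        A voxel is kept if it projects to foreground in every camera
        for which it is NOT occluded.
          voxels:   list of 3D points
          fg_masks: binary foreground masks, one per camera
          project:  project(c, voxel) -> (row, col) in camera c (assumed)
          occluded: occluded[c][v] is True if voxel v is occluded in
                    camera c (the learned occlusion map of camera c)
        """
        occupancy = np.ones(len(voxels), dtype=bool)
        for v, voxel in enumerate(voxels):
            for c, mask in enumerate(fg_masks):
                if occluded[c][v]:
                    continue  # camera c cannot see this voxel; skip it
                r, col = project(c, voxel)
                if not mask[r, col]:
                    occupancy[v] = False  # seen but not silhouette: carve
                    break
        return occupancy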
We determine occlusion and non-occlusion with a fast algorithm based on the visual hull concept. For the system to work, the occlusion algorithm requires someone to walk through the scene so that occluded regions become apparent. Subsets of the different camera views are used to increase the votes for either occlusion or non-occlusion for each voxel, and a majority vote decides the final classification.
We also assign a quality factor to the chosen subset of cameras, because the volume of the visual hull strongly depends on the camera positions. Instead of counting integer votes, we increment by this quality factor, which depends both on the combined cameras and on the voxel position.
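The voting scheme of the previous two paragraphs can be summarised as a weighted tally. The following sketch assumes that each camera-subset test yields one binary piece of evidence per voxel; names and data layout are ours, and the quality factor is treated as an opaque weight (its definition follows later in the paper):

    import numpy as np

    def classify_occlusion(n_voxels, evidence):
        """Weighted majority vote per voxel. Each piece of evidence is
        (v, says_occluded, quality): a voxel index, the outcome of one
        camera-subset test, and the quality factor of that subset."""
        occ = np.zeros(n_voxels)   # accumulated weight for "occluded"
        free = np.zeros(n_voxels)  # accumulated weight for "not occluded"
        for v, says_occluded, quality in evidence:
            if says_occluded:
                occ[v] += quality  # increment by quality factor, not by 1
            else:
                free[v] += quality
        return occ > free          # final per-voxel classification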