TOWARDS REAL-TIME AND ACCURATE VOXEL COLORING

FRAMEWORK

Oussama Moslah, Arnaud Debeugny, Vincent Guitteny, Serge Couvet

THALES Security Solutions and Services, 1 Rue du General de Gaulle, 95523 Cergy-Pontoise, France

Sylvie Philipp-Foliguet

ETIS - UMR CNRS 8051, ENSEA, 6 Avenue du Ponceau, 95014 Cergy-Pontoise, France

Keywords:

Voxel coloring, Visual hull, Marching cubes, Octree, GPGPU.

Abstract:

This paper presents algorithms and techniques towards a real-time and accurate Voxel Coloring framework.

We combine Visual Hull, Voxel Coloring and Marching Cubes techniques to derive an accurate 3D model

from a set of calibrated photographs. First, we adapted the Visual Hull algorithm for the computation of the

bounding box from image silhouettes. Then, we improved the accuracy of the Voxel Coloring algorithm using

both colorimetric and geometric citerions. The calculation time is reduced using an Octree data structure.

Then, the Marching Cubes is used to obtain a polygonal mesh from the voxel reconstruction. Finally, we

propose a practical way to speed up the whole process using graphics hardware capababilities.

1 INTRODUCTION

In this paper we address the problem of real-time

3D reconstruction from photographs. Our frame-

work consists of three parts: (1) computation of the

bounding box of the object we want to reconstruct us-

ing a Visual Hull approach, (2) a voxel reconstruc-

tion based on both colorimetric and geometric crite-

rions (3) and a generation of a polygonal mesh us-

ing Marching Cubes techniques. The context of this

work is the growing interest in automatic reconstruc-

tion techniques from photographs. With the increas-

ing capababilities of modern graphics hardware 3D

reconstruction techniques can be accelerated to obtain

accurate models in real-time.

2 RELATED WORK

The original Voxel Coloring paper described in (Seitz

and Dyer, 1997) uses only colorimetric criterions to

reconstruct an object consistent with the input images.

This algorithm starts by discretizing the 3D space into

voxels and projects them on each image. The voxels

that are consistent from a colorimetric viewpoint with

the images are retained. The complexity of this algo-

rithm is O(N

∗ n) with N

is the number of voxels

and n the number of images. In order to improve the

accuracy of this method we use both colorimetric and

geometric criterions to derive 3D models from image

silhouettes. The calculation time is improved using

an Octree data structure. The Visual Hull algorihm

(Franco and Boyer, 2003) operates in a different man-

ner by projecting the image silhouettes into the 3D

space. The intersection of the silhouettes cones pro-

duces the 3D polygonal model. We adapted this algo-

rithm to compute the bounding box of the 3D object

which is needed for the voxel reconstruction. Instead

of projecting the image silhouettes into the 3D space

we project their 2D bounding boxes. The Marching

Cubes technique (Lorensen and Cline, 1987; F. Goetz,

2005) takes as input a 3D point cloud and produces a

textured polygonal mesh. Acceleration using graphics

hardware has been for a long time restricted to purely

graphical processing. With the constant evolution of

graphics hardware and the emerging GPGPU (Gen-

eral Purpose GPU) techniques and technologies such

Cg (W. R. Mark, 2003) and CUDA (Cuda, 2008) re-

searchers start to re-design their algorithms to bene-

ﬁt from the parallel capababilities of modern GPUs

(Trendall and Steward, 2000; Krueger and Wester-

mann, 2003; F. Goetz, 2005).

490

Moslah O., Debeugny A., Guitteny V., Couvet S. and Philipp-Foliguet S. (2009).

TOWARDS REAL-TIME AND ACCURATE VOXEL COLORING FRAMEWORK.

In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, pages 490-495

DOI: 10.5220/0001794704900495

 SciTePress

Figure 1: A sample data set: Our system takes as input a set of calibrated images and silhouettes and produces a textured

polygonal model (VGG, 2008).

3 OUR APPROACH

Given a set of calibrated images and silhouettes our

system produces a textured polygonal model. Figure

1 illustrates a sample input data set used for evaluating

our reconstruction pipeline. The proposed solution

described consists of three parts. First we compute the

bounding box of the object we want to reconstruct us-

ing a Visual Hull approach. Then we reconstruct the

3D object with a colorimetric and geometric consis-

tency based Voxel Coloring scheme. ﬁnally, we pro-

duce a textured polygonal model using The Marching

Cubes technique. The Voxel Coloring algorithm is ac-

celerated using graphics hardware.

3.1 Visual Hull

The Visual Hull algorihm (Franco and Boyer, 2003)

computes a 3D coarse representation of an object

from it’s 2D projections in a set of images. Figure

2 illustrates this algorithm in the simple case of 2 cal-

ibrated images. Given (1) two projection matrices P

and P

(2) two 2D regions D

and D

representing

the projection of the same 3D object we compute the

3D cones V

and V

and intersect them to compute the

coarse 3D model C.

Figure 2: Illustration of the Visual Hull principle.

We adapted this algorithm to compute the bounding

box of the 3D object which is needed for the voxel re-

construction. Instead of projecting the image silhou-

ettes into the 3D space we project their 2D bounding

boxes.

3.2 Voxel Coloring

The Voxel Coloring algorithm described in (Seitz and

Dyer, 1997) uses a colorimetric criterion to decide if

a voxel is consistent or not. Thus voxels can be col-

orized even if their projection is totally outside of the

object silhouette. In order to improve the accuracy

of the Voxel Coloring algorithm we add a geomet-

ric criterion (Kuzu and Rodehorst, 2001). Figure 3

illustrates the principle of the use of silhouettes: vox-

els are identiﬁed respectively as gray, black or white

depending if their projection into images falls in the

boundary, outside or inside of the silhouettes.

Figure 3: Illustration of the geometric consistency check.

To reduce the complexity of the algorithm we use an

Octree data structure (A. W. Fitgibbon, 1998). We

recursively subdivide the volume into 8 subvolumes.

The subdivision of a volume is made only if it’s pro-

jection into images is on the boundary of the silhou-

ette. Thus the object is reconstructed in an economic

way. Figure 4 shows an illustration of a volume sub-

division and it’s associated data structure.

TOWARDS REAL-TIME AND ACCURATE VOXEL COLORING FRAMEWORK

491

Figure 4: Recursive subdivision of a volume and the asso-

ciated octree.

We also use a 3D connexity check to improve the sur-

face of the voxel reconstruction. We proceed as fol-

lows to update the voxel classiﬁcation: (1) we check

the 6 neighbours of each gray voxel and if no black

voxel is found we identify it as white, (2) we also

check the neighbourhood of each white voxel and if

at least one black voxel is found we identify it as gray.

Figure 5 illustrates the principle of this algorithm.

Figure 5: Illustration of the voxels connexity check.

3.3 Marching Cubes

The Marching cubes algorithm (Lorensen and Cline,

1987) is used to obtain a polygonal model from a scat-

tered set of voxels. The algorithm starts by taking

eight neighbor locations to construct a cube, then de-

termine the polygons that passes through this cube.

The individual polygons are then fused into the model

surface. We use an index of precalculated array of 256

possible polygon conﬁgurations (2

= 256) within the

cube. This array of 256 cube conﬁgurations is ob-

tained by reﬂections and symmetrical rotations of the

basic cases illustrated by Figure 6.

Figure 6: 15 basic conﬁgurations of polygons.

3.4 Acceleration using Graphics

Hardware

The GPU architecture is specialized for parallel com-

puting tasks. The graphics hardware consists of a set

of processors grouped together in a common multi-

processors block. Figure 7 illustrates the hierarchy of

the parallel architecture of actual GPUs.

Figure 7: Graphics hardware parallel architecture.

For example the device used in this work is the NVidia

GeForce 8800GTS card. This card mainly consists

of 12 multiprocessors and a 512 MB device mem-

ory. Each multiprocessor is composed itself by 16

processors, a shared memory and an instruction unit.

This card can thus make 192 calculations in paral-

lel. Unlike the device memory the shared memory

consists of 16 Kb and is accessible only by proces-

sors belonging to the same multiprocessor unit. How-

ever this memory is very usefull and can be accessible

much more faster than the device memory. To exe-

cute hundreds of processes working in various pro-

grams, multi-processors use new architecture called

VISAPP 2009 - International Conference on Computer Vision Theory and Applications

492

(a) (b) (c) (d)

Figure 8: Reconstruction results for three different level of details (64 = 2

, 128 = 2

, 256 = 2

) (a,c) Calculation using the

CPU (b,d) Calculation using the GPU.

SIMT (Single Instruction Multiple Thread).

The Voxel Coloring process is accelerated using the

parallel capabilities of modern graphics hardware.

Unlike the original Voxel Coloring algorithm (Seitz

and Dyer, 1997) we use an Octree data structure

(Szelinski, 1993). First, each volume is subdivided

into 8 subvolumes. This gives rise to 27 points (we

remove redundant 3D points). The projection is made

using the well known pinhole camera model (Zisser-

mann and Hartley, 2003) that describes how a 3D

point M with coordinates (X, Y , Z) in the world co-

ordinate space projects into an image point m with

coordinates (u,v) in pixels using the classic perspec-

tive transformation:

∼



| − R



M = PM (1)

Where R and t respectively represent the camera ori-

entation and position, K the camera matrix or matrix

of intrinsic parameters and P the projection matrix. In

order to optimize the use of parallel architecture of the

graphics hardware we assign a different process for

image point coordinate computation using the simple

formula:

m[p + 27n] =

∑

k=0

P[4n + 12i + k] ∗ M[4p + k] (2)

Where m, P and M respectively represent the concate-

nated matrices of 2D image points, projection ma-

trices and 3D points. Then the result is normalized

and we obtain 27 points in homogenous coordinates

(u,v,1) for each image using the formula :

m[p + 27n] =

m[p + 27n]

m[p + 54]

(3)

Finally, a subvolume can be recursively subdivided if

the projection into images falls in the boundary of the

silhouette. Thus we get better level of details without

the need of subdividing all the voxels.

4 RESULTS

Figure 8 illustrates the ﬁnal results obtained with our

Voxel Coloring framework for different level of de-

tails. The level of detail correspond to the depth of the

Octree data structure. It is the maximum number of

recursive subdivision of a voxel. Figure 9 shows the

reconstruction results of the Dinosaur and the Soldier

for the maximum level of detail. Evaluation of the re-

sults and the computing times of our framework are

presented in Table 1.

5 DISCUSSION

Comparison to Previous Work. Most previous work

on reconstruction techniques from image silhouettes

using Voxel Coloring algorithm either use geometric

or colorimetric approach. In our work we combine

those two criterions to get an accurate reconstruc-

tion. We also use an Octree data structure to improve

the computing times and propose a way to accelerate

the algorithm using parallel processing capabilities of

modern GPUs.

Robustness and Limitations. The robustness of our

approach strongly depends on the quality of silhou-

ettes. Actually only the Voxel Coloring algorithm is

accelerated using the GPU. Thus our framework is a

mixed CPU/GPU implementation.

Future Work. We are working on the implementa-

tion of the Marching Cubes and the Visual Hull al-

gorithms using the GPU and we expect to get real-

time computing times for the whole process. We are

also working on photorealistic rendering of the recon-

structed model using view-dependent texture map-

ping techniques (P. E. Debevec, 1996).

6 CONCLUSIONS

This paper introduced methods and techniques for

real-time recovering of accurate textured 3D models

TOWARDS REAL-TIME AND ACCURATE VOXEL COLORING FRAMEWORK

493

(a) (b)

Figure 9: Results using the highest level of detail (256 = 2

). (a) the Dinosaur reconstructed (b) the Soldier reconstructed.

Table 1: Evaluation results of our Voxel Coloring framework using the Nvidia card 8800GTS 512MB on a Dualcore Intel

Pentium 4 3.2Ghz with 2GB RAM.

Dinosaur Soldier

64 128 256 64 128 256

Visual Hull (seconds) 0.442 3.703

Voxel Coloring (seconds) CPU 92.4 518 5449 56.47 200 1614

GPU 1.50 3.25 16.44 0.79 1.98 8.33

Marching Cubes (seconds) 2.37 12.49 97.16 3.13 13.56 90.34

Mesh (number of triangles) 18736 73672 308956 24020 93652 392468

from image silhouettes. The main contributions of

this work is the way we compute the bounding box

using a Visual Hull approach, the combined colori-

metric and geometric criterions used inside the Voxel

Coloring algorithm and the way the computations are

accelerated using the parallel capabilities of modern

GPUs.

7 ACKNOWLEDGEMENTS

We wish to acknowledge the Cap Digital Business

Cluster Terra Numerica project for sponsoring the re-

search reported in this paper.

REFERENCES

A. W. Fitgibbon, G. Cross, A. Z. (1998). Automatic 3d

model construction for turn-tables sequences. Lec-

tures notes in Computer Sciences.

Cuda (2008). Cuda: Compute uniﬁed device architecture,

www.nvidia.com/cuda. NVidia.

F. Goetz, T. Junklewitz, G. D. (2005). Real-time marching

cubes on the vertex shader. In Eurographics.

Franco, J. and Boyer, E. (2003). Exact polyhedral vi-

sual hull. In British Machine Vision Conference

(BMVC’03), volume I, pages 329–338.

Krueger, J. and Westermann, R. (2003). Acceleration tech-

niques for gpu-based voluume rendering. In IEEE Vi-

sualization’03.

Kuzu, Y. and Rodehorst, V. (2001). Volumetric modeling

using shape from silhouette, photogrammetry and car-

tography.

Lorensen, W. E. and Cline, H. E. (1987). Marching cubes:

A high resolution 3d surface construction algorithm.

In SIGGRAPH’97.

P. E. Debevec, C. J. Taylor, J. M. (1996). Modeling

and rendering architecture from photographs: a hy-

brid geometry- and image-based approach. In SIG-

GRAPH’96.

Seitz, S. M. and Dyer, C. R. (1997). Photorealistic scene

reconstruction by voxel coloring. In Computer Vision

and Pattern Recognition Conf., pages 1067–1073.

Szelinski, R. (1993). Rapid octree construction from images

sequences. In CVGIP, pages 23–32.

Trendall, C. and Steward, A. J. (2000). General calculation

using graphics hardware, with application to interac-

VISAPP 2009 - International Conference on Computer Vision Theory and Applications

494

tive caustics. In Eurographics Workshop on Render-

ing, pages 287–298. Springer.

VGG (2008). Visual geometry group dataset.

www.robots.ox.ac.uk/˜vgg/data/data-mview.html.

W. R. Mark, R. S. Glanville, K. A. M. K. (2003). Cg: A

system for programming graphics hardware in a c-like

language. In Proceedings of SIGGRAPH.

Zissermann, A. and Hartley, R. (2003). Multiple View Ge-

ometry in Computer Vision. Cambridge University

press, 2nd edition.

TOWARDS REAL-TIME AND ACCURATE VOXEL COLORING FRAMEWORK

495