Efﬁcient Ray Traversal of Constrained Delaunay Tetrahedralization

Maxime Maria, S´ebastien Horna and Lilian Aveneau

University of Poitiers, XLIM, UMR 7252, Futuroscope Chasseneuil Cedex, Poitiers, France

Keywords:

Ray Tracing, Acceleration Structure, Constrained Delaunay Tetrahedralization.

Abstract:

Acceleration structures are mandatory for ray-tracing applications, allowing to cast a large number of rays per

second. In 2008, Lagae and Dutr´e have proposed to use Constrained Delaunay Tetrahedralization (CDT) as an

acceleration structure for ray tracing. Our experiments show that their traversal algorithm is not suitable for

GPU applications, mainly due to arithmetic errors. This article proposes a new CDT traversal algorithm. This

new algorithm is more efﬁcient than the previous ones: it uses less arithmetic operations; it does not add extra

thread divergence since it uses a ﬁxed number of operation; at last, it is robust with 32-bits ﬂoats, contrary to

the previous traversal algorithms. Hence, it is the ﬁrst method usable both on CPU and GPU.

1 INTRODUCTION

Ray tracing is a widely used method in computer

graphics, known for its capacity to simulate com-

plex lighting effects to render high-quality realistic

images. However, it is also recognized as time-

consuming due to its high computational cost.

To speed up the process, many acceleration struc-

tures have been proposed in the literature. They are

often based on a partition of Euclidean space or ob-

ject space, like kd-tree (Bentley, 1975), BSP-tree,

BVH (Rubin and Whitted, 1980; Kay and Kajiya,

1986) and regular grid (Fujimoto et al., 1986). A

survey comparing all these structures can be found

in (Havran, 2000). They can reach interactive render-

ing, e.g exploiting ray coherency (Wald et al., 2001;

Reshetov et al., 2005; Mahovsky and Wyvill, 2006)

or GPU parallelization (Purcell et al., 2002; Foley

and Sugerman, 2005; G¨unther et al., 2007; Aila and

Laine, 2009; Kalojanov et al., 2011). Nevertheless,

actually a lot of factors impact on traversal efﬁciency

(scene layout, rendering algorithm, etc.).

A different sort of acceleration structures is the

constrained convex space partition (CCSP), slightly

studied up to then. A CCSP is a space partition into

convex volumes respecting the scene geometry. (For-

tune, 1999) introduces this concept by proposing a

topological beam tracing using an acyclic convexsub-

division respecting the scene obstacles, but using a

hand-made structure. Recently, (Maria et al., 2017)

present a CCSP dedicated to architectural environ-

ments, hence limiting its purpose. (Lagae and Dutr´e,

2008) propose to use a constrained Delaunay tetrahe-

dralization (CDT), i.e. CCSP only made up of tetrahe-

dra. However, our experiments show that their CDT

traversal methods cannot run on GPU, due to numer-

ical errors.

Using a particular tetrahedron representation, this

paper proposes an efﬁcient CDT traversal, having the

following advantages:

• It is robust, since it does not cause any error due to

numerical instability, either on CPU or on GPU.

• It requires less arithmetic operations and so it is

inherently faster than previous solutions.

• It is adapted to parallel programming since it does

not add extra thread divergence.

This article is organized as follows: Section 2 re-

capitulates previous CDT works. Section 3 presents

our new CDT traversal. Section 4 discusses our ex-

periments. Finally, Section 5 concludes this paper.

2 PREVIOUS WORKS ON CDT

This section ﬁrst describes CDT, then it presents its

construction from a geometric model, before focusing

on former ray traversal methods.

2.1 CDT Description

A Delaunay tetrahedralization of a set of points

X ∈ E

is a set of tetrahedra occupying the whole

236

Maria M., Horna S. and Aveneau L.

Efﬁcient Ray Traversal of Constrained Delaunay Tetrahedralization.

DOI: 10.5220/0006131002360243

In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017), pages 236-243

ISBN: 978-989-758-224-0

Figure 1: Delaunay triangulation: no vertex is inside a cir-

cumscribed circle.

space and respecting the Delaunay criterion (Delau-

nay, 1934): a tetrahedron T, deﬁned by four vertices

V ⊂ X, is a Delaunay tetrahedron if it exists a circum-

scribed sphere S of T such as no point of X \ {V} is

inside S. Figure 1 illustrates this concept in 2D.

Delaunay tetrahedralization is “constrained” if it

respects the scene geometry. In other words, all the

geometric primitives are necessarily merged with the

faces of the tetrahedra making up the partition.

Three kinds of CDT exist: usual constrained

Delaunay tetrahedralization (Chew, 1989), conform-

ing Delaunay tetrahedralization (Edelsbrunner and

Tan, 1992) and quality Delaunay tetrahedraliza-

tion (Shewchuk, 1998). In ray tracing context, (Lagae

and Dutr´e, 2008) proved that quality Delaunay tetra-

hedralization is the most efﬁcient to traverse.

2.2 CDT Construction

CDT cannot be built from every geometric models. A

necessary but sufﬁcient condition is that the model is a

piecewise linear complex (PLC) (Miller et al., 1996).

In 3D, any non empty intersection between two faces

of a PLC must correspond to either a shared edge or

vertex. In other words, there is no self-intersection

(Figure 2). In computer graphics, a scene is gener-

ally represented as an unstructured set of polygons. In

such a case, some self-intersections may exist. Nev-

ertheless, it is still possible to construct PLC using a

mesh repair technique such as (Zhou et al., 2016).

CDT can be built from a given PLC using the

Si’s method (Si, 2006). It results in a tetrahedral

mesh, containing two kinds of faces: occlusive faces,

belonging to the scene geometry; and some non-

occlusive faces, introduced to build the partition. Ob-

viously, a given ray should traverse the latter, as non-

occlusive faces do not belong to the input geometry.

2.3 CDT Traversal

Finding the closest intersection between a ray and

CDT geometry is done in two main steps. First, the

(a) (b)

Figure 2: Examples of two non-PLC conﬁgurations: inter-

section between (a) two faces, (b) an edge and a face.

tetrahedron containing the ray origin is located. Sec-

ond, the ray goes through the tetrahedralization by

traversing one tetrahedron at a time until hitting an

occlusive face. This process is illustrated in Figure 3.

Let us notice that there is no need to explicitly test

intersections with the scene geometry, as usual ac-

celeration structures do. This is done implicitly by

searching the exit face from inside a tetrahedron.

Locate

origin

Exit face search

Return

intersection

Occlusive

face?

Propagate

ray

CDT Traversal

Yes No

Figure 3: CDT traversal overview: the main key of any

CDT traversal algorithm lies in the “exit face search” part.

2.3.1 Locating Ray Origin

Using pinhole camera model, all primary rays start

from the same origin. For an interactive application

locating this origin is needed only for the ﬁrst frame,

hence it is a negligible problem. Indeed, camera mo-

tion generally corresponds to a translation, for in-

stance when the camera is shifted, or when ray origins

are locally perturbed for depth-of-ﬁeld effect. Using a

maximal distance in the traversal algorithm efﬁciently

solves this kind of move.

Locating the origin of non primary rays is avoided

by exploiting implicit ray connectivity inside CDT:

both starting point and volume correspond to the ar-

rival of the previous ray.

2.3.2 Exit Face Search

Several methods have been proposed in order to ﬁnd

the exit face of a ray from inside a tetrahedron. (La-

gae and Dutr´e, 2008) present four different ones. The

ﬁrst uses four ray/plane intersections and is similar to

(Garrity, 1990). The second is based on half space

classiﬁcation. The third ﬁnds the exit face using 6

Efﬁcient Ray Traversal of Constrained Delaunay Tetrahedralization

237

permuted inner products (called side and noted ⊙) of

Pl¨ucker coordinates (Shoemake, 1998). It is similar to

(Platis and Theoharis, 2003) technique. Their fourth

and fastest method uses 3 to 6 Scalar Triple Products

(STP). It is remarkable that none of these four meth-

ods exploits the knowledge of the ray entry face.

For volume rendering, (Marmitt and Slusallek,

2006) extend (Platis and Theoharis, 2003). Their

method (from now MS06) exploits neighborhood re-

lations between tetrahedra to automatically discard

the entry face. It ﬁnds the exit face using 2,67 side

products on average. Since the number of products

varies, MS06 exhibits some thread divergence in par-

allel environment. This drawback also appears with

the fastest Lagae et al. method.

All these methods are not directly usable on GPU,

due to numerical instability. Indeed, the insufﬁcient

arithmetic precision with 32-bits ﬂoats causes some

failures to traverse CDT, leading to inﬁnite loops.

In this paper, we propose a new traversal algo-

rithm, based on Pl¨ucker coordinates. Like MS06,

it exploits the neighborhood relations between faces.

The originality lies in our speciﬁc tetrahedron repre-

sentation, allowing to use exactly 2 optimized side

products.

3 NEW TRAVERSAL

ALGORITHM

CDT traversal algorithm is a loop, searching for the

exit face from inside a tetrahedron (Figure 3). We

propose a new algorithm, both fast and robust. It

uses Pl¨ucker coordinates, i.e. six coordinates corre-

sponding to the line direction u and moment v. Such

a line is oriented: it passes through a ﬁrst point p, and

then a second one q. Then, u = q− p and v = p × q.

For two lines l = {u : v} and l

′

= {u

′

: v

′

}, the sign

of the side product l ⊙ l

′

= u · v

′

+ v· u

′

indicates the

relative orientation of the two lines: negative value

means clockwise orientation, zero value indicates in-

tersection, and positive value signiﬁes counterclock-

wise orientation (Shoemake, 1998).

3.1 Exit Face Search

Our algorithm assumes that the entry face is known,

and that the ray stabs the current tetrahedron. For a

given entry face, we use its complementin the tetrahe-

dron, i.e. the part made of one vertex, three edges and

three faces. We denote Λ

, Λ

and Λ

the complement

edges, with counterclockwise orientation from inside

the tetrahedron (Figure 4). We number complement

faces with a local identiﬁer from 0 to 2, such that:

r⊙ Λ

< 0

≥ 0 ≥ 0

r⊙ Λ

< 0< 0

≥ 0

(b)(a)

Figure 4: Exit face search example: (a) ray r enters the

tetrahedron through the back face; (b) r ⊙ Λ

< 0 and r ⊙

≥ 0, so the exit face is identiﬁed by 0.

face 0 is bounded by Λ

and Λ

, face 1 is bounded

by Λ

and Λ

, and face 2 is bounded by Λ

and Λ

Using Pl¨ucker side product, the face stabbed by ray r

is:

• face 0, if and only if r turns counterclockwise

around Λ

and clockwise around Λ

(r ⊙ Λ

≥ 0

and r⊙ Λ

< 0);

• face 1, if and only if r turns counterclockwise

around Λ

and clockwise around Λ

(r ⊙ Λ

≥ 0

and r⊙ Λ

< 0);

• face 2, if and only if r turns counterclockwise

around Λ

and clockwise around Λ

(r ⊙ Λ

≥ 0

and r⊙ Λ

< 0).

We compact these conditions into a decision tree

(Figure 4(b)). Each leaf corresponds to an exit face,

and each interior node represents a side product

between r and a line Λ

. At the root, we check r⊙Λ

If it is negative (clockwise), then r cannot stab face

2: in the left subtree, we only have to determine

if r stabs face 0 or 1, using their shared edge Λ

Otherwise, r turns counterclockwise around Λ

and

so cannot stab face 0, and the right subtree we check

if r stabs face 1 or 2 using their shared edge Λ

With Figure 4(a) example, r turns clockwise around

and then counterclockwise around Λ

; so, r exits

through face 0.

Algorithm 1: Exit face search from inside a tetrahedron.

Require: F

= {Λ

, Λ

}: entry face; Λ

: ray;

Ensure: F

: exit face;

1: side ← Λ

⊙ F

.Λ

;

2: id ← (side ≥ 0); {id ∈ {0, 1}}

3: side ← Λ

⊙ F

.Λ

;

4: id ← id + (side < 0); {id ∈ {0, 1, 2}}

5: F

← getFace(F

,id);

6: return F

;

GRAPP 2017 - International Conference on Computer Graphics Theory and Applications

238

Table 1: Exit face according to the entry face and a local

identiﬁer in {0, 1, 2}, following a consistent face numbering

(Figure 5(a)).

Exit

Entry face

identiﬁer F

0 F

2 F

Since every decision tree branch has a ﬁxed depth

of 2, our new exit face search method answers using

exactly two side products. Moreover, it is optimized

to run efﬁciently without any conditional instruction

(Algorithm 1). Notice that leave labels form two pairs

from left to right: the ﬁrst pair (0,1) is equal to the

second (1,2), minus 1. Then, it uses that successful

logical test returns 1 (and 0 in failure case) to decide

which face to discard. So, the test r ⊙ Λ

≥ 0 allows

to decide if we have to consider the ﬁrst or the second

pair. Finally, the same method is used with either the

line Λ

or Λ

This algorithm ends with

getFace

function call.

This function returns the tetrahedron face number ac-

cording to the entry face and to the exit face label.

It answers using a lookup-table, deﬁned using simple

combinatorics (Table 1), assuming a consistent label-

ing of tetrahedron faces (Figure 5(a)).

3.2 Data Structure

Algorithm 1 works for any entry face of any tetrahe-

dron. It relies on two speciﬁc representations of the

tetrahedron faces: a local identiﬁer in {0, 1, 2}, and

global face F

, i ∈ [0. . . 3]. For a given face, it uses 3

Pl¨ucker lines Λ

. Since such lines contain 6 coordi-

nates, a face needs 18 single precision ﬂoats for the

lines (18 × 32 bits), plus brdf and neighborhood data

(tetrahedron and face numbers).

To reduce data size and balance GPU computa-

tions and memory accesses, we dynamically calculate

the Pl¨ucker lines knowing their extremities: each line

starts from a face vertex and ends with the comple-

ment vertex. So, we need all the tetrahedron vertices.

We arrange the faces such that their complement ver-

tex have the same number, implicitly known. Vertices

are stored into tetrahedra (for coalescent memory ac-

cesses), and vertex indices (in [0. . . 3]) are stored into

faces. This leads to the following data structure:

st ru ct Face {

in t br df ; / / −1: Non−Occlusive

in t t e t r a ; / / neighbor

in t f ace ; / / neighbor

in t idV [ 3 ] ; / / face v e r t i c e s

} ;

(a)

(b)

Figure 5: Description of a tetrahedron: (a) vertices and

faces numbering; (b) the complement vertex for F

is {V

and its edges are Λ

= V

, Λ

= V

and Λ

= V

st ru ct T etrahedron {

f l o a t 3 V[ 4 ] ;

/ / v e r t i c e s

Face F [ 4 ] ; / / f a c es

} ;

To save memory and so bandwidth, we compact

the structure

Face

. The neighboring face (the ﬁeld

face

) is a number between 0 and 3; it can be encoded

using two bits, and so packed with the ﬁeld

tetra

corresponding to the neighboring tetrahedron. Thus,

tetrahedron identiﬁers are encoded on 30 bits, allow-

ing a maximum of one billion tetrahedra. In a similar

way, ﬁeld

idV

needs only 2 bits per vertex. But, they

are common to all the tetrahedra, and so are stored

only once for all into 4

unsigned char

. Hence, a

face needs 8 bytes, and a full tetrahedron 80 bytes.

Notice that, on GPU a vertex is represented by 4 ﬂoats

to have aligned memory accesses. Then on GPU a full

tetrahedron needs 96 bytes.

Figure 5 proposes an example: for F

(made using

the complement vertex V

and counterclockwise ver-

texes V

, V

and V

), we can deduce that Λ

= V

and Λ

= V

. Table 2 gives the descrip-

tion of faces according to their vertices and edges, fol-

lowing face numbering presented in Figure 5(a).

Table 2: Complement edges of entry face F

are implicitly

described by the face complement vertex (identiﬁed by i),

and its vertices in counterclockwise order.

Vertexes Λ

0 {3, 1, 2} V

1 {2, 0, 3} V

{3, 0, 1} V

3 {1, 0, 2} V

3.3 Exiting the Starting Volume

Algorithm 1 assumes known the entry face. This con-

dition is not fulﬁlled for the starting tetrahedron. Al-

gorithm 1 must be adapted in that case. A simple so-

lution lies in using a decision tree of depth 4, leading

Efﬁcient Ray Traversal of Constrained Delaunay Tetrahedralization

239

Table 3: Scenes characteristics and performance: number of tetrahedra, number of occlusive faces (faces coming from the

model), number of non-occlusive faces (faces created during tetrahedralization), occupied memory and ray casting perfor-

mance in millions of ray cast per second on CPU and GPU.

BANANA BUNNY ARMADILLO

Tetrahedra 71,300 682,733 2,990,552

Occlusive faces 24,568 222,775 1,105,218

Non-occlusive faces

117,994 1,142,650 4,875,834

Memory (MB) 6 62 273

Ray-casting (Mray/s)

CPU

9.76 10.5 6.75

GPU

428 289 123

to three Pl¨ucker side products. One can settle this tree

starting with any edge to discriminate between two

faces, and so on with the children.

Nevertheless, a simpler but equivalent solution

exists. Once the root ﬁxed, we have only three

possible exit faces. This corresponds to Algorithm 1,

as if the discarded face was the entry one. So, we

just choose one edge to discard a face and then we

call Algorithm 1 with the discarded exit face as

the fake entry one. This leads to Algorithm 2. We

naturally choose edge V

shared by faces F

and F

(Figure 5(a)). If the side product is negative, then we

cannot exit through F

. Else, with a positive or null

value, we cannot exit through F

. Thus, the starting

tetrahedron problem is solved using three and only

three side products.

Algorithm 2: Exit face search from the starting tetrahedron.

Require: T = {V

, F

}

i∈[0...3]

: Tetrahedron; Λ

: Ray;

Ensure: F

: exit face;

1: side ← Λ

⊙V

;

2: f← side < 0; {f∈ {0, 1}}

3: return ExitTetra(F

, Λ

); {Algorithm 1}

3.4 Efﬁcient Side Product

Both Algorithm 1 and 2 use Pl¨ucker side products. A

naive approach results in 23 operations per side prod-

uct: to calculate Pl¨ucker coordinates, we need 3 sub-

tractions for its direction and 6 multiplications and 3

subtractions for its moment. Then, side product needs

6 multiplications and 5 additions. The two side prod-

ucts in Algorithm 1 result in 46 operations.

We propose a new method using less operations. It

rests upon a coordinate system translation to the com-

plement vertex V

of the entry face. In this local sys-

tem, lines Λ

have a nil moment (since they contain

the origin). So, side products are inner products of

vectors having only 3 coordinates: each one needs 3

multiplications and 2 additions. Moreover, line direc-

tions are computed using 3 subtractions. Hence, such

side products need only 8 operations.

Nevertheless, we also need to modify Pl¨ucker co-

ordinates of the ray r to obtain valid side products.

Let us recall how a Pl¨ucker line is made. We com-

pute its direction u using two points p and q on the

line, and its moment v with p × q = p × u. In the lo-

cal coordinates system, the new line coordinates must

be calculated using translated points. The direction is

obviously the same, only v is modiﬁed:

′

= (p−V

) × u

= p× u−V

× u

= v−V

× u.

So, v

′

is calculated using 12 operations: 3 subtrac-

tions, 6 multiplications and 3 subtractions. This ray

transformation is done once per tetrahedron, the local

coordinates system being shared for all the lines Λ

As a conclusion, the number of arithmetic opera-

tions involved in Algorithm 1 can be decreased from

46 to 28, saving about 40% of computations.

4 EXPERIMENTS

This section discusses some experiments made using

our new traversal algorithm.

4.1 Results

Performance is evaluated using three objects tetrahe-

dralized using Tetgen (Si, 2015). Table 3 sums up

their main characteristics and measured performance.

The simplest object is constructed from a banana

GRAPP 2017 - International Conference on Computer Graphics Theory and Applications

240

T = 40.1 ms - Φ = 9.6

(A)

T = 122 ms - Φ = 22.71

(B)

200

160

120

1000 200

Number of traversed tetrahedra per ray

200+

Figure 6: Rendering times on CPU in ms (T, red curve) and number of traversed tetrahedra in millions (Φ, gray bars) using

1,282 points of view and BUNNY; (A) T = 40.1 ms - Φ = 9.6; (B) T = 122 ms - Φ = 22.71.

model, with 25k occlusive faces. The other two corre-

spond to well-known Stanford’s objects: BUNNY and

ARMADILLO. Their CDT respectively count 200k

and 1.1M occlusive faces. We use quality CDT, in-

troducing new vertices into object models, explaining

the high number of faces our three objects have.

Performance is measured in millions of ray cast

per second (Mrays/s) using ray casting, 1024 × 1024

pixels and no anti-aliasing. The used computer pos-

sesses an Intel



Core

i7-4930K CPU @ 3.40Ghz,

32 Gb RAM and NVidia



GeForce



GTX 680. Al-

gorithms are made parallel on CPU (OpenMP) and

GPU (CUDA, with persistent threads (Aila et al.,

2012)). On average, CPU ray casting reaches 9

Mrays/s, GPU version 280 Mrays/s.

4.2 Traversal

Closest ray/object intersection is found by traversing

CDT one tetrahedron at a time until hitting an occlu-

sive face. The ray traversal complexity is linear in

the number of traversed tetrahedra. Figure 6 shows

the relation between execution time (T) and number

of traversed tetrahedra per image (Φ). Statistics are

extracted on CPU using 1,282 points of view from

BUNNY.

The execution time is proportional to Φ: on point

of view (A), almost 10 millions tetrahedra are tra-

versed in 40.1 ms; on point of view (B), we tra-

verse 23 millions tetrahedra in 122 ms. It is not

strictly proportional, mainly due to memory accesses

that become more important when more tetrahedra

are traversed, leading to more memory cache defaults.

False-colored image of point of view (B) reveals that

rays going close to object boundary traverse more

tetrahedra.

4.3 Numerical Robustness

Using ﬂoating-point numbers can cause errors due to

numerical instability. Tetgen uses geometric predi-

cates (e.g. (Shewchuk, 1996) or (Devillers and Pion,

2003)) to construct robust CDT. If this is common

practice in algebraic geometry, it is not the case in

rendering. Hence, it is too expensive to be used in

CDT ray traversal.

We experimented three methods proposed in (La-

gae and Dutr´e, 2008) (ray/plane intersection tests,

Pl¨ucker coordinates and STP), plus the method pro-

posed in (Marmitt and Slusallek, 2006) (MS06) (Sec-

tion 2.3.2). We noticed they all suffer from numer-

ical errors either on CPU or GPU. Indeed, calcula-

tion are not enough precise with rather ﬂat tetrahedra.

Thus, without extra treatment (like moving the ver-

tices) these algorithms may return a wrong exit face

or do not ﬁnd any face at all (no test is valid). Table 4

reports for each object the number of rays per image

concerned by this problem, averaged over points of

view series.

In contrast, we did not obtain wrong results us-

Efﬁcient Ray Traversal of Constrained Delaunay Tetrahedralization

241

Table 4: Numerical errors impact on GPU: number of rays

suffering from wrong results for 1024 × 1024 pixels, and

averaged over about 1, 300 points of view.

BANANA BUNNY ARMADILLO

Ray/plane 33.27 40.85 74.85

Pl¨ucker

3.6 22.25 412.13

STP 63.07 204.89 456.65

MS06

0.0007 0.004 0.422

Ours 0 0 0

ing our method. It can be explained by the smaller

number of performed arithmetic operations; less nu-

merical errors accumulated, more accurate results.

4.4 Exit Face Search Comparison

This section compares performance of our exit face

search algorithm with the same 4 previous methods:

ray/plane intersection tests, Pl¨ucker coordinates, STP

and MS06 (Section 2.3.2). Statistics are summed up

in Table 5. Times are measured for 16,384 random

rays stabbing 10,000 random tetrahedra, both on CPU

(using one thread) and GPU.

Table 5: Exit face search comparison: time (in ms) to deter-

mine the exit face for 10,000 tetrahedra and 16,384 random

rays per tetrahedron; on CPU (single thread) and on GPU.

Method

Time (ms)

CPU GPU

Ray/plane 15,623 36

Pl¨ucker 10,101 28

STP

4,876 29

MS06 5,994 21

Ours 2,663 13

CPU results show that our method is much more

efﬁcient than former ones. This behavior is expected

since our new method requires less arithmetic opera-

tions. STP is the fastest previous method, but is 83%

slower than ours.

On GPU, results are slightly different. For exam-

ple, Pl¨ucker method is faster than STP. Indeed, even

if it requires more operations, it does not add extra

thread divergence. Hence, it is more adapted to GPU.

Among the previous GPU methods, the most efﬁcient

is MS06, still 59% slower than ours.

4.5 State-of-the-art Comparison

In (Lagae and Dutr´e, 2008), authors noticed that ren-

dering using CDT as acceleration structure takes two

to three more computation times than using kdtree. In

this last section, we check if it is still the case using

our new tetrahedron exit algorithm and on GPU. We

Table 6: Performance comparison with (Aila et al., 2012),

in number of frames per second.

CDT

BVH

(Aila et al., 2012)

BANANA 315-947 200-260

BUNNY 130-1040 160-260

ARMADILLO

82-160 130-260

compare our GPU ray-tracer with the state-of-the-art

ray tracer (Aila et al., 2012), always using the same

computer. Their acceleration structure is BVH, con-

structed using SAH (MacDonald and Booth, 1990)

and split of large triangles (Ernst and Greiner, 2007).

To our knowledge, nowadays their implementation is

the fastest GPU one.

Table 6 sums up this comparison. Results show

that CDT is still not a faster acceleration structure

than classical ones (at least than BVH on GPU). First,

the timings show larger amplitude using CDT than

BVH. Moreover, while CDT is on average faster than

BVH with BANANA and BUNNY models, it is no

more true using ARMADILLO. This is directly linked

to the traversal complexity of the two structures. BVH

being built up following SAH, its performance is less

impacted with the geometry input size, contrary to

CDT where this size has a direct impact on perfor-

mance. Clearly, a heuristics similar to SAH is missing

for tetrahedralization.

5 CONCLUSION

This article proposes a new CDT ray traversal algo-

rithm. It is based upon a speciﬁc tetrahedron repre-

sentation, and fast Pl¨ucker side products. It uses less

arithmetic operations than previous methods. Last but

not least, it does not involve any conditional instruc-

tions, employing two and only two side products to

exit a given tetrahedron.

This algorithm exhibits several advantages com-

pared to the previous ones. Firstly it is inherently

faster, requiring less arithmetic operations. Secondly

it is more adapted to parallel computing, since having

a ﬁxed number of operations it does not involve extra

thread divergence. Finally, it is robust and works with

32-bits ﬂoats either on CPU or GPU.

As future work, we plan to design a new construc-

tion heuristic, to obtain as fast to traverse as possible

CDT. Indeed, CDT traversal speed highly depends on

its construction. CDT traversal complexity is linear

in the number of traversed tetrahedra: the less tra-

versed tetrahedra, the more high performance. Be-

fore SAH introduction, the same problem existed with

well-known acceleration structures like kd-tree and

GRAPP 2017 - International Conference on Computer Graphics Theory and Applications

242

BVH, for which performance highly depends on the

geometric model. Since CDT for ray-tracing is a re-

cent method, we expect that similar heuristics exists.

REFERENCES

Aila, T. and Laine, S. (2009). Understanding the Efﬁciency

of Ray Traversal on GPUs. In High-Performance

Graphics, HPG ’09, pages 145–149.

Aila, T., Laine, S., and Karras, T. (2012). Understanding

the efﬁciency of ray traversal on GPUs – Kepler and

Fermi addendum. Technical report, NVIDIA Corp.

Bentley, J. L. (1975). Multidimensional Binary Search

Trees Used for Associative Searching. Communica-

tions of the ACM, 18(9):509–517.

Chew, L. P. (1989). Constrained Delaunay triangulations.

Algorithmica, 4:97–108.

Delaunay, B. (1934). Sur la sph`ere vide.

A la m´emoire de

Georges Vorono¨ı. Bulletin de l’Acad´emie des Sciences

de l’URSS, (6):793–800.

Devillers, O. and Pion, S. (2003). Efﬁcient Exact Ge-

ometric Predicates for Delaunay Triangulations. In

5th Workshop on Algorithm Engineering and Exper-

iments, ALENEX ’03, pages 37–44.

Edelsbrunner, H. and Tan, T. S. (1992). An upper bound

for conforming delaunay triangulations. In 8th An-

nual Symposium on Computational Geometry, SCG

’92, pages 53–62.

Ernst, M. and Greiner, G. (2007). Early Split Clipping for

Bounding Volume Hierarchies. In IEEE Symposium

on Interactive Ray Tracing, RT ’07, pages 73–78.

Foley, T. and Sugerman, J. (2005). KD-tree Accelera-

tion Structures for a GPU Raytracer. In ACM SIG-

GRAPH/EUROGRAPHICS conference on Graphics

Hardware, HWWS ’05, pages 15–22.

Fortune, S. (1999). Topological Beam Tracing. In 15th An-

nual Symposium on Computational Geometry, SCG

’99, pages 59–68.

Fujimoto, A., Tanaka, T., and Iwata, K. (1986). ARTS:

Accelerated Ray-Tracing System. IEEE Computer

Graphics and Applications, 6(4):16–26.

Garrity, M. P. (1990). Raytracing Irregular Volume Data.

ACM SIGGRAPH Computer Graphics, 24(5):35–40.

G¨unther, J., Popov, S., Seidel, H.-P., and Slusallek, P.

(2007). Realtime Ray Tracing on GPU with BVH-

based Packet Traversal. In IEEE Symposium on Inter-

active Ray Tracing 2007, RT ’07, pages 113–118.

Havran, V. (2000). Heuristic Ray Shooting Algorithms.

PhD thesis, Department of Computer Science and En-

gineering, Faculty of Electrical Engineering, Czech

Technical University in Prague.

Kalojanov, J., Billeter, M., and Slusallek, P. (2011). Two-

Level Grids for Ray Tracing on GPUs. Computer

Graphics Forum, 30(2):307–314.

Kay, T. L. and Kajiya, J. T. (1986). Ray Tracing Com-

plex Scenes. ACM SIGGRAPH Computer Graphics,

20(4):269–278.

Lagae, A. and Dutr´e, P. (2008). Accelerating Ray Trac-

ing using Constrained Tetrahedralizations. Computer

Graphics Forum, (4):1303–1312.

MacDonald, D. J. and Booth, K. S. (1990). Heuristics for

Ray Tracing Using Space Subdivision. The Visual

Computer, 6(3):153–166.

Mahovsky, J. and Wyvill, B. (2006). Memory-Conserving

Bounding Volume Hierarchies with Coherent Raytrac-

ing. Computer Graphics Forum, 25(2):173–182.

Maria, M., Horna, S., and Aveneau, L. (2017). Constrained

Convex Space Partition for Ray Tracing in Architec-

tural Environments. Computer Graphics Forum.

Marmitt, G. and Slusallek, P. (2006). Fast Ray Traversal

of Tetrahedral and Hexahedral Meshes for Direct Vol-

ume Rendering. In 8th Joint EG / IEEE VGTC Confer-

ence on Visualization, EUROVIS ’06, pages 235–242.

Miller, G. L., Talmor, D., Teng, S.-H., Walkington, N.,

and Wang, H. (1996). Control Volume Meshes using

Sphere Packing: Generation, Reﬁnement and Coars-

ening. In 5th International Meshing Roundtable, IMR

’96, pages 47–62.

Platis, N. and Theoharis, T. (2003). Fast Ray-Tetrahedron

Intersection Using Plucker Coordinates. Journal of

Graphics Tools, 8(4):37–48.

Purcell, T. J., Buck, I., Mark, W. R., and Hanrahan,

P. (2002). Ray Tracing on Programmable Graph-

ics Hardware. ACM Transactions on Graphics,

21(3):703–712.

Reshetov, A., Soupikov, A., and Hurley, J. (2005). Multi-

level Ray Tracing Algorithm. ACM Transactions on

Graphics, 24(3):1176–1185.

Rubin, S. M. and Whitted, T. (1980). A 3-dimensional rep-

resentation for fast rendering of complex scenes. ACM

SIGGRAPH Computer Graphics, 14(3):110–116.

Shewchuk, J. R. (1996). Adaptive precision ﬂoating-point

arithmetic and fast robust geometric predicates. Dis-

crete & Computational Geometry, 18:305–363.

Shewchuk, J. R. (1998). Tetrahedral Mesh Generation by

Delaunay Reﬁnement. In 14th Annual Symposium on

Computational Geometry, SCG ’98, pages 86–95.

Shoemake, K. (1998). Pl¨ucker coordinate tutorial. Ray

Tracing News, 11:20–25.

Si, H. (2006). On Reﬁnement of Constrained Delaunay

Tetrahedralizations. In 15th International Meshing

Roundtable, IMR ’06, pages 509–528.

Si, H. (2015). TetGen, a Delaunay-Based Quality Tetrahe-

dral Mesh Generator. ACM Transactions on Mathe-

matical Software, 41(2).

Wald, I., Slusallek, P., Benthin, C., and Wagner, M. (2001).

Interactive Rendering with Coherent Ray Tracing.

Computer Graphics Forum, 20(3):153–165.

Zhou, Q., Grinspun, E., Zorin, D., and Jacobson, A. (2016).

Mesh Arrangements for Solid Geometry. ACM Trans-

actions on Graphics, 35(4).

Efﬁcient Ray Traversal of Constrained Delaunay Tetrahedralization

243