A GPU-based Method for Generating quasi-Delaunay Triangulations

based on Edge-ﬂips

Cristobal A. Navarro

, Nancy Hitschfeld-Kahler

and Eliana Scheihing

Department of Computer Science, FCFM, Universidad de Chile, Santiago, Chile

Instituto de Inform´atica, Universidad Austral de Chile, Valdivia, Chile

Keywords:

Delaunay Triangulations, Edge-ﬂip Technique, Parallel Realtime Applications, CUDA, OpenGL, GPGPU.

Abstract:

The Delaunay edge-ﬂip technique is a practical method for transforming any existing triangular mesh S into

a mesh T(S) that satisﬁes the Delaunay condition. In this paper we present an iterative GPU-based method

capable of improving triangulations under the Delaunay criteria. This method is based on the edge-ﬂip tech-

nique and its implementation is fully integrable with the OpenGL rendering pipeline. Since the algorithm uses

an ε value to handle co-circular or close to co-circular point conﬁgurations, we can not guarantee that all trian-

gles fulﬁll the Delaunay condition. However, we have compared the triangulations generated by our method

with the ones generated by the Triangle software and by the CGAL library and we obtained less than 0.05%

different triangles. Based on our experimental results, we report speedups from 14× to 50× against Lawson’s

sequential algorithm and of approximately 3× against the O(nlog n) CGAL’s and Triangle’s constructive

algorithms while processing bad quality triangulations.

1 INTRODUCTION

The Delaunay triangulation T of a point set P is the

triangulation that maximizes the smallest angle over

all triangulations of P. Numeric computations on this

kind of triangulation is known to be more precise

than in the other ones (De Berg, 2000). Good qual-

ity meshes are needed in many applications such as

scientiﬁc simulations, terrain rendering, video-games

and medical 3D reconstruction, among others.

Delaunay triangulations can be achieved in two

ways: (a) by creating them from a PSLG (Planar

straight linear graph), or (b) by transforming an al-

ready existing triangulation into one that satisﬁes the

Delaunay condition. In general, in the case (a), a De-

launay triangulation is generated for the set of points

of the PSLG. The segments (boundary edges) are

then inserted to generate either a constrained Delau-

nay triangulation or a conforming Delaunay triangu-

lation (Shewchuk, 1996). Case (b) assumes that a

triangulation is given as input and the mesh needs

to be transformed into a Delaunay mesh. A known

technique for making this transformation is to ﬂip

the edges that do not satisfy the Delaunay condition.

The edge-ﬂip technique was ﬁrst introduced by Law-

son (Lawson, 1972) and the proposed sequential al-

gorithm has a worst-case complexity O(n

), where n

is the number of points of the triangulation (Fortune,

1993; Edelsbrunner, 2001).

Real-time applications cannot make use of se-

quential algorithms when handling meshes close to

a million triangles. To achieve faster computations,

parallel solutions are needed. In recent years, GPU

computing has become an important research area for

parallel computing due to its high performance and

low cost. Several applications that require geometric

modeling and visualization beneﬁts strongly from the

use of a GPU. In particular, the generation of Delau-

nay triangulations with a fast GPU-based method is

today a topic of research.

The main contribution of this paper is the design

and implementation of an iterative GPU-based al-

gorithm that generates quasi-Delaunay triangulations

starting from any existing triangulation. The algo-

rithm maps threads to edges. Each thread is responsi-

ble for checking one edge Delaunay condition, doing

one edge-ﬂip and updating one edge data inconsis-

tency if necessary. The performance and the qual-

ity of the generated meshes is compared with two

well known and efﬁcient sequential constrained De-

launay algorithms: the algorithm inside the software

Triangle (Shewchuk, 1996) and the algorithm avail-

able in the CGAL library (CGAL, 2012). As test

examples we used bad quality triangulations (with

Navarro C., Hitschfeld-Kahler N. and Scheihing E..

A GPU-based Method for Generating quasi-Delaunay Triangulations based on Edge-ﬂips.

DOI: 10.5220/0004281900270034

In Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information

Visualization Theory and Applications (GRAPP-2013), pages 27-34

ISBN: 978-989-8565-46-4

 2013 SCITEPRESS (Science and Technology Publications, Lda.)

minimum angles close to 0), with mesh sizes rang-

ing from 100 thousand up to 5 millions points. We

are not using exact predicates nor ﬂoating point ﬁlters

because these techniques can not be efﬁciently imple-

mented on GPU architectures without sacriﬁcing per-

formance. These techniques would require adding if-

else conditionals and handle irregular-dataaccess pat-

terns. That is why, in favor of speed, some results are

quasi-Delaunay triangulations and not fully Delaunay

triangulations.

In principle, the comparison of a transformation

algorithm that generates quasi-Delaunay triangula-

tions, as the proposed in this paper, with respect to

constructive ones that generate exact Delaunay trian-

gulations such as CGAL and Triangle can seem un-

fair because they solve different problems. However,

since the complexity of a constructive algorithm is

O(nlog n) and the Lawson algorithm is O(n

), many

times it is preferred to build the Delaunay mesh from

scratch instead of improving an existing one. In some

way, this research is intended to show that a parallel

method based on edge-ﬂips can become fast and use-

ful in practice for applications that not require exact

Delaunay meshes.

The paper is organized as follows: Section 2

presents our data structures and how they are com-

patible with OpenGL. Sections 3 and 4 cover the al-

gorithm and implementation details. Section 5 shows

experimental results with 2D random inputs and 3D

surface triangulations. Section 6 describes some re-

lated work and the similarities and differences with

other GPU-based approaches. Finally, section 7 con-

cludes our work.

A preliminary and short version of this paper was

presented at the EuroCG11 workshop (Navarro et al.,

2011).

2 DATA STRUCTURES

Proper data structures have been deﬁned to efﬁciently

represent a triangulation on the GPU. This representa-

tion is inspired by the Dynamic Render Mesh (Tobler

and Maierhofer, 2006). Figure 1 illustrates the three

main components: Vertices, Triangles and Edges.

Vertices are represented with a one-dimensional

array in the same way as the OpenGL VBO (Ver-

tex buffer object). Each position is of the type (x,y)

or (x,y,z) depending on the used spatial dimension.

The Triangles array is a set of indices to the Ver-

tices array. For each three consecutive indices, a

triangle is deﬁned. Each edge of the Edges array

contains a pair of indices v

, v

to the Vertices ar-

ray and two pairs of indices t

= {t

} and t

Surface Mesh S

Edges

Triangles

Vertices

p-2...

n-2

0 2 4

A C B C

ta tb

tm-1

3m-3 3m-2 3m-1

A B C

...

n-1

...

... ... ...

A B C

tbta

1 2

v1 v2

ta tb

v1 v2

ta tb

v1 v2

p-3

ta tb

v1 v2

p-1

ta tb

v1 v2

51 3

Figure 1: Data structures for mesh rendering/processing.

} to the Triangles array (for boundary edges,

remains unused). This way, an edge can know

its endpoint indices directly through v

, v

or in-

directly via the pairs {Triangles[t

],Triangles[t

]}

and {Triangles[t

],Triangles[t

]}. This redundant

information becomes useful for checking neighbor-

hood consistency after each ﬂip. In addition, indices

to the opposite vertices per edge are stored in the

Opposites array in order to speed up angle computa-

tions (boundary edges have only one opposite vertex).

This data model can be naturally implemented and in-

tegrated with the OpenGL API and CUDA (Nvidia,

2011) or OpenCL. It is important to mention that for

rendering, only the Vertices and Triangles arrays are

accessed by the graphics API; the Edges and the Op-

posites arrays are for efﬁciently accessing neighbor

information. All mentioned arrays use Θ(n) of mem-

ory space, where n is the number of points.

3 ALGORITHM OVERVIEW

The algorithm we propose is iterative. In each iter-

ation, two consecutive phases of parallel computing

are executed:

• Phase 1: Detection, exclusion & processing.

• Phase 2: Repair.

On each iteration the algorithm transforms the

mesh a step closer to the Delaunay mesh. The

algorithm ﬁnishes when the Delaunay triangulation

is reached. The following sub-sections explain the

phases in more detail.

3.1 Detection, Exclusion and Processing

This phase is in charge of three steps: (1) detection of

edges that do not fulﬁll the Delaunay condition, (2)

exclusion of edges that can not be ﬂipped in parallel

and (3) processing the edges that can be ﬂipped in par-

allel. Our algorithm maps threads to edges by using

GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications

a cb

ex(T1) ex(T3) ex(T5)

ex(T2)

ex(T2) ex(T4)

selected subsetexcluded

b ca

ba b cc

F[] array

Figure 2: Exclusion mechanism. Each thread performs an atomic operation ex(T

) to select its triangles. The F array has the

information of which thread has taken a given triangle.

the PRAM model in such a way that thread t

handles

edge e

with i ∈ [0, ne−1] (ne is the number of edges).

The execution threads detect ﬁrst the edges that need

to be ﬂipped (bad edges), then they go through a ﬁl-

ter where only the independent threads survive and

ﬁnally, the survivors ﬂip their edge.

For the detection step, threads test their corre-

sponding edge e against the Delaunay condition by

computing the opposite angles λ and γ of e using the

information from t

, t

. The test must satisfy the fol-

lowing condition:

λ+ γ ≤ π (1)

If the test of equation (1) fails, the edge is a bad edge

and needs to be ﬂipped. On the other hand, if the test

passes, the execution thread ends. Most of the time it

is not possible to ﬂip the complete set of bad edges in

one iteration because the ﬂip of a given edge e com-

promises the consistency of the neighbor edges that

belong to t

and t

. However, it is possible to pro-

cess a subset A of the edges that satisfy the following

condition:

∀e

∈ A T

∩ T

0; T

= {t ∈ T : e ∈ t} (2)

For implementing the exclusion step, the algo-

rithm internally uses a Flags array where Flags[i] ==

Taken if the i-th triangle was ﬂagged by a thread, and

Flags[i] == Free if it was not. Each thread that needs

to ﬂip an edge requires two ﬂags to be set, the ones

associated with the triangles that share its edge. This

operation is done atomically (atomic operations are

sequential only when two or more threads access the

same memory location). When a thread ﬂags the ﬁrst

triangle, some neighbor threads will be excluded (i.e

the ones that failed to catch this ﬂag). When a thread

ﬂags the second one, the rest of the neighbors will get

excluded. Figure 2 shows an example, where edges

a,b and c need to be ﬂipped but only {a,c} or {b,c}

can be processed at the same time. By using condition

(2), the thread associated with edge a is excluded.

For the processing step, the per thread edge-ﬂip

method is designed as a swap of indices between the

associated triangles t

, t

making a rotating effect of

the triangles (see Figure 3). For a given edge e

, our

parallel edge-ﬂip proceeds in the following way:

1. Variables: O[] = Opposites, T[] = Triangles,

E[] = Edges;

2. Get the opposite vertex indices o

= O[i][0]; o

= O[i][1];

3. Get c

∈ t

such that v

= T[c

],v

= T[c

= E[i].t

; c

= E[i].t

;

4. do the edge-ﬂip:

T[c

] = T[o

]; T[c

] = T[o

];

5. Update t

and v

E[i].t

= [o

]; E[i].t

= [c

];

E[i].v

= E[i].t

; E[i].v

= E[i].t

;

T '

131

0 21 3 5

0 32

0 21

1 2

3 54

0 2

1 3

0 1

before

after

rotation / flip

Figure 3: Visual example of edge-ﬂip procedure for e.

The steps of this phase are summarized in Figure 4

showing how threads make their way down.

3.2 Repair

After the parallel edge-ﬂips were executed, incon-

sistent information can be stored on neighbor edges.

Some edges can store references to triangles whom

they no longer belong (obsolete t

and t

pairs). Fig-

ure 5 shows a simple mesh where inconsistent infor-

mation appears at edges d and b after e was ﬂipped.

AGPU-basedMethodforGeneratingquasi-DelaunayTriangulationsbasedonEdge-flips

Figure 4: General view of the detection, exclusion and pro-

cessing mechanism. Each block acts as a thread ﬁlter.

The information of the new triangles to whom d and

b belong are in the triangles that were rotated while

ﬂipping e.

Triangles

Vertexes

0 3 2 0

31 2

Edges

0 2

1 2

0 1 2 3 4 5

T T

Figure 5: Edges marked with a cross are inconsistent.

Fortunately these inconsistencies can be easily identi-

ﬁed with the following two expressions:

q = |v

− t

| + |v

− t

| (3)

w = |v

− t

| + |v

− t

| (4)

If q > 0 (w > 0) then t

) needs to be repaired.

The rotation relations are stored at the moment of

ﬂipping an edge e using an array of rotations R[] of

size m (number of triangles). The triangle that rotated

with t

and t

is t

= R[t

/3] and t

= R[t

/3], re-

spectively. Note that the indices stored in t

and t

point to the Triangles array and each triangle is de-

ﬁned by three consecutive vertex indices.

3.3 Handling Problematic and Worst

Cases

During the ﬁrst phase, there are two scenarios that re-

quire a more detailed explanation: case (1); the exis-

tence of co-circular conﬁgurations and case (2); the

possible existence of dead-locks.

Case (1): if there are co-circular or almost co-circular

conﬁgurations, our algorithm could fall into an inﬁ-

nite loop of edge-ﬂips due to ﬂoating point errors. We

solve this issue by using a small tolerance value in the

evaluation of condition (1):

λ+ γ ≤ π + ε (5)

This leads to ignoring some ﬂips that in theory,

should have been performed. That is why the gen-

erated triangulations may be quasi-Delaunay triangu-

lations and not fully Delaunay triangulations. The ε

value was experimentally estimated.

Case (2): a dead-lock could occur if there exists a

circular chain of triangles, where all edges must be

ﬂipped and each thread can ﬂag only one of its tri-

angles. This kind of chain can not exist because it

must have at least one edge that fulﬁlls the Delaunay

condition: the smallest edge of the chain. Note that

the triangles that share the smallest edge are free to

be ﬂagged by a neighbor thread. Then, in chains like

these there will always be at least one edge that can

be ﬂipped, therefore a dead-lock will never occur.

The known worst case conﬁguration for Law-

son’s sequential algorithm is the one shown in Fig-

ure 6 (Edelsbrunner, 2001).

Figure 6: One of the worst cases for any edge-ﬂip based

method.

This worst-case triangulation has eight vertices,

thirteen edges and six triangles. The algorithm ex-

ecuted ﬁve iterations and the number of ﬂips per it-

eration was {1,2,3,2,1}. As this triangulation gets

larger, the number of triangles increases and the num-

ber of iterations also grows. However, the perfor-

mance of the algorithm is better than the sequential

algorithm, because under the PRAM model the cost

per iteration is Θ(1) as the algorithm can do several

edge ﬂips in parallel. Experimentally, we observed

that for these conﬁgurations the number of required

GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications

iterations is m − 1, with m the number of triangles.

The amount of edge-ﬂips per iteration increases by

one until the m/2-th iteration. Then, the number of

parallel ﬂips decreases by one until the last iteration is

reached. The computational complexity is O(n) (note

that m = O(n)). This is an improvement over the se-

quential method, which in this case is O(n

4 IMPLEMENTATION DETAILS

Nvidia’s CUDA architecture and API were chosen to

implement the kernels, while OpenGL was chosen to

render the triangulations. Using C type data struc-

tures for the mesh model, it is possible to represent

vertex and triangle data via the OpenGL buffer ob-

jects: the VBO (Vertex Buffer Object) and EBO (El-

ement array Buffer Object). In addition, CUDA sup-

ports OpenGL interoperability, meaning that threads

can read and write directly into the VBO and EBO ar-

rays. As with the vertices, the edges are also sent to

the GPU at mesh loading time, and they can option-

ally be sent back at the end if needed (for example, to

save the mesh into a ﬁle). The exclusion step is han-

dled with atomic operations available from the CUDA

C API. The performance is increased by using loop

unrolling, coalesced memory on per edge data, mini-

mal branching, constant types and shared memory to

reduce registry usage. The implementation is avail-

able as a functionality of cleap, an open source C/C++

library (http://sourceforge.net/projects/cleap/).

5 EXPERIMENTAL RESULTS

In the following sections, we will refer to our imple-

mentation as MDT (Massive Delaunay Transformer).

In order to analyze its performance and its behavior,

we will evaluate the following aspects of the algo-

rithm:

• Quality of the generated triangulations: how close

they are to being Delaunay triangulations

• Computational time against (a) the Triangle soft-

ware and (b) the CGAL library and (c) our own

implementation of Lawson’s algorithm

• Number of edges that can be ﬂipped, number of

edges that were ﬂipped and number of edges that

could not be ﬂipped at each iteration.

• Inﬂuence of the mesh size in the number of itera-

tions

Table 1 shows the hardware used for the evaluation.

We have selected the algorithm available inside Trian-

gle and the one available in the CGAL library because

they are known to generate full Delaunay meshes

and they have O(nlog n) sequential implementations.

Note that these algorithms start from a PSLG geome-

try and not from a given triangulation.

Table 1: Hardware used for testing.

Hardware Detail

CPU AMD Phenom I X4 9850 2.5 Ghz

GPU Nvidia Geforce GTX 580

Mem 4GB RAM DDR2 800Mhz

5.1 2D Triangulations

The set of tests consists of fully random bad-quality

triangulations in the sense that they need a high num-

ber of edge ﬂips to be transformed into Delaunay tri-

angulations. These inputs are generated by placing

random points inside two adjacent triangles starting

from a square domain. For each new inserted point,

the triangle that includes the point is divided into three

smaller triangles as shown in Figure 7. The size of the

test triangulations ranges from 100 thousand to 5 mil-

lion points and the smallest angle of all the meshes is

practically zero (less than 10

−6

radians).

Figure 7: Construction of a full random mesh.

In Figure 8(a) we present the computational time

for each mesh size. It can be observed that MDT is

approximately three times faster in these bad quality

triangulations than the algorithms inside the CGAL

library and the Triangle software. There is also a

speedup of 50× with respect to our sequential im-

plementation of Lawson’s original edge-ﬂip method.

In Figure 8(b) we show the quality of the generated

meshes in the sense of how close they are to being full

Delaunay meshes. We took the meshes generated by

the CGAL library as reference triangulations. Both

MDT and Triangle generate different triangles with

respect to the reference triangulations. However, the

missed triangles, i.e, the triangles that are in the trian-

gulations generated by MDT and Triangle, and are not

in the reference triangulations, are less than 0.05%.

The error rate is computed as the number of missed

triangles with respect to the total number of triangles

of the reference triangulation. Note that the triangu-

lations generated by CGAL and Triangle are different

because Triangle modiﬁes the vertex list if two points

AGPU-basedMethodforGeneratingquasi-DelaunayTriangulationsbasedonEdge-flips

0.01

0.1

100

1000

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5

time [s]

# vertices (10^6)

CPU-Construction vs GPU-Transformation

Lawson

CGAL

Triangle

MDT

1e-05

1e-04

1e-03

1e-02

1e-01

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

% error

# vertices (10^6)

% of error

MDT / CGAL

Triangle / CGAL

(a) (b)

Figure 8: (a) Computational time for all methods and (b) Differences of Triangle and MDT triangulations with respect to

CGAL triangulation.

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

# iterations

# vertices (10^6)

GPU iterations

0e+00

5e+05

1e+06

2e+06

3e+06

0 5 10 15 20 25 30 35 40 45

# edge flips

# iteration

Edge flips per iteration for input n=5M

(a) (b)

Figure 9: (a) Number of iterations vs. problem size, (b) number of edge-ﬂips vs. iterations for the biggest input case (a mesh

of 5 million points).

are too close to each other. Some important aspects of

the behavior of the MDT are shown in Figure 9. Fig-

ure 9(a) shows an approximated curve that represents

the number of iterations versus the mesh size. Empiri-

cally, this curve shows a complexity of O(log n). Fig-

ure 9(b) shows how the number of edge ﬂips changes

among the iterations while transforming the triangu-

lation of 5 million vertices and approximately 15 mil-

lion edges. We can observe that during the ﬁrst half of

the iterations, most of the edge-ﬂips are done, while

in the last iterations few edge ﬂips are executed. For

this input, both MDT and Lawson’s edge-ﬂip methods

performed approximately 37 million edge-ﬂips. It is

worth mentioning that in all the tested triangulations,

the percentage of edge ﬂips done in parallel was more

than 80% (i.e., the excluded threads were less than

20%).

5.2 3D Surface Triangulations

MDT was originally intended for improving the min-

imum angle of smooth 3D surface triangulations for

the modeling of tree stem deformations. An edge e

is considered for ﬂipping only if the normal vectors

of the two neighbor triangles that share e are almost

parallel according to some threshold value. Figure 10

shows the different test inputs with their correspond-

ing number of vertices, edges and triangles.

Figure 10: 3D Surface test-case meshes.

The dragon was taken from the Stanford Com-

puter Graphics Laboratory, the horse from Cyberware

Inc, the Moai from the GeomView examples and the

Inﬁnite was built with our custom tools. Figure 11

shows the performance of the MDT and our imple-

GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications

mentation of the Lawson sequential algorithm. (The

traced lines were added to connect the measurements

using the same implementation.) As expected, the

MDT method achieves a speedup of 80× . Table 2

shows the number of ﬂipped-edges and the percent-

age of excluded threads for each iteration in the four

input meshes. As in the 2D tests, the ﬁrst iterations

do most of the required edge-ﬂips. The number of

iterations is lower than in the 2D tests because these

surface triangulations have an overall better quality.

0.0001

0.001

0.01

0.1

Moai Dragon Horse Infinite

time [s]

mesh

3D surface improvement times

MDT

Lawson CPU

Figure 11: Performance results on 3D meshes.

6 RELATED WORK

There is a considerable amount of work on the subject

of computing Delaunay triangulations, from sequen-

tial implementations (Paul Chew, 1989; Shewchuk,

1996; De Berg, 2000) to parallel ones (Antonopou-

los et al., 2005; Healey et al., 1997; Kohout and

Kolingerov´a, 2003; Rong et al., 2008). Most of these

works belong to the case when a Delaunay triangu-

lation needs to be computed from a set of points or

from a PSLG, and not from an existing triangulation.

To the best of our knowledge, only recently has the

edge-ﬂip technique been used in the design of parallel

GPU-based algorithms for the generation of triangu-

lations. Cao Thang (Cao, 2010) developed an algo-

rithm for the generation of a Delaunay triangulation

and its Voronoi diagram from a set of points. One step

of his method performs GPU-based edge-ﬂips; the al-

gorithm maps threads to triangles. Cervenansk´y et al.

(Cervenansk´y et al., 2010) propose a GPU-based tri-

angulation algorithm for image processing. Edges are

ﬂipped in parallel, as we also do, but by using a differ-

ent approachfor deciding which subset of them can be

ﬂipped in parallel (i.e., they do not use Delaunay con-

ditions). Harada (Harada, 2011) proposed a constraint

solver for rigid body simulation. In his work, threads

are assigned to pairs of adjacent triangles by using

atomic operations in the same way as we do for decid-

ing which non-Delaunay edges can be ﬂipped in par-

allel. Unfortunately,we could not compare our imple-

mentation directly with the parallel edge-ﬂip methods

of the authors because different hardware was used

in their results and they only compare their edge-ﬂip

routine against prior work of themselves. We think it

is a better practice to use the standard method of com-

parison in parallel computing; to measure speedups

against a reference sequential algorithm.

7 DISCUSSION

AND CONCLUSIONS

We have presented a GPU-based implementation for

computing quasi-Delaunay triangulations. The so-

lution is compatible with OpenGL, handles special

cases such as co-circular point conﬁgurations and is

free of dead-locks. The behavior of the MDT shows

several interesting aspects. The amount of edge-ﬂips

per iteration quickly decreases, making the ﬁrst half

of the iterations much more important than the rest.

We report an exclusion rate of threads under 20%

serving as a guarantee that parallelism can indeed be

useful. The curve of the number of iterations as a

function of the mesh size empirically shows a com-

plexity of O(log n). This is a good behavior since

GPU methods are aimed at addressing large problems

and less iterations means more parallelism. The worst

behavior of the algorithm is when edge-ﬂips can not

be done in parallel. In this case the computational

complexity for the sequential and parallel algorithms

is the same.

We analyzed the performance of MDT under dif-

ferent inputs; bad-quality random 2D triangulations

and popular 3D surface meshes. Our experimental

evaluation shows that the percentage of missed tri-

angles of the triangulations generated by MDT with

respect to the triangulations generated by CGAL was

less than 0.05% in all experiments. On these bad qual-

ity meshes, MDT obtains a speedup of up to 50× with

respect to Lawson’s O(n

) edge-ﬂip method on CPU

and a speedup of 3× with respect to the 2D O(nlog n)

algorithms available inside CGAL and Triangle. This

speedup seems to be not so impressive as we are com-

paring GPU with CPU implementations and quasi-

Delaunay triangulations with exact ones. However, it

is important to mention that the MDT implementation

is sensitive to the topology of the input triangulation

and the CGAL and Triangle implementations are not

because they are constructive methods. This means

that if the input mesh needs little work to become De-

launay, the speedup of MDT with respect to CGAL

and Triangle should be higher than 3×.

AGPU-basedMethodforGeneratingquasi-DelaunayTriangulationsbasedonEdge-flips

Table 2: Detail of effective edge-ﬂips and parallelism ratio at each iteration for the 3d examples.

Moai Horse Dragon Inﬁnite

#iteration ﬂipped excluded ﬂipped excluded ﬂipped excluded ﬂipped excluded

1 1,786 3.88% 31,453 0.08% 106,101 6.63% 508,502 8.62%

2 215 1.83% 4,376 0.21% 22,608 4.08% 237,340 8.48%

3 41 4.66% 598 1.65% 4,455 5.86% 95,975 3.81%

4 3 0% 131 4.38% 995 1.49% 29,568 5.95%

5 1 0% 33 8.34% 207 1.43% 7,882 4.52%

6 10 0% 34 0% 2,844 1.6%

7 4 0% 1 0% 762 5.81%

8 1 0% 195 6.25%

9 40 0%

10 11 0%

Total ﬂips 2,046 36,596 134,401 883,119

Our proposed implementation is useful for appli-

cations that need to quickly improvethe minimum an-

gle of triangulations and visualize a mesh at the same

time; dynamic terrain manipulation and tree stem de-

formations to name some examples. In the near fu-

ture, we will compare the in-circle test with the oppo-

site angle test used in this implementation. We also

want to test the algorithm with bad quality 3D surface

meshes.

ACKNOWLEDGEMENTS

This work was partially supported by Fondecyt

Project N

1120495.

REFERENCES

Antonopoulos, C. D., Ding, X., Chernikov, A., Blagojevic,

F., Nikolopoulos, D. S., and Chrisochoides, N. (2005).

Multigrain parallel Delaunay mesh generation: chal-

lenges and opportunities for multithreaded architec-

tures. In Proceedings of the 19th annual international

conference on Supercomputing, ICS ’05, pages 367–

376, New York, NY, USA. ACM.

Cao, T. T. (2010). Computing 2d Delaunay triangulation

using GPU. Manuscript in preparation.

Cervenansk´y, M., T´oth, Z., Starinsk´y, J., Ferko, A.,

and Sr´amek, M. (2010). Parallel GPU-based data-

dependent triangulations. Computers & Graphics,

34(2):125–135.

De Berg, M. (2000). Computational Geometry: Algo-

rithms and Applications. Springer-Verlag TELOS,

Santa Clara, CA, USA.

Edelsbrunner, H. (2001). Geometry and topology for mesh

generation (Cambridge monographs on applied and

computational mathematics).

Fortune, S. (1993). A note on Delaunay diagonal ﬂips”.

Pattern Recognition Letters, 14(9):723 – 726.

Harada, T. (2011). A parallel constraint solver for a rigid

body simulation. In SIGGRAPH Asia 2011 Sketches,

SA ’11, pages 22:1–22:2, New York, NY, USA. ACM.

Healey, R. G., Minetar, M. J., and Dowers, S., editors

(1997). Parallel Processing Algorithms for GIS. Tay-

lor & Francis, Inc., Bristol, PA, USA.

Kohout, J. and Kolingerov´a, I. (2003). Parallel Delau-

nay triangulation based on circum-circle criterion. In

SCCG ’03: Proceedings of the 19th spring conference

on Computer graphics, pages 73–81, New York, NY,

USA. ACM.

Lawson, C. L. (1972). Transforming triangulations. Dis-

crete Mathematics, 3(4):365 – 372.

Navarro, C., Hitschfeld-Kahler, N., and Scheihing, E.

(2011). A parallel GPU-based algorithm for Delaunay

edge-ﬂips. In Abstracts from 27th European Workshop

on Computational Geometry (EUROCG2011), pages

75–78. Morschach, Switzerland.

Nvidia (2011). NVIDIA CUDA Compute Uniﬁed Device

Architecture - Programming Guide.

Paul Chew, L. (1989). Constrained Delaunay triangulations.

Algorithmica, 4:97–108.

Rong, G., Tan, T.-S., Cao, T.-T., and Stephanus (2008).

Computing two-dimensional Delaunay triangulation

using graphics hardware. In I3D ’08: Proceedings of

the 2008 symposium on Interactive 3D graphics and

games, pages 89–97, New York, NY, USA. ACM.

Shewchuk, J. R. (1996). Triangle: Engineering a 2d quality

mesh generator and Delaunay triangulator. In ACM,

editor, First Workshop on Applied Computational Ge-

ometry, pages 124–133. (Philadelphia, Pennsylvania).

CGAL (2012). CGAL, Computational Geometry Algo-

rithms Library. http://www.cgal.org.

Tobler, R. F. and Maierhofer, S. (2006). A mesh data struc-

ture for rendering and subdivision. In WSCG ’2006:

Proceedings of WSCG (International Conference in

Central Europe on Computer Graphics, Visualization

and Computer Vision), pages 157–162.

GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications