EFSAT

An Exact and Efﬁcient Triangle Intersection Test Hardware

Andreas Raabe, Jan Tietjen and Joachim K. Anlauf

Technical Computer Science, University of Bonn, Römerstr. 164, 53117 Bonn, Germany

Keywords:

Intersection test, Hardware acceleration, Fixed-point, Exact.

Abstract:

Software implementations that test two triangles for intersection often favour speed over exact calculation. They leave it to the user to choose an exact or a fast test depending on the domain of application. Hardware implementations cannot make this distinction, since users will always expect accelerator hardware to be applicable in all possible settings. This paper introduces a novel approach towards exact intersection testing of triangles. It is based on the separating axes test and lends itself well to hardware implementation. To be integrable into a hierarchical collision detection design, this test needs to be extremely resource efficient. Thus, it does not iterate until an exact solution is found, but instead categorises results as correct or possibly incorrect. It is implemented using 18-bit fixed-point numbers, while still maintaining resolutions that can keep up with double-precision floating-point implementations. The proposed test is integrated into a hierarchical collision detection FPGA design that accelerates collision queries by an order of magnitude. In a realistic benchmark less than 0.9% of the results are reported as possibly incorrect, without impairing system performance.

1 INTRODUCTION

In physically-based simulation most of the processing effort is spent on collision detection (Plante et al., 2001). Hence hardware acceleration of collision queries significantly speeds up the overall simulation and frees the CPU for other tasks. Since surfaces are often composed of triangles, the last step in processing a query is usually intersection testing of triangle pairs. When using hierarchical or spatial subdivision approaches only a relatively small number of triangles will be singled out for testing. This and the concurrent processing of the hierarchy render the overall performance independent of the delay of the triangle intersection test. Still, it is highly beneficial to implement the test in hardware as well, since it reduces the number of triangles transferred back to the host. This reduces the delay caused by bus transfers, which has a significant impact on overall performance.

Software algorithms for testing two triangles for intersection often favor speed over exact calculation. Examples are the well-known intersection tests of Held (Held, 1996) and Möller (Möller, 1997). They leave it up to the user to choose an exact or a fast test depending on the domain of application. Hardware implementations cannot make this distinction, since users will always expect accelerator hardware to be applicable in all possible settings. Known approaches for exact collision detection, on the other hand, are usually iterative in nature. In general this iterativeness arises through the use of arbitrary precision libraries like CORE (Karamcheti et al., 1999) or LEDA (Mehlhorn and Schirra, 2001) in standard algorithms like the ones stated above. This renders every arithmetic operation a potential iteration over the necessary precision. Robbins and Whitesides (2003) use Shewchuk's orientation test (Shewchuk, 1997) in combination with Möller's intersection test. This is more elegant, since only the calculation of the 3D orientation predicate needs to be iterated, which is given as the sign of a polynomial of degree 3 in the coordinate values of four points. A well-known fact in hardware development is that unrolling iterative algorithms renders them very resource consuming. Since in general one will want to operate the triangle intersection test concurrently with a hierarchy or spatial subdivision traversal, both need to be small enough to fit into the same chip. This avoids additional delay caused by inter-chip communication. A very fast and yet hardware-efficient hierarchical collision detection architecture based on k-DOPs was proposed by (Raabe et al., 2006). It is also provably exact, i.e., it does not return any false reports of non-collision. It will be used in the following as a basis, and the proposed exact intersection test will be integrated into the overall design. Using a hierarchical intersection test yields a logarithmic average run-time in realistic scenarios (Weller et al., 2006). An AABB-based broad-phase collision test was implemented in hardware in (Woulfe et al., 2007). It could be applied prior to the hierarchy traversal in future implementations.

Raabe A., Tietjen J. and K. Anlauf J. (2009).
EFSAT - An Exact and Efficient Triangle Intersection Test Hardware.
In Proceedings of the Fourth International Conference on Computer Graphics Theory and Applications, pages 355-360
DOI: 10.5220/0001775203550360
SciTePress

2 EFSAT

2.1 Separating Axes Test

The separating axes approach as proposed in

(Gottschalk, 1996) tests two convex polytopes for in-

tersection by projecting them onto a set of test axes. If

the projections do not intersect it follows that the axis

is the normal of a plane separating the polytopes. If

neither any normal of any of the faces, nor any cross

products of edges of both polytopes are a separating

axis then the polytopes intersect.

Triangles can be interpreted as degenerate polytopes in 3D and thus the SAT approach can be applied to them. Let $\tilde{A}$ and $\tilde{B}$ be two triangles. In general $\tilde{A}$ and $\tilde{B}$ will live in different reference frames (RFs). To test them for intersection, we need to transform them into a common RF. Let this common RF be that of $\tilde{A}$ without loss of generality and let $M$ be the according transformation matrix. Let the resulting triangles be $B$ (with vertices $W_0$, $W_1$, and $W_2$) and $A = \tilde{A}$ (with vertices $V_0$, $V_1$, and $V_2$). Using wrap-around indexing, the normals $n_A$, $n_B$ and the cross products $C_{k,l}$ can now be calculated as follows.

$$
\begin{aligned}
n_A &:= (V_1 - V_0) \times (V_2 - V_0) \\
n_B &:= (W_1 - W_0) \times (W_2 - W_0) \\
C_{k,l} &:= (V_{k+1} - V_k) \times (W_{l+1} - W_l)
\end{aligned}
\qquad (1)
$$

Then the triangles' points are projected onto any test axis $L$ out of this set by

$$
p^A_i := V_i \cdot L, \qquad p^B_i := W_i \cdot L \qquad (2)
$$

The projection intervals on the axis are

$$
\begin{aligned}
I_A &:= [\min\{p^A_0, p^A_1, p^A_2\}, \max\{p^A_0, p^A_1, p^A_2\}] \\
I_B &:= [\min\{p^B_0, p^B_1, p^B_2\}, \max\{p^B_0, p^B_1, p^B_2\}]
\end{aligned}
\qquad (3)
$$

If $I_A$ and $I_B$ do not intersect, a separating axis is found. It follows that the triangles are non-intersecting. If none of the axes is separating, the triangles do intersect.

This test lends itself well to resource efﬁcient

hardware implementation as was discussed in (Raabe

et al., 2008). The structure of this non-exact test

serves as a basis for the exact intersection test EFSAT

proposed in the following.

2.2 Pre-Processing

This calculation will (of course) not be computationally exact if it is implemented using fixed-point or floating-point types. Thus the images resulting from the projections will differ from the mathematically correct images. Size and sign of the error depend on the size of the numbers and the rounding mode used for arithmetic operations (e.g., rounding to the nearest representable value, the IEEE floating-point default, with two fractional decimal digits turns 0.0001 into 0.00 (error = 0.0001) and 0.009 into 0.01 (error = −0.001)).

To enable use of fixed-point data types it is imperative to normalize the objects to prevent a tremendous loss of precision. Thus, all triangles are normalized relative to the size of the biggest one. To enable bounding of all subsequent calculations, assume that a hierarchical bounding-volume test was executed prior to the triangle intersection test. We are using k-DOPs as bounding volumes, and thus obtain $\sqrt{3}$ as a bound on the maximum distance of triangles fed into the intersection test. Thus, numbers with 4 integral bits are used. Any other bound will work as well.

2.3 Fixed Point Interval Arithmetic

An obvious solution is using an arbitrary precision data type. As previously discussed, this is very resource consuming when implemented in hardware. Therefore we choose not to provide an exact solution for every query, but to categorize the results into exact and possibly incorrect. Due to the severe space restrictions imposed by the FPGA implementation we use fixed-point arithmetic, and exploit that we have complete control over the rounding modes and bit widths of the numbers.

Let $a_{n,k}$ be a fixed-point number of length $n$, of which the $k$ most significant bits are the integral part of the number. $m := n - k$ is the number of fractional bits. Let $\oplus$ furthermore denote bit concatenation and let $a \in \mathbb{R}$, $-2^{k-1} \le a < 2^{k-1} - 1$. Now let $\lfloor a_{n,k} \rfloor$ be the next smaller or equal fixed-point number of the given dimensions and $\lceil a_{n,k} \rceil$ be the next greater one. The fixed-point resolution is then given by

$$
\mathrm{EPS}_{n,k} := \lceil a_{n,k} \rceil - \lfloor a_{n,k} \rfloor = 0.\underbrace{0 \cdots 0}_{m-1}1 \qquad (4)
$$

Now we can initialize fixed-point intervals so that they contain the input data.

$$
A_{n,k}(a) := [\underline{A}_{n,k}, \overline{A}_{n,k}] := [\lfloor a_{n,k} \rfloor, \lceil a_{n,k} \rceil] \qquad (5)
$$
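As an illustration of (4) and (5), the interval initialization can be sketched as follows. The sketch assumes $n = 18$ and $k = 4$ as stated in the text and stores each bound as an integer count of $\mathrm{EPS}_{n,k}$ steps; the names `fixed_floor` and `init_interval` are hypothetical.

```python
import math
from fractions import Fraction

N_BITS = 18                 # total width n (from the text)
K_BITS = 4                  # integral bits k (from the text)
M_BITS = N_BITS - K_BITS    # fractional bits m = n - k
EPS = Fraction(1, 2 ** M_BITS)   # fixed-point resolution, see (4)

def fixed_floor(a):
    """Largest grid point <= a, as an integer multiple of EPS."""
    return math.floor(Fraction(a) * 2 ** M_BITS)

def init_interval(a):
    """Interval [floor, floor + EPS] of (5), both bounds in units of EPS."""
    if not (-2 ** (K_BITS - 1) <= a < 2 ** (K_BITS - 1) - 1):
        raise ValueError("value outside the representable range")
    lo = fixed_floor(a)
    return (lo, lo + 1)     # upper bound is the next greater grid point

lo, hi = init_interval(0.3)
print(lo, hi, float(lo * EPS), float(hi * EPS))
```

Using `Fraction` keeps the conversion itself free of rounding, so the returned interval is guaranteed to enclose the input value.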

GRAPP 2009 - International Conference on Computer Graphics Theory and Applications


[Figure 1 image; labels: mathematically correct projection, maximum projection, minimum projection]
Figure 1: The actual projection using interval arithmetic can be interpreted as a projection of the point intervals onto the original axis. Then the minimum projection is underestimated and the maximum projection is overestimated.

If this is done to all coordinates of a triangle point the

resulting intervals can be visualized as a cube aligned

with the discretization grid induced by the ﬁxed-point

data-type.

Now operators $\otimes \in \{+, \cdot, -\}$ can be defined on intervals in such a way that the resulting interval contains all possible outcomes of applying the operation to any pair of numbers out of the intervals. Since linear operators are used exclusively, these operations can be implemented very resource-efficiently using case discrimination based on the signs of the interval bounds. Since this is simple but a bit lengthy, it is left out here. To prevent intermediate results from becoming too large in bit-length, they are reduced to the original bit-length by rounding. This is easily done by rounding towards $-\infty$ and, if the value is supposed to be an upper limit, adding $\mathrm{EPS}_{n,k}$ of the target type.
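A minimal software sketch of such interval operators is given below. It assumes intervals are stored as pairs of integers in units of $2^{-m}$ with $m = 14$, and it takes the minimum and maximum over all four bound products instead of the sign-based case discrimination mentioned above (a simplification that yields the same enclosure). Overflow beyond $n$ bits is not modelled, and all function names are hypothetical.

```python
from fractions import Fraction

M_BITS = 14  # fractional bits m = n - k for n = 18, k = 4

def add(x, y):
    """Interval sum: lower bounds add up, upper bounds add up."""
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):
    """Interval difference: smallest minus largest, largest minus smallest."""
    return (x[0] - y[1], x[1] - y[0])

def reduce_bounds(lo_wide, hi_wide, drop=M_BITS):
    """Drop `drop` fractional bits: round towards -inf, add one EPS to the upper bound."""
    lo = lo_wide >> drop          # arithmetic shift floors, also for negative values
    hi = (hi_wide >> drop) + 1    # floor plus EPS never underestimates the upper bound
    return (lo, hi)

def mul(x, y):
    """Interval product, reduced back to m fractional bits."""
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return reduce_bounds(min(products), max(products))

x = (4915, 4916)      # encloses 0.3
y = (-8192, -8191)    # encloses -0.5
print(add(x, y), sub(x, y), mul(x, y))
```

The products carry $2m$ fractional bits, which is why `mul` drops $m$ bits again; the result still encloses the exact product $-0.15$.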

Now the projection (2) is applied to the interval points using these interval operators and a test axis that is also described using intervals. This can geometrically be interpreted as a projection onto the original test axis where the projection of the maximum possible value of the point is overestimated to account for rounding errors. The minimum value is underestimated accordingly. Figure 1 illustrates this.

2.4 Testing Interval Images for Intersection

After applying the previously discussed data types and operators, the points of the triangles can be projected using interval arithmetic. Since intervals result again, it is no longer clear how they need to be transformed into images. Therefore it is necessary to interpret the presented algorithm geometrically.

Consider Figure 2. Initializing the fixed-point intervals with point coordinates will turn the points into cubes of side length $\mathrm{EPS}_{n,k}$. Transforming triangle B into A's reference frame will turn those cubes into cuboids, whose side lengths depend on the absolute values of the original coordinates and of the matrix entries. The actual projection of these cuboids results in a minimum and a maximum projection each. From these we can derive the projection interval (3). The boundaries of the projection interval are now intervals themselves, denoting the range the extremal points of the triangle could possibly be projected on. The projection intervals can be pairwise intersecting or not.

Figure 2: Transforming one triangle into the reference frame of the other using fixed-point numbers contributes to the deviation from the mathematically correct result. This is respected in the size of the coordinate intervals.

[Figure 3 image; panels (a), (b), (c), each marking the min and max projections and the regions that might be occupied]
Figure 3: Projecting the point intervals onto the test axis can lead to different configurations of the resulting images of the interval boundaries. Which case applies depends on the distance of the point intervals and the relative distance to the test axis.


[Figure 4 image; labels: min, max, min, might intersect]
Figure 4: Only if two intervals that contain the minimum and the maximum projection interval of one of the triangles (e.g., $A_{\min}$ and $A_{\max}$) each are non-intersecting can it be derived that the triangles are also non-intersecting.

Figure 3 illustrates some of the cases that can occur. Which case applies depends on the distance of the coordinate intervals and the relative distance to the test axis. The closer together the point intervals are located and the farther a point is from the test axis, the more probable it is that the projections of the point intervals intersect.

If the images of the point intervals are separated, it can be derived which point contributes to the maximum and the minimum interval (Figure 3(a)). If neighboring images intersect this is no longer possible (Figure 3(b)). Still, in both of these cases a line segment which is definitely occupied by the image can easily be identified. This will be used to identify definite intersections with another triangle image. Figure 3(c) shows a configuration where all three images of the point intervals intersect. No line segment is occupied by the image for sure.

But in any case the greatest point projection can vary only between the greatest maximum projection and the greatest minimum projection. Let these be denoted by $\overline{a}_{\max}$ and $\underline{a}_{\max}$. The smallest point projection is located between the smallest minimum projection and the smallest maximum projection. Let these be denoted $\underline{a}_{\min}$ and $\overline{a}_{\min}$ respectively. Let the minimum (and maximum) interval of a triangle A be denoted by $A_{\min} := [\underline{a}_{\min}, \overline{a}_{\min}]$ (and $A_{\max} := [\underline{a}_{\max}, \overline{a}_{\max}]$).

Thus the images of the triangles become 'blurred'. They now become intervals of intervals (compare Figure 3). Thus, it is no longer obvious which configurations of two triangle projections need to be interpreted as intersecting and which imply that the triangles are non-intersecting.

The latter is the simpler of the two cases and thus is tackled first. Consider the two intervals that contain the extreme boundaries of each projected triangle ($A = [\underline{a}_{\min}, \overline{a}_{\max}]$ and $B = [\underline{b}_{\min}, \overline{b}_{\max}]$). Only if these two intervals are non-intersecting can it be derived that the triangles are also non-intersecting, since in all remaining cases the mathematically correct projections might intersect (see Figure 4).

This amounts to the following criterion:

$$
(\underline{b}_{\min} > \overline{a}_{\max}) \lor (\underline{a}_{\min} > \overline{b}_{\max}) \Rightarrow \text{separation} \qquad (6)
$$
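The construction of the minimum and maximum intervals and the separation criterion (6) can be sketched as follows. The sketch assumes that each triangle is given by the three interval projections of its vertices as `(lower, upper)` pairs on a common axis; the function names are hypothetical.

```python
def blurred_bounds(projections):
    """Return (A_min, A_max) for three (lower, upper) vertex projections."""
    lowers = [p[0] for p in projections]
    uppers = [p[1] for p in projections]
    a_min = (min(lowers), min(uppers))   # smallest minimum, smallest maximum projection
    a_max = (max(lowers), max(uppers))   # greatest minimum, greatest maximum projection
    return a_min, a_max

def surely_separated(a_bounds, b_bounds):
    """Criterion (6): the enclosing intervals of both images do not overlap."""
    (a_min, a_max), (b_min, b_max) = a_bounds, b_bounds
    return b_min[0] > a_max[1] or a_min[0] > b_max[1]

tri_a = blurred_bounds([(0, 2), (1, 3), (5, 7)])
tri_b = blurred_bounds([(8, 9), (10, 12), (9, 11)])
tri_c = blurred_bounds([(6, 8), (10, 12), (9, 11)])
print(tri_a, surely_separated(tri_a, tri_b), surely_separated(tri_a, tri_c))
```

In the last call the lower end of the second image (6) lies below the upper end of the first (7), so the axis cannot be reported as separating even though the exact projections might still be disjoint.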

Checking for intersections is more complicated. Especially configurations where triangle images do not occupy a line segment for sure (see Figure 3(c)) cause problems. If this applies to both triangles it is impossible to identify cases where the images intersect for sure. To cover all remaining configurations multiple cases have to be distinguished. Consider Figure 5. If $x \in A_{\max}, y \in B_{\max}$ implies that $x < y$, then the projections will only intersect for sure if $x \in A_{\max}, y \in B_{\min} \Rightarrow y < x$. This amounts to the following criterion:

$$
\text{case } (\overline{a}_{\max} < \underline{b}_{\max}): \quad (\overline{b}_{\min} < \underline{a}_{\max}) \Leftrightarrow \text{intersection} \qquad (7)
$$

[Figure 5 image; labels: min, max, guaranteed intersection]
Figure 5: If $x \in A_{\max}, y \in B_{\max}$ implies that $x < y$, then the projections will intersect for sure only if $x \in A_{\max}, y \in B_{\min} \Rightarrow y < x$.

Further criteria can be identified analogously:

$$
\begin{aligned}
\text{case } (\overline{b}_{\max} < \underline{a}_{\max}) &: \quad (\overline{a}_{\min} < \underline{b}_{\max}) \Leftrightarrow \text{intersection} \\
\text{case } (\overline{a}_{\min} < \underline{b}_{\min}) &: \quad (\overline{b}_{\min} < \underline{a}_{\max}) \Leftrightarrow \text{intersection} \\
\text{case } (\overline{b}_{\min} < \underline{a}_{\min}) &: \quad (\overline{a}_{\min} < \underline{b}_{\max}) \Leftrightarrow \text{intersection}
\end{aligned}
\qquad (8)
$$

If the images are not definitely separate, but none of the above cases applies, we know that the maximum intervals $A_{\max}$ and $B_{\max}$ intersect and that the minimum intervals $A_{\min}$ and $B_{\min}$ intersect. In this case we have to ensure that the surely occupied spaces do overlap before an intersection can be derived.

$$
\text{All remaining cases}: \quad (\overline{a}_{\min} < \underline{b}_{\max}) \land (\overline{b}_{\min} < \underline{a}_{\max}) \Leftrightarrow \text{intersection} \qquad (9)
$$

If the images are neither intersecting nor separated for sure, the result of the axis test is marked as possibly incorrect. We experimented with various approaches to provide an educated guess on the exact outcome, but none yielded satisfying results. Thus the simplest solution was chosen and all possibly incorrect results are assumed to be intersections. This is the weaker assumption and will be overridden by any definite separating axis found, since in this case the triangles do not intersect for sure. If no separating axis was found, but any of the axis tests returned a possibly incorrect intersection, the overall result of the triangle intersection test is 'possibly incorrect intersection'. This enables the user to decide how to proceed; for example, an exact software intersection test could be applied. Consequently the number of possibly incorrect results has a major impact on the overall performance and must be decreased as much as possible without increasing resource consumption.
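The per-axis categorization and the combination over all axes can be sketched as follows. This is a software illustration, not the hardware logic: each bound interval is passed as a `(lower, upper)` pair, the names are hypothetical, and the sketch applies the conjunction of (9) in every configuration, which is at least as strict as testing a single inequality per case.

```python
from enum import Enum

class Axis(Enum):
    SEPARATED = 0
    INTERSECT = 1
    POSSIBLY_INCORRECT = 2

def classify_axis(a_min, a_max, b_min, b_max):
    """Categorize one axis from the blurred bounds of both triangle images."""
    # Criterion (6): the enclosing intervals do not overlap at all.
    if b_min[0] > a_max[1] or a_min[0] > b_max[1]:
        return Axis.SEPARATED
    # Conjunction of (9): each image surely starts before the other surely ends.
    if a_min[1] < b_max[0] and b_min[1] < a_max[0]:
        return Axis.INTERSECT
    return Axis.POSSIBLY_INCORRECT

def combine(axis_results):
    """Overall result: one definite separating axis overrides everything else."""
    results = list(axis_results)
    if any(r is Axis.SEPARATED for r in results):
        return "separated"
    if any(r is Axis.POSSIBLY_INCORRECT for r in results):
        return "possibly incorrect intersection"
    return "intersection"

sure = classify_axis((0, 1), (4, 5), (2, 3), (8, 9))
apart = classify_axis((0, 1), (4, 5), (6, 7), (8, 9))
unsure = classify_axis((0, 1), (4, 6), (5, 7), (8, 9))
print(sure, apart, unsure, combine([sure, unsure]))
```

A host application could rerun an exact software test only for pairs reported as possibly incorrect, which is the retesting strategy described above.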


Figure 6: Two triangles that are relatively far from each

other considering their sizes. Since they are very small

considering the discretization grid induced by the data-type

used for the images, the latter still intersect.

2.5 Decreasing the Number of Possibly Wrong Results

The target architecture is a Xilinx Virtex-II (XC2V6000, speed grade -4) on an Alpha Data ADM-XRC-II board with 256 MB DDR-RAM at 100 MHz clock frequency. The FPGA features 144 18-bit multipliers and 6 million gate equivalents. Cascading the multipliers to yield bit widths up to 32 bits is prohibitively expensive since it quadruples the number of multipliers used. For benchmarking we integrated the previously introduced design into the hierarchical collision detection architecture presented in (Raabe et al., 2006). We used two identical objects (a car headlight) with 5947 triangles. They are placed at different distances from each other and with different rotations. For each constellation, the time to detect all intersecting triangles is determined. Due to the 18-bit restriction this yields relatively poor results. In this simulation only 16.8% of the queries yield exact outcomes. This is unacceptable since it implies that the host PC needs to retest almost all triangle pairs. Therefore a very efficient and effective optimization is proposed in the following.

We already normalized the triangles in Section 2.2 to enable fixed-point implementation. Still there can be very small triangles compared to the largest one. In the presented framework triangles of sizes close to the fixed-point resolution cause numerical problems, even if the triangles are not closely located with respect to their sizes (Figure 6). To avoid such configurations the triangle pair can be re-normalized individually on-chip prior to the actual intersection test. This decreases the fixed-point resolution relative to the distance of the triangles.

One way to accomplish this is to shift the barycenter of the triangles into the origin and stretch them to the outer limits of the fixed-point domain. The latter can be done individually for the coordinate axes to maximize the effect without impairing the outcome of the collision query. To enable efficient implementation, this is approximated by shifting an arbitrary point of the triangles into the origin and shifting the rest consistently. Afterward bit-wise OR is applied to the absolute values of the coordinates of all 6 triangle points, separately for their x-, y-, and z-coordinates. The number of leading zeros now denotes the number of bits insignificant for the further calculation. All coordinates are shifted accordingly. Now only the matrix multiplication that transforms B into A's reference frame is implemented in 32-bit precision. The projection itself is done in 18-bit. Using this scheme 99.1% of the results are exact. This will suffice for the vast majority of applications and enables retesting in all others without impairing the overall performance.

[Figure 7 plot; x-axis: Object Distance (0.6 to 1.4), y-axis: Runtime / ms (100 to 160); series: Software, FPGA with LTA and two pipelines]
Figure 7: The EFSAT implementation is integrated into a hierarchical collision detection accelerator hardware. The overall design still yields a speed-up of an order of magnitude compared to a state-of-the-art software implementation. The latter does not provide categorization of exact results or any guarantees on correctness.
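The approximate re-normalization can be sketched as follows. The sketch rests on several assumptions that are not fixed by the text: all six vertices are already given in a common reference frame as signed integer fixed-point coordinates, the first vertex is the point moved into the origin, the translated magnitudes fit into 17 bits (18-bit width including a sign bit), and the name `renormalize` is hypothetical.

```python
WIDTH = 18  # assumed word width: one sign bit plus 17 magnitude bits

def renormalize(points, width=WIDTH):
    """Translate six vertices and stretch each coordinate axis by a power of two."""
    ox, oy, oz = points[0]                      # arbitrary point moved into the origin
    shifted = [(x - ox, y - oy, z - oz) for x, y, z in points]
    shifts = []
    for axis in range(3):
        ored = 0
        for p in shifted:
            ored |= abs(p[axis])                # bit-wise OR of all magnitudes on this axis
        leading_zeros = (width - 1) - ored.bit_length()
        shifts.append(max(leading_zeros, 0))    # insignificant bits to shift out
    scaled = [tuple(c << s for c, s in zip(p, shifts)) for p in shifted]
    return scaled, shifts

points = [(100, 100, 100), (103, 100, 100), (100, 102, 100),   # triangle A
          (101, 101, 101), (102, 101, 101), (101, 103, 101)]   # triangle B
scaled, shifts = renormalize(points)
print(shifts)
print(scaled[1], scaled[3])
```

Because every axis is scaled by the same factor for both triangles, the pair intersects after scaling exactly when it did before, while the coordinates now use the available magnitude bits.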

3 RESULTS

3.1 Comparing EFSAT to Möller and SAT

The overall design still yields a speed-up of an order of magnitude compared to a state-of-the-art software implementation (Zachmann, 1998) of a hierarchical collision detection running on a system with an identical memory interface (see Figure 7). The latter provides neither a categorization of exact results nor any guarantees on correctness.

In terms of precision the presented approach can even compete with double-precision floating-point implementations of the Möller and the SAT approach. Table 1 shows a comparison of the three approaches. As can be seen, there exist cases where SAT and Möller disagree. Even in some of these cases the approach presented in this paper provides a definite answer to the query.


Table 1: Comparing EFSAT with double-precision floating-point implementations of Möller and SAT (Sep. = separated, Int. = intersecting, p.i. = possibly incorrect; last column: share of results in %).

EFSAT        Möller   SAT     %
Sep.         Sep.     Sep.    70.93
Sep.         Sep.     Int.    0
Sep.         Int.     Sep.    0.02
Sep.         Int.     Int.    0
Int. (p.i.)  Sep.     Sep.    0.13
Int. (p.i.)  Sep.     Int.    0
Int. (p.i.)  Int.     Sep.    0.04
Int. (p.i.)  Int.     Int.    0.77
Int.         Sep.     Sep.    0
Int.         Sep.     Int.    0
Int.         Int.     Sep.    0
Int.         Int.     Int.    28.13

3.2 Resource Consumption

Implemented in VHDL and synthesized, placed and

routed with Xilinx ISE 8.1 the design’s resource con-

sumption is extraordinarily modest. It uses a total of

only 72 18-bit multipliers and 48% of available gates.

4 CONCLUSIONS

This paper presents the EFSAT approach, a fixed-point hardware implementation of the SAT algorithm, which categorizes its results into exact and possibly incorrect. An additional optimization improves the resolution of the algorithm, so that it can compete with double-precision floating-point implementations of Möller's algorithm and the standard SAT. The approach is extremely resource efficient. It was implemented and tested in VHDL. It utilizes a total of only 72 18-bit multipliers and 48% of the available gate equivalents. This enables fitting it into a Xilinx Virtex-II XC2V6000 together with a hierarchy traversal module. The overall design is ten times faster than a state-of-the-art software implementation running on a system with identical memory bandwidth, which does not provide any guarantees on correctness or categorization of the results. This renders the EFSAT implementation resource efficient, fast, and exact.

5 FUTURE WORK

Due to definition (1) every test axis is orthogonal to at least two triangle edges. Thus the projections of the two triangle points defining this axis are identical. This is exploited in the original SAT test to reduce the number of projections and comparisons. It remains unclear whether this holds for the given algorithm as well; this will be evaluated.

In all our experiments EFSAT returns correct answers to collision queries concerning coplanar triangles, although only 11 axes are tested instead of the 17 necessary for this in the original SAT. It remains an open problem whether this can be generalized.

Currently we are working on a software imple-

mentation of the SAT algorithm using interval arith-

metic and ﬂoating-point numbers to provide a fast and

precise arithmetic ﬁlter.

REFERENCES

Gottschalk, S. (1996). Separating Axis Theorem. Technical

Report TR-96-024.

Held, M. (1996). ERIT – A collection of efﬁcient and re-

liable intersection tests. Technical Report, University

at Stony Brook.

Karamcheti, V., Li, C., Pechtchanski, I., and Yap, C. (1999).

A core library for robust numeric and geometric com-

putation. In SCG ’99, New York, USA. ACM.

Mehlhorn, K. and Schirra, S. (2001). Exact computation

with leda real - theory and geometric applications. In

Symbolic Algebraic Methods and Veriﬁcation Meth-

ods, pages 163–172.

Möller, T. (1997). A Fast Triangle-Triangle Intersection Test. Journal of Graphics Tools, 2(2):25–30.

Plante, E., Cani, M.-P., and Poulin, P. (2001). A layered

wisp model for simulating interactions inside long

hair. In Computer Animation and Simulation 2001,

Computer Science.

Raabe, A., Hochgürtel, S., Zachmann, G., and Anlauf, J. K. (2006). Space-Efficient FPGA-Accelerated Collision Detection for Virtual Prototyping. In Design Automation and Test (DATE), pages 206–211, Munich, Germany.

Raabe, A., Nett, A., and Niers, A. (2008). A Reﬁnement

Case-Study of a Dynamically Reconﬁgurable Inter-

section Test Hardware. In ReCoSoc08.

Robbins, S. and Whitesides, S. (2003). On the reliability of

triangle intersection in 3d. In ICCSA (3), pages 923–

930.

Shewchuk, J. R. (1997). Adaptive precision ﬂoating-point

arithmetic and fast robust geometric predicates. Dis-

crete & Computational Geometry, 18(3):305–368.

Weller, R., Klein, J., and Zachmann, G. (2006). A model for

the expected running time of collision detection using

aabb trees. In Eurographics Symposium on Virtual En-

vironments (EGVE), Lisbon, Portugal.

Woulfe, M., Dingliana, J., and Manzke, M. (2007). Hard-

ware accelerated broad phase collision detection. In

SIGGRAPH 2007, New York, USA. ACM SIG-

GRAPH.

Zachmann, G. (1998). Rapid Collision Detection by

Dynamically Aligned DOP-Trees. Proc. of IEEE,

VRAIS’98 Atlanta.
