creasing moves with respect to the given objective function, but it also tolerates increasing moves in order to avoid getting trapped in local minima. To this end, it accepts increasing moves with a probability that decreases as the execution advances. Provided that certain conditions on the annealing schedule hold, the method converges asymptotically to a global minimum.
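For reference, the usual (Metropolis) form of this acceptance rule is given below. It is the textbook formulation, assumed here rather than taken from the paper: Δ is the change in the objective caused by a move, T_k is the temperature at step k, and the geometric cooling factor α is an illustrative choice.

```latex
P(\text{accept} \mid \Delta, T_k) =
\begin{cases}
1 & \text{if } \Delta \le 0,\\
e^{-\Delta / T_k} & \text{if } \Delta > 0,
\end{cases}
\qquad
T_{k+1} = \alpha \, T_k, \quad 0 < \alpha < 1.
```

As T_k decreases, increasing moves become less and less likely to be accepted. The geometric schedule is a common practical compromise; the asymptotic convergence guarantee is usually stated for much slower, logarithmic schedules of the form T_k = c / \log(k + 1) with c large enough.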
Figure 5 describes how to build a BVH cut using SA. Besides the current cut (currentCut), the algorithm also holds another one (nextCut), which is a random evolution of the former. These two cuts advance together through two nested loops: one that decreases the control parameter temp (line 10), i.e., the temperature of the original SA formulation, and another that tries many moves at the same temp (line 11). When nextCut increases the render time, the algorithm accepts it only if its acceptance threshold (line 13) is greater than a uniform random value in [0, 1] (line 14). If nextCut is finally accepted, it is assigned to currentCut (line 15), and the best cut is updated if required (lines 18–21). In any case, a new random evolution is computed (line 24) and stored in nextCut.
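The control flow of Figure 5 can be sketched in Python as follows. This is a generic illustration, not the authors' code: cost stands in for the measured render time of a cut, the acceptance threshold exp(-delta/temp) and the geometric cooling factor alpha are assumptions, and the names anneal, cost and evolve are hypothetical.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")  # solution type, e.g. a BVH cut


def anneal(initial: S,
           cost: Callable[[S], float],
           evolve: Callable[[S, random.Random], S],
           t0: float, t_min: float, alpha: float, moves_per_temp: int,
           rng: random.Random) -> S:
    """Generic simulated annealing loop (assumed schedule: temp *= alpha)."""
    if not (t0 > 0.0 and t_min > 0.0 and 0.0 < alpha < 1.0):
        raise ValueError("need t0 > 0, t_min > 0 and 0 < alpha < 1")
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    nxt = evolve(current, rng)                 # random evolution of the current solution
    temp = t0
    while temp > t_min:                        # outer loop: decrease the temperature
        for _ in range(moves_per_temp):        # inner loop: many moves at the same temp
            nxt_cost = cost(nxt)
            delta = nxt_cost - current_cost
            # Non-increasing moves are always accepted; increasing ones only if
            # the threshold exp(-delta/temp) beats a uniform random value.
            if delta <= 0 or math.exp(-delta / temp) > rng.random():
                current, current_cost = nxt, nxt_cost
                if current_cost < best_cost:   # keep the best solution seen so far
                    best, best_cost = current, current_cost
            nxt = evolve(current, rng)         # in any case, draw a new evolution
        temp *= alpha                          # assumed geometric cooling
    return best
```

On a toy problem, anneal(0, lambda x: (x - 7) ** 2, lambda x, r: x + r.choice((-1, 1)), 10.0, 0.01, 0.95, 50, random.Random(1)) returns the minimizer 7. Caching current_cost matters when, as in the paper, evaluating the cost means rendering a frame.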
The function evolve generates a cut reachable from currentCut in one step by applying either the join or the unfold operation. In unfold, an inner node n of the cut C is replaced by its two children: $\mathrm{unfold}(C, n) = (C \setminus \{n\}) \cup \{\mathrm{left}(n), \mathrm{right}(n)\}$. In join, two sibling nodes l, r ∈ C are replaced by their father: $\mathrm{join}(C, l, r) = (C \setminus \{l, r\}) \cup \{\mathrm{father}(l)\}$. When both operations are possible, evolve chooses one of them at random.
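A minimal sketch of evolve is given below for a hypothetical BVH stored as a perfect binary tree in heap order (the children of node n are 2n + 1 and 2n + 2). The array layout, the helper names, the uniform choice of the node to modify and the validity check are illustrative assumptions, not the paper's data structures. A cut is a set of node indices, and it is valid when every leaf has exactly one ancestor (or itself) in the cut.

```python
import random


def left(n: int) -> int:
    return 2 * n + 1


def right(n: int) -> int:
    return 2 * n + 2


def father(n: int) -> int:
    return (n - 1) // 2


def unfold(cut: frozenset, n: int) -> frozenset:
    """Replace inner node n by its two children."""
    return (cut - {n}) | {left(n), right(n)}


def join(cut: frozenset, l: int, r: int) -> frozenset:
    """Replace siblings l and r by their father."""
    return (cut - {l, r}) | {father(l)}


def evolve(cut: frozenset, num_nodes: int, rng: random.Random) -> frozenset:
    """Apply one randomly chosen unfold or join to the cut."""
    unfoldable = sorted(n for n in cut if left(n) < num_nodes)          # inner nodes
    joinable = sorted(n for n in cut if n % 2 == 1 and n + 1 in cut)    # left child + sibling
    ops = [op for op, cand in (("unfold", unfoldable), ("join", joinable)) if cand]
    if not ops:
        return frozenset(cut)
    if rng.choice(ops) == "unfold":
        return unfold(cut, rng.choice(unfoldable))
    l = rng.choice(joinable)
    return join(cut, l, l + 1)


def is_valid_cut(cut: frozenset, num_nodes: int) -> bool:
    """Every leaf of the perfect tree must have exactly one ancestor-or-self in cut."""
    if not all(0 <= n < num_nodes for n in cut):
        return False
    for leaf in range(num_nodes // 2, num_nodes):
        hits, n = 0, leaf
        while True:
            if n in cut:
                hits += 1
            if n == 0:
                break
            n = father(n)
        if hits != 1:
            return False
    return True
```

Starting from the root cut frozenset({0}), repeated calls to evolve keep the cut valid, since unfold and join each preserve the partition of the leaves.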
5 EXPERIMENTAL SETTINGS
Our application has been run on an NVIDIA GeForce GTX 285 with 1 GB of memory. The test scenes are FAIRYFOREST, CONFERENCEROOM, SPONZA and SIBENIK (see Figure 1). Since FAIRYFOREST is an open scene, a quadrilateral has been placed over it as a roof to prevent the rays from escaping. All the images have been rendered at a resolution of 1024 × 1024.
The BVHs have been built using the Surface Area Heuristic (SAH) (Goldsmith and Salmon, 1987) and the greedy top-down algorithm of (Ize et al., 2007). To improve the overall performance of the BVH, we have also applied the early split clipping technique of (Ernst and Greiner, 2007): before construction starts, the bounding volume of each triangle is iteratively halved until its surface area falls below a given threshold.
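The halving step can be sketched as follows, taking the description above literally. Splitting along the longest axis at its midpoint and the names surface_area and halve_until are assumptions; the published technique can also tighten each piece by clipping the triangle against it, which this sketch omits.

```python
Vec3 = tuple[float, float, float]
Box = tuple[Vec3, Vec3]  # (min corner, max corner)


def surface_area(lo: Vec3, hi: Vec3) -> float:
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def halve_until(lo: Vec3, hi: Vec3, threshold: float) -> list[Box]:
    """Recursively halve a box along its longest axis until each piece's
    surface area is below threshold (assumed split rule)."""
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    if surface_area(lo, hi) < threshold:
        return [(lo, hi)]
    axis = max(range(3), key=lambda i: hi[i] - lo[i])  # longest extent
    mid = 0.5 * (lo[axis] + hi[axis])
    left_hi = tuple(mid if i == axis else hi[i] for i in range(3))
    right_lo = tuple(mid if i == axis else lo[i] for i in range(3))
    return (halve_until(lo, left_hi, threshold)
            + halve_until(right_lo, hi, threshold))
```

Each resulting box would then enter the SAH build as a separate reference to the same triangle, so large triangles contribute several small, better-fitting boxes.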
We have used path tracing (Kajiya, 1986) as our ray tracing algorithm and, for the sake of convenience, every surface in the scene is treated as diffuse (i.e., with a constant BRDF). Hence, as soon as a ray finds its nearest intersection point, a new ray is spawned. Its origin is the intersection point, and its direction is randomly chosen over a virtual hemisphere centered on the surface normal. We have used a cosine-weighted probability density function, p(ω) = cos θ / π, where θ is the angle between the chosen direction and the normal (the pole of the hemisphere); directions near the pole are therefore more likely to be chosen. Since each intersection spawns exactly one new ray, the number of rays does not increase, which gives us full control over the memory actually allocated.
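The cosine-weighted direction sampling can be sketched as below with Malley's method (sample the unit disk uniformly, then lift the point onto the hemisphere), which yields exactly the density cos θ / π. The paper does not state which sampling method it uses; the method, the branchless orthonormal-basis construction (Duff et al., 2017) and the function names are assumptions.

```python
import math
import random

Vec3 = tuple[float, float, float]


def onb_from_normal(n: Vec3) -> tuple[Vec3, Vec3]:
    """Two unit tangents completing unit normal n to an orthonormal basis
    (branchless construction of Duff et al., 2017)."""
    nx, ny, nz = n
    sign = math.copysign(1.0, nz)
    a = -1.0 / (sign + nz)
    b = nx * ny * a
    t1 = (1.0 + sign * nx * nx * a, sign * b, -sign * nx)
    t2 = (b, sign + ny * ny * a, -ny)
    return t1, t2


def sample_cosine_hemisphere(n: Vec3, u1: float, u2: float) -> Vec3:
    """Map (u1, u2) in [0, 1)^2 to a unit direction around n with pdf cos(theta)/pi."""
    r = math.sqrt(u1)                   # uniform point on the unit disk ...
    phi = 2.0 * math.pi * u2
    x, y = r * math.cos(phi), r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - u1))   # ... lifted onto the hemisphere
    t1, t2 = onb_from_normal(n)
    return tuple(x * t1[i] + y * t2[i] + z * n[i] for i in range(3))


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_cosine_hemisphere((0.0, 0.0, 1.0), rng.random(), rng.random()))
```

Under this density the expected value of cos θ is 2/3, which gives a quick statistical sanity check of any implementation.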
Each ray is bound to a persistent CUDA thread, following (Aila and Laine, 2009). The set of rays whose associated threads are launched simultaneously is called a generation. Generations are enumerated: generation 0 is composed of the primary rays, and generation i is composed of the rays spawned from generation i − 1. The number of generations considered in this paper is fixed to 10. The number of rays in a generation is the largest that our implementation and our graphics card are able to store: 8 MRays ($8 \cdot 2^{20}$ rays). The primary rays are spawned from a two-dimensional array of 4096 × 2048 entries, which holds exactly this number of rays. Since the images have a resolution of 1024 × 1024, each 4 × 2 subarray of rays contains 8 samples for the same pixel. When stored in memory, the two-dimensional array is flattened according to the Z-order (Morton code).
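The ray layout can be illustrated as follows. The paper does not give its exact bit layout; the sketch assumes the 12 bits of x go to even positions and the 11 bits of y to odd positions, and the names morton_index and pixel_of are hypothetical.

```python
WIDTH, HEIGHT = 4096, 2048                        # ray array (from the text)
IMG_W, IMG_H = 1024, 1024                         # image resolution (from the text)
SPP_X, SPP_Y = WIDTH // IMG_W, HEIGHT // IMG_H    # 4 x 2 samples per pixel


def spread_bits(v: int, nbits: int) -> int:
    """Move bit k of v to bit 2k, for the low nbits bits."""
    out = 0
    for k in range(nbits):
        out |= ((v >> k) & 1) << (2 * k)
    return out


def morton_index(x: int, y: int) -> int:
    """Z-order index of ray (x, y): x bits on even positions, y bits on odd
    positions (assumed layout); covers 0 .. 2**23 - 1 exactly once."""
    return spread_bits(x, 12) | (spread_bits(y, 11) << 1)


def pixel_of(x: int, y: int) -> tuple[int, int]:
    """Pixel sampled by ray (x, y): each 4 x 2 block of rays maps to one pixel."""
    return x // SPP_X, y // SPP_Y
```

With this layout, the two low bits of x and the low bit of y occupy bit positions 0 to 2, so the 8 samples of a pixel land in 8 consecutive memory slots, and nearby pixels stay close in memory.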
In this setting, path tracing is especially suitable for our experiments, since no property can be assumed in advance for the rays beyond generation 0 (i.e., the non-primary rays). As we will see in Section 6, incoherence becomes maximal from generation 2 onwards.
We have used the linear congruential generator of (Park and Miller, 1988) as our random number generator. Its period of $2^{31} - 2$ is greater than the total number of random numbers needed in the tests, so no value of the sequence is reused and each ray receives different random numbers.
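For reference, the "minimal standard" generator of Park and Miller computes $x_{k+1} = 16807 \, x_k \bmod (2^{31} - 1)$. The sketch below implements that recurrence; the class and method names are hypothetical, and how seeds were assigned to rays in the paper is not described here.

```python
class ParkMiller:
    """Park-Miller minimal standard LCG: x_{k+1} = 16807 * x_k mod (2^31 - 1)."""

    M = 2 ** 31 - 1   # modulus, a Mersenne prime
    A = 16807         # multiplier, 7^5

    def __init__(self, seed: int = 1) -> None:
        # The state must never be 0, otherwise the sequence stays at 0.
        if not 1 <= seed < self.M:
            raise ValueError("seed must be in [1, 2^31 - 2]")
        self.state = seed

    def next_int(self) -> int:
        """Next value in [1, 2^31 - 2]; the full period is 2^31 - 2."""
        self.state = (self.A * self.state) % self.M
        return self.state

    def next_float(self) -> float:
        """Next value scaled into the open interval (0, 1)."""
        return self.next_int() / self.M
```

Starting from seed 1, the 10000th value is 1043618065, the check value published by Park and Miller. Python integers cannot overflow; a 32-bit GPU version would instead use Schrage's decomposition or a 64-bit intermediate product.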
Our path tracer has been implemented with five CUDA kernels: RayGenerator (RG), Test, Compact, TraversalIntersection (TI) and Shader (SH). The algorithm runs according to the following scheme. First, the primary rays are spawned from a pinhole camera in kernel RG. Then, in kernel Test, the rays are tested for intersection with a node n of the cut. Next, the rays that passed this intersection test are compacted in kernel Compact. This kernel is actually the primitive cudppCompact of the
CUDPP library by (Harris et al., CUDPP) and pre-
TRAVERSING A BVH CUT TO EXPLOIT RAY COHERENCE
145