3.2.1 KD-Tree Implementation
The construction of KD-Trees in our implementation uses the Surface Area Heuristic (SAH) build algorithm (Wald and Havran, 2006). Our implementation allows the creation of empty leaf nodes but does not perform triangle clipping. By experimenting with different values for $C_{\text{traversal}}$ and $C_{\text{intersection}}$, we concluded that mobile architectures tend to favour wider and shallower trees. We found that $C_{\text{traversal}} = 3.0$ and $C_{\text{intersection}} = 1.5$ yield good results.
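The role of the two constants can be illustrated with the standard SAH cost of a candidate split. Traversing an inner node costs $C_{\text{traversal}}$, and each child's triangle count is weighted by the probability that a ray crossing the node also crosses that child, estimated as the ratio of child to parent surface area. The Python sketch below is a minimal illustration under stated assumptions, not the authors' builder: the function names are hypothetical, it evaluates one given candidate plane instead of sweeping all of them, counts straddling triangles on both sides (consistent with the absence of clipping), and omits refinements of the full algorithm such as any special treatment of empty children.

```python
# SAH cost model using the constants reported in the text.
C_TRAVERSAL = 3.0     # cost of traversing one inner node
C_INTERSECTION = 1.5  # cost of one ray-triangle intersection test


def surface_area(vmin, vmax):
    dx, dy, dz = (vmax[i] - vmin[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_split_cost(vmin, vmax, axis, pos, n_left, n_right):
    """Expected cost of splitting the node box [vmin, vmax] at `pos` on `axis`."""
    lmax = list(vmax)
    lmax[axis] = pos                      # left child: [vmin, lmax]
    rmin = list(vmin)
    rmin[axis] = pos                      # right child: [rmin, vmax]
    sa = surface_area(vmin, vmax)
    p_left = surface_area(vmin, lmax) / sa    # P(hit left child | hit node)
    p_right = surface_area(rmin, vmax) / sa   # P(hit right child | hit node)
    return C_TRAVERSAL + C_INTERSECTION * (p_left * n_left + p_right * n_right)


def leaf_cost(n_prims):
    return C_INTERSECTION * n_prims


def split_beats_leaf(vmin, vmax, axis, pos, n_left, n_right, n_total):
    # Without clipping, a triangle straddling the plane is counted in both
    # n_left and n_right, so the leaf alternative uses the node's own count.
    return sah_split_cost(vmin, vmax, axis, pos, n_left, n_right) < leaf_cost(n_total)
```

With these constants, halving a 2×2×2 node that holds 10 triangles into 5 + 5 costs 13.0 against 15.0 for a leaf, so the split is taken; the same split of a 2-triangle node costs 5.0 against 3.0, so the node stays a leaf. A large $C_{\text{traversal}}$ relative to $C_{\text{intersection}}$ therefore stops subdivision earlier, which is consistent with the preference for shallower trees reported above.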
The memory layout for KD-Tree nodes varies according to which traversal algorithm is being used.
Figure 1: Layouts of KD-Tree nodes: (a) KD-Pushdown node layout; (b) KD-Backtrack node layout. vMin and vMax represent the node bounding box.
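To make the size difference between the two layouts concrete, the following sketch packs hypothetical node records with Python's `struct` module. The field sets are assumptions, not a transcription of Figure 1: the KD-Backtrack record carries the node bounding box (vMin, vMax) and a parent index, which KD-Backtrack uses to climb back up the tree instead of keeping a stack, whereas the KD-Pushdown record stores only the split information. The flag encoding, the `NO_PARENT` sentinel and the padding to 16-byte multiples (chosen to suit vec4-aligned SSBO reads) are likewise assumptions.

```python
import struct

# Hypothetical KD-Pushdown node: split position, packed axis/leaf flags,
# child-or-primitive offset, primitive count.
PUSHDOWN_FMT = "<f3I"        # 16 bytes
# Hypothetical KD-Backtrack node: vMin.xyz + split, vMax.xyz + padding,
# then flags, child-or-primitive offset, parent index, primitive count.
BACKTRACK_FMT = "<4f4f4I"    # 48 bytes

LEAF_FLAG = 0x4              # hypothetical: bits 0-1 = split axis, bit 2 = leaf
NO_PARENT = 0xFFFFFFFF       # hypothetical sentinel for the root's parent


def pack_pushdown(split, axis, is_leaf, offset, count):
    flags = axis | (LEAF_FLAG if is_leaf else 0)
    return struct.pack(PUSHDOWN_FMT, split, flags, offset, count)


def pack_backtrack(vmin, vmax, split, axis, is_leaf, offset, parent, count):
    flags = axis | (LEAF_FLAG if is_leaf else 0)
    return struct.pack(BACKTRACK_FMT,
                       *vmin, split,    # vMin.xyz, split position in .w
                       *vmax, 0.0,      # vMax.xyz, padding in .w
                       flags, offset, parent, count)
```

Under these assumptions a KD-Backtrack node is three times the size of a KD-Pushdown node, so the choice of traversal algorithm also changes the memory traffic per visited node.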
Another difference between the trees built for the two traversal methods is that, when building the tree for the KD-Backtrack traversal method, we do not allow perfectly flat nodes, i.e. nodes that have zero length along one of the axes. This avoids precision-related issues while traversing the tree. We implemented the KD-Pushdown and KD-Backtrack algorithms using the node layouts shown in Figure 1.
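One way to enforce the no-flat-node rule during construction is to discard candidate split planes that coincide with the node's own boundary, since such a plane yields a child of zero thickness (for example, when an axis-aligned triangle lies in one face of the node). The text does not say how the rule is enforced; the sketch below assumes this approach, and the function names and the tolerance `eps` are hypothetical.

```python
def is_flat(vmin, vmax, eps=0.0):
    """True if the box has (near-)zero extent along any axis."""
    return any(vmax[i] - vmin[i] <= eps for i in range(3))


def filter_backtrack_candidates(vmin, vmax, axis, candidates, eps=1e-6):
    """Keep only split planes that leave both children with non-zero
    thickness along `axis`; a plane on (or within eps of) the node's
    boundary would create a perfectly flat child."""
    lo, hi = vmin[axis] + eps, vmax[axis] - eps
    return [pos for pos in candidates if lo < pos < hi]
```

For a unit cube and candidates `[0.0, 0.25, 1.0]` on the x axis, only `0.25` survives, so the SAH evaluation never considers the two planes that would produce flat children.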
3.2.2 BVH Implementation
In our implementation, BVHs are constructed using an altered version of the construction algorithm (Wald and Havran, 2006) that is also used for the KD-Trees. As with KD-Trees, after experimenting with several values, we concluded that wider, shallower trees tend to perform best. As such, the values chosen for $C_{\text{traversal}}$ and $C_{\text{intersection}}$ were again 3.0 and 1.5, respectively.
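For BVHs the same cost model applies, but primitives are partitioned instead of space, so each child's bounding box is computed from its primitives and the two children may overlap. The text does not describe how the construction algorithm was altered; the Python sketch below shows one common formulation, a full sweep over primitives sorted by centroid along one axis, using the same constants. Function names are hypothetical, and primitives are represented only by their axis-aligned bounding boxes.

```python
C_TRAVERSAL, C_INTERSECTION = 3.0, 1.5


def surface_area(box):
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(a, b):
    return (tuple(min(a[0][i], b[0][i]) for i in range(3)),
            tuple(max(a[1][i], b[1][i]) for i in range(3)))


def best_object_split(boxes, axis):
    """Sweep SAH over primitive boxes sorted by centroid along `axis`.

    Returns (cost, k, order): the first k primitives of `order` form the
    left child; k is None when keeping the node as a leaf is cheaper."""
    order = sorted(boxes, key=lambda b: b[0][axis] + b[1][axis])
    n = len(order)
    leaf = (C_INTERSECTION * n, None, order)
    if n < 2:
        return leaf
    # suffix[i] holds the bounds of order[i:], built right to left.
    suffix = [None] * n
    acc = order[-1]
    for i in range(n - 1, -1, -1):
        acc = union(acc, order[i])
        suffix[i] = acc
    sa_parent = surface_area(suffix[0])
    if sa_parent <= 0.0:
        return leaf
    best = leaf
    left = order[0]
    for k in range(1, n):
        if k > 1:
            left = union(left, order[k - 1])   # bounds of order[:k]
        cost = C_TRAVERSAL + C_INTERSECTION * (
            surface_area(left) / sa_parent * k
            + surface_area(suffix[k]) / sa_parent * (n - k))
        if cost < best[0]:
            best = (cost, k, order)
    return best
```

Two well-separated pairs of unit boxes are split between the pairs (cost 4.0 against 6.0 for a leaf), while two coincident boxes stay in one leaf, because separating them cannot recover the high traversal cost.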
The memory layout for BVH nodes also varies according to which traversal method is being used. The possible layouts are shown in Figure 2.
For GPU traversal, we implemented the Trail and Parent-Link traversal algorithms.
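Parent-Link traversal avoids a per-ray stack by storing a parent index in every node, so that traversal can climb back up the tree after finishing a subtree. The sketch below is a CPU-side Python illustration of a common three-state stackless formulation (arriving at a node from its parent, from its sibling, or from a child); it is not the authors' shader code. To keep it short, it assumes the left child is always visited first (a GPU version would usually choose the near child from the ray direction) and abstracts the ray/box test as a hypothetical `hit(node_index)` predicate. Trail traversal is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    parent: int                  # -1 for the root
    left: int = -1               # -1 marks a leaf
    right: int = -1
    prim: Optional[str] = None   # payload stored in a leaf


FROM_PARENT, FROM_SIBLING, FROM_CHILD = range(3)


def traverse_parent_link(nodes: List[Node], root: int,
                         hit: Callable[[int], bool]) -> List[str]:
    """Collect the leaves whose boxes pass `hit`, using no stack."""
    if not hit(root):
        return []
    if nodes[root].left < 0:
        return [nodes[root].prim]

    def sibling(i: int) -> int:
        p = nodes[nodes[i].parent]
        return p.right if p.left == i else p.left

    out: List[str] = []
    cur, state = nodes[root].left, FROM_PARENT
    while True:
        n = nodes[cur]
        if state == FROM_CHILD:
            if cur == root:
                return out
            if nodes[n.parent].left == cur:      # finished the first child
                cur, state = sibling(cur), FROM_SIBLING
            else:                                # finished the second child
                cur, state = n.parent, FROM_CHILD
        elif not hit(cur) or n.left < 0:
            if n.left < 0 and hit(cur):          # leaf: process primitives
                out.append(n.prim)
            # Culled box or processed leaf: move on without descending.
            if state == FROM_PARENT:
                cur, state = sibling(cur), FROM_SIBLING
            else:
                cur, state = n.parent, FROM_CHILD
        else:                                    # inner node hit: descend
            cur, state = n.left, FROM_PARENT
```

The only per-ray state is the current node index and a small state value, which is what makes this style of traversal attractive in shaders, where large per-thread stacks are costly.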
Figure 2: Layouts of BVH nodes: (a) BVH Trail traversal node layout; (b) BVH Parent traversal node layout. vMin and vMax represent the node bounding box.
3.3 GPU Rendering Methods
Our implementation offers three rendering approaches. Regardless of the rendering method chosen, our implementation starts by constructing the selected acceleration structure along with the auxiliary
structures for primitive storage. These structures are then copied to GPU memory as Shader Storage Buffer Objects (SSBOs). The application also creates and uploads a Vertex Array Object (VAO) for a full-screen quad, which is used by every rendering method. The drawing process, however, changes according to which rendering approach is selected:
• Fragment Shaders - the application renders a full-screen quad using a very simple vertex shader. The fragment shader is then responsible for ray tracing the corresponding pixel. In this case, all the code for ray tracing and structure traversal is contained in the fragment shader.
• Compute Shaders - the application performs a two-step process. In the first step, the application dispatches the necessary compute workgroups so that each thread processes one pixel of the final image. The result of this first step is stored in an image buffer, which is then used as an input texture in the second step. The second pass simply draws a full-screen quad using the texture generated in the first step.
• Hybrid Shading - the application creates not only a VAO containing the full-screen quad but also a second VAO containing the entire geometry of the scene being rendered. This second VAO is used in the first phase of the rendering process, where the application issues a draw call that rasterizes all primitives. This first phase stores the computed normals in a color attachment and also produces a depth buffer. These two buffers are then used in the second phase of the drawing process, where the full-screen quad is rendered: their values are used to create and cast a shadow ray, which then triggers a structure traversal. For this rendering method, all the ray tracing logic is in the fragment shader of the second pass.
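Two pieces of arithmetic recur in the rendering methods above: sizing the compute dispatch so that every pixel receives one invocation, and, for hybrid shading, turning a depth-buffer value back into a surface position from which the shadow ray starts. The Python sketch below illustrates both under stated assumptions: a 16×16 local workgroup size (hypothetical), OpenGL's default [0, 1] depth range, a standard symmetric perspective projection, pixel rows counted from the bottom as in `gl_FragCoord`, and a normal and light position expressed in the same view space as the reconstructed point. The normal-offset `bias` is a common way to avoid self-intersection, not a value taken from the text.

```python
import math


def workgroup_count(width, height, local_x=16, local_y=16):
    """Workgroups needed so every pixel gets one invocation (ceil division);
    invocations falling outside the image must return early in the shader."""
    return (-(-width // local_x), -(-height // local_y))


def view_position_from_depth(px, py, depth, width, height, fov_y, near, far):
    """Reconstruct a view-space position from a [0, 1] depth-buffer value,
    assuming a standard OpenGL perspective projection."""
    aspect = width / height
    x_ndc = 2.0 * (px + 0.5) / width - 1.0    # pixel centre to NDC
    y_ndc = 2.0 * (py + 0.5) / height - 1.0
    z_ndc = 2.0 * depth - 1.0
    # Invert z_ndc = (f + n)/(f - n) + 2fn / ((f - n) * z_view).
    z_view = 2.0 * far * near / ((far - near) * z_ndc - (far + near))
    t = math.tan(fov_y / 2.0)
    return (x_ndc * -z_view * aspect * t, y_ndc * -z_view * t, z_view)


def shadow_ray(position, normal, light_pos, bias=1e-3):
    """Return (origin, unit direction, t_max) for a shadow ray towards a
    point light; the origin is pushed off the surface along the normal."""
    origin = tuple(p + bias * n for p, n in zip(position, normal))
    d = tuple(l - o for l, o in zip(light_pos, origin))
    dist = math.sqrt(sum(c * c for c in d))
    return origin, tuple(c / dist for c in d), dist
```

Any hit with a distance below `t_max` found by the structure traversal means the pixel is in shadow, so the traversal can stop at the first intersection rather than searching for the closest one.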
GRAPP 2019 - 14th International Conference on Computer Graphics Theory and Applications