for primary and 8.55% for shadow rays in AO. Pri-
mary rays in PT and AO are almost identical, so their
results are very similar. Shadow rays in AO take fewer
traversal steps than secondary rays in PT. This makes
sense because the average length of shadow rays in
AO is shorter than that of secondary rays in PT.
Concerning the execution model of GPUs, the
traversals of different rays are not fully independent
of one another. Therefore, texture cache misses and
divergence can make the runtime behaviour differ
from what is expected. Even primary rays suffer from
these stalls, since the decrease in traversal steps does
not match the improvement in performance. For
instance, SPHERE-ORTH takes 11.08% fewer traversal
steps than SAH for BUNNY in PT (Table 4), but it
only reaches an improvement of 4.29% in performance
(Table 5). In contrast, this clear difference does not
hold for secondary rays. It is true that sorting can
increase their coherence, but secondary rays are
randomly spawned and sorting only considers the
kd-tree selection. Thus, the results depend heavily on
how these rays are actually generated during
rendering.
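The gap between step counts and runtime can be illustrated with a minimal sketch. It assumes a simplified lockstep model in which a warp costs as much as its longest traversal; the step counts, the warp size of 4 and the function names are hypothetical and are not taken from the measurements in Tables 4 and 5.

```python
def warp_cost(steps_per_ray, warp_size=32):
    """Total cost under a lockstep model: each warp pays for its longest ray."""
    total = 0
    for i in range(0, len(steps_per_ray), warp_size):
        total += max(steps_per_ray[i:i + warp_size])
    return total


def reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Hypothetical traversal steps for 8 rays, grouped in warps of 4 for brevity.
baseline = [10, 10, 10, 40, 12, 12, 12, 12]
improved = [6, 6, 6, 40, 10, 10, 10, 10]

# Reduction in the total number of traversal steps over all rays.
step_gain = reduction(sum(baseline), sum(improved))
# Reduction in the modeled execution cost, which is bounded by the slowest lane.
warp_gain = reduction(warp_cost(baseline, 4), warp_cost(improved, 4))

print(f"fewer steps: {step_gain:.1f}%  lower warp cost: {warp_gain:.1f}%")
```

In this toy case the rays take about 16.9% fewer steps in total, yet the modeled cost drops by only about 3.8%, because one long ray in the first warp keeps the other lanes waiting. Cache misses are not modeled at all.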
Regarding sorting, the performance of our heuristics
exceeds that of SAH when the overhead due to sorting
is not considered (columns n.inc. in Table 5). When
this overhead is included (columns inc.), our heuristics
still outperform SAH in most cases. The overhead is
more relevant in AO, since shadow rays take fewer
traversal steps on average. Notice that the scenes
BUNNY and CONFROOM have the lowest average
traversal steps, and their results show that the overhead
makes their runtime performance mostly slower than
that of SAH.
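The effect of a constant sorting cost on rays of different lengths can be sketched with a simple cost model. It assumes that runtime is proportional to the number of traversal steps and that sorting adds a fixed cost expressed in the same unit; the function name and all numbers are illustrative assumptions, not values from Table 5.

```python
def relative_gain(steps_sah, steps_new, sort_cost=0.0):
    """Relative runtime gain over SAH (positive means faster than SAH).

    Assumed model: time = traversal steps (+ constant sorting cost for the
    specialized trees), all expressed in step units.
    """
    time_sah = steps_sah
    time_new = steps_new + sort_cost
    return (time_sah - time_new) / time_sah


# Both hypothetical ray types obtain the same 10% reduction in steps.
long_rays = relative_gain(60, 54, sort_cost=3)   # 3/60 = 5% faster
short_rays = relative_gain(20, 18, sort_cost=3)  # -1/20 = 5% slower

print(f"long rays: {long_rays:+.2%}  short rays: {short_rays:+.2%}")
```

With the same relative saving in steps, the fixed cost consumes half of the gain for the long rays and turns the gain into a loss for the short ones, which is the pattern described above for scenes with few traversal steps.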
We have also compared SAH with the cosine
heuristics. The settings are the same as for the previous
heuristics. We have measured the traversal steps and
the runtime performance (including Sort) by varying
β from 0.5 to 20 in steps of 0.5. Tables 6 and 7 show
the results for COS-ORTH (blue curves) for PT and
AO, respectively. The results for COS-OBLI are not
depicted because they behave similarly. A dashed
horizontal line is added to the charts to compare this
heuristic with SAH.
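The parameter sweep can be sketched as follows. The helper names and the `measure` callable are hypothetical: in practice `measure` would build the kd-trees for a given β and return the measured traversal steps, whereas the lambda below is only a placeholder so that the sketch runs and does not represent any measured data.

```python
def beta_sweep(start=0.5, stop=20.0, step=0.5):
    """Return the list of beta values from start to stop (inclusive)."""
    count = round((stop - start) / step) + 1
    return [start + i * step for i in range(count)]


def best_beta(measure, betas):
    """Return the beta for which the measured quantity is smallest."""
    return min(betas, key=measure)


betas = beta_sweep()  # 0.5, 1.0, ..., 20.0 (40 values)

# Placeholder for a real measurement (e.g. average traversal steps per ray).
toy_measure = lambda beta: (beta - 3.0) ** 2

print(len(betas), best_beta(toy_measure, betas))
```

Computing each value as `start + i * step` avoids the accumulation of rounding error that repeated addition would introduce over a long sweep.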
With respect to traversal steps in PT, the number of
steps decreases as β increases until β reaches a value
between 2 and 3.5, for both primary and secondary
rays. These values of β lead to weights similar to those
of the spherical and cubic heuristics. After that, the
behaviour of the rays becomes scene-dependent. The
charts of runtime performance show the inverse
behaviour, since the fewer traversal steps the rays
take, the higher the runtime performance is.
The charts of traversal steps for primary and
shadow rays have a similar shape in AO. Again, the
steps traversed by shadow rays are fewer than those
for secondary rays in PT due to their shorter length.
Comparing the performance charts of PT and AO,
AO exhibits better performance than PT, but the
difference between COS-ORTH and SAH is larger for
PT (Table 6) than for AO (Table 7). The explanation
is the same as for the previous heuristics, i.e. the
constant overhead of sorting is more relevant for
rays with fewer traversal steps.
8 CONCLUSIONS AND FUTURE WORK
In this paper, we have presented six new heuristics de-
veloped from a mathematical description of the orig-
inal SAH. These heuristics specialize SAH for differ-
ent sets of ray directions by restricting their domain or
assuming different probabilities. In order to cover the
whole space of directions, several sets have been pro-
posed and a kd-tree has been built for each of them
(Multi-kd-tree). The traversal of a Multi-kd-tree re-
ports fewer traversal steps and better runtime perfor-
mance than a single SAH-based kd-tree over usual
scenes.
However, runtime performance does not improve in
proportion to the reduction in traversal steps, due to
the execution on SIMT hardware. This effect is even
more relevant for secondary or shadow rays due to
their random spawning. Further research on this issue
is needed to close the gap between traversal steps and
runtime performance on parallel hardware.
A tighter division of the direction space could be
realized. However, two considerations must be taken
into account. First, all the information needed for the
traversal has to be stored in device memory, so a larger
number of divisions entails higher memory
requirements. Second, the selection of the kd-tree to
traverse has to be quick. In this work, only a few
comparisons are needed, which makes the selection
negligible with respect to the whole traversal.
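A selection that needs only a few comparisons can be sketched as follows. The sketch assumes, for illustration only, that the direction space is split into six sets according to the dominant axis of the ray direction and its sign; the actual partition used by the heuristics may differ, and `select_tree` is a hypothetical name.

```python
def select_tree(direction):
    """Map a ray direction (x, y, z) to a kd-tree index in 0..5.

    Assumed partition: one set per dominant axis and sign
    (+x, -x, +y, -y, +z, -z). Only a handful of comparisons are needed.
    """
    ax, ay, az = abs(direction[0]), abs(direction[1]), abs(direction[2])
    if ax >= ay and ax >= az:
        axis = 0
    elif ay >= az:
        axis = 1
    else:
        axis = 2
    negative = 1 if direction[axis] < 0 else 0
    return 2 * axis + negative


# Sorting rays by the selected tree groups rays that traverse the same kd-tree.
rays = [(0.1, 0.9, 0.2), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0), (-1.0, 0.2, 0.0)]
sorted_rays = sorted(rays, key=select_tree)

print([select_tree(d) for d in sorted_rays])
```

Under this assumption the storage grows linearly with the number of sets, since one kd-tree is kept per set, while the selection cost stays at a few comparisons per ray.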
Finally, the cosine heuristics have been developed
independently of the spherical and cubic heuristics.
It would be interesting to analyze the behaviour
of spherical or cubic patches in which rays are dis-
tributed according to the cosine heuristics.
ACKNOWLEDGEMENTS
This paper has been supported by the Spanish projects
CCG10-UCM/TIC-5476 and GR35/10-A-921547.