Figure 3: Organization of data structures in memory.
3 GPU HM APPROACH
Our approach achieves the parallel tessellation of several PC cells by running multiple instances of the HM algorithm on the GPU cores. Furthermore, an optimized version of the original data structures is used to improve memory access.
In our proposal, Geometry Shaders are used to decode the precomputed data structures and to generate the triangles needed for the tessellation of each PC cell. The GPU parallelism is automatically exploited, since these tasks can be efficiently executed simultaneously on multicore GPUs.
Next, the optimized data structures and the tessel-
lation and rendering algorithm are presented.
3.1 GPU and CPU Data Structures
The data structures used by the original HM algorithm were presented in Section 2: the GC list, the list of PC cells, and the TB list. Since in the GPU HM method the rendering is performed by the Geometry Shader, every shader thread needs to access some of these structures as well as the meshes' geometric data. The original HM data layout, however, cannot be accessed efficiently from the GPU. Thus, we have modified the storage and access strategies of each structure according to its access pattern. Figure 3 illustrates the location and access methods used for the data structures.
The GC list is only needed to identify the coverage pattern of each cell when rendering the grid mesh. Once a grid cell has been classified, it is rendered, discarded, or tessellated on the GPU, so the GC list itself is not needed by the Geometry Shader.
The GPU HM rendering algorithm uses the geometric data of the TB vertices and of the grid cell being tessellated, together with the TB data and the list of PC cells. Consequently, these data structures have to be kept in GPU memory, stored in Vertex Buffer Objects (VBO), although they are accessed in different ways. The geometry data and the TB list need array-like access, since processing one PC cell may involve reading non-adjacent elements in random order. Texture Buffer Objects (TBO) (OpenGL.org, 2009), associated with the corresponding VBOs, are used in our proposal to read this data within the shader. Using TBOs is an efficient way to obtain array-like access to large data buffers in GPU memory. They provide a convenient interface to the data, simulating a 1D texture whose texture coordinate corresponds to the offset in the buffer object. Additionally, access latencies can be effectively hidden by overlapping the reads with data processing operations, since the HM rendering stage involves several processing operations.
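As an illustration only, the following C++/OpenGL fragment sketches how a VBO holding the geometry or TB data could be exposed to the shaders through a TBO; the buffer names, the RGBA32F record format, and the sizes are assumptions for the sketch, not details of our implementation.

// Minimal sketch (assumes an OpenGL 3.x context and loader headers are set up):
// attach a VBO to a buffer texture so that the Geometry Shader can fetch
// arbitrary records with texelFetch(), using the record offset as the
// texture coordinate. numVertices and vertexData are assumed to exist.
GLuint geometryVBO, geometryTex;

glGenBuffers(1, &geometryVBO);
glBindBuffer(GL_TEXTURE_BUFFER, geometryVBO);
glBufferData(GL_TEXTURE_BUFFER, numVertices * 4 * sizeof(GLfloat),
             vertexData, GL_STATIC_DRAW);      // one RGBA32F texel per vertex

glGenTextures(1, &geometryTex);
glBindTexture(GL_TEXTURE_BUFFER, geometryTex);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, geometryVBO);

// On the shader side (GLSL), the buffer is then read as:
//   uniform samplerBuffer geometry;
//   vec4 v = texelFetch(geometry, index);     // array-like random access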
An additional optimization has been applied to improve the data locality of the TB list. The TIN mesh vertices are ordered in the vertex buffer containing the geometry data so that the TB boundary vertices are stored at the beginning. In this way, the offset of a vertex in the buffer object represents its position in the TB list. Thus, vertex information can be accessed directly through its TB index, removing one level of indirection with respect to the original data structures, and adjacent vertices in the TB are now adjacent in the buffer object, which improves cache behavior thanks to the better data locality.
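One possible way to build this ordering on the CPU is sketched below; the Vertex record and the helper containers are hypothetical and only illustrate the idea of placing the TB boundary vertices, in TB order, at the start of the geometry buffer.

#include <vector>
#include <cstddef>

struct Vertex { float x, y, z; };   // hypothetical per-vertex record

// Reorder the TIN vertices so that the TB boundary vertices come first,
// in TB order; a vertex's TB index then equals its offset in the geometry
// VBO, so the shader needs no extra index table to locate it.
std::vector<Vertex> reorderForTB(const std::vector<Vertex>& vertices,
                                 const std::vector<int>& tbBoundaryIndices,
                                 const std::vector<bool>& isBoundary)
{
    std::vector<Vertex> reordered;
    reordered.reserve(vertices.size());
    for (int idx : tbBoundaryIndices)               // boundary vertices, TB order
        reordered.push_back(vertices[idx]);
    for (std::size_t i = 0; i < vertices.size(); ++i)
        if (!isBoundary[i])                         // interior vertices afterwards
            reordered.push_back(vertices[i]);
    return reordered;
}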
The PC list, on the other hand, is made available to the shader through regular input vertex attributes. There is a one-to-one relationship between PC cells and vertex shader threads; therefore, packing the items of the PC list as input vertex attributes is the most effective way to deliver the right information to every shader thread. Moreover, the data transfer between CPU and GPU during rendering is reduced, given that we can use indexed draw calls to select the active PC cells being tessellated every frame. This list of active PC cells is easily built on the fly by the CPU according to the view-dependent grid LOD, and efficiently uploaded to the GPU due to its small size.
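The per-frame selection could be expressed, for instance, as a single indexed draw call over point primitives, as in the following sketch; selectActivePCCells, pcCellVAO, and activeIndexVBO are hypothetical names introduced only for the example.

// Each PC cell is packed as one point whose vertex attributes carry the
// corresponding PC-list entry. The CPU builds the small list of active
// cells for the current grid LOD and uploads it as an index buffer; the
// Geometry Shader then expands each point into the tessellation triangles.
std::vector<GLuint> activeCells = selectActivePCCells(gridLOD);

glBindVertexArray(pcCellVAO);                     // VAO with the PC-list attributes
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, activeIndexVBO);
glBufferData(GL_ELEMENT_ARRAY_BUFFER,
             activeCells.size() * sizeof(GLuint),
             activeCells.data(), GL_STREAM_DRAW); // small, re-uploaded every frame
glDrawElements(GL_POINTS, (GLsizei)activeCells.size(),
               GL_UNSIGNED_INT, nullptr);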
3.2 Rendering Algorithm
The GPU HM rendering steps are similar to those of the original algorithm, but the implementation differs. An overview of the rendering flow is presented in Figure 4 for a coarse (Subfigure 4(a)) and a fine (Subfigure 4(b)) grid LOD, following the order of the numerical labels. The TIN mesh is the same in both cases, but the PC grid cells depend on the grid LOD, and thus different cells are rendered in each case.
The first step of the process is to identify the active
NC, CC, and PC cells. Next, the NC cells are ren-
dered using indexed draw calls (step (1) in Figures