while all the computations involved in the
simulation part are implemented on the CPU.
Concerning the simulation methodology, moving
collision detection to the GPU has been the most
studied topic. Collision detection algorithms on
GPUs can be classified into two categories. On the one
hand, screen-space approaches use the depth or
stencil buffers to perform the collision tests by
rendering the geometry primitives (Govindaraju et
al., 2005) (Teschner et al., 2005). Their main
drawbacks are that their effectiveness is often limited
by the image-space resolution, and that only potentially
colliding pairs are reported, so an exact test must still be
applied on the CPU afterwards.
On the other hand, object-space approaches use
the floating-point bandwidth and programmability of
modern GPUs to implement the collision test.
(Zhang and Kim, 2007) performs massively parallel
pairwise overlap tests on AABB streams,
although the exact primitive-level intersection tests are
still performed on the CPU. (Greß and Zachmann, 2004)
(Horn, 2005) (Greß et al., 2006) generate bounding
volume hierarchies on the GPU from a geometry
image (Gu et al., 2002) representation of the solids.
In order to exploit the parallel processing
capabilities of the GPU, they traverse these hierarchies
breadth-first by using the non-uniform stream
reduction presented in (Horn, 2005). Nevertheless,
these approaches cannot operate autonomously on
the GPU, since the selection of the pair of objects
to be tested is usually performed on the CPU.
Furthermore, they do not apply any response when
the objects actually collide, so they require
extra CPU collaboration to handle interactions.
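As a point of reference for these object-space techniques, the following minimal sketch (plain C++ on the CPU, with illustrative names of our own; the cited works evaluate the equivalent test in shaders over AABB streams) shows the pairwise axis-aligned bounding box overlap test on which they rely.

```cpp
#include <array>

// Axis-aligned bounding box given by its minimum and maximum corners.
// Names are illustrative; the cited works store such boxes as GPU streams.
struct AABB {
    std::array<float, 3> min;
    std::array<float, 3> max;
};

// Two boxes overlap iff their intervals overlap on every axis.
// This is the per-pair test that object-space approaches evaluate
// massively in parallel; exact primitive-level tests may still follow.
bool overlaps(const AABB& a, const AABB& b) {
    for (int axis = 0; axis < 3; ++axis) {
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis])
            return false;   // separated along this axis
    }
    return true;            // overlapping on all three axes
}
```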
In the context of deformable objects, recent
papers have used the capabilities of the GPU to quickly
update their geometry: (Pascale et al., 2005)
proposes the use of vertex shaders to locally deform
the object, (Zhang and Kim, 2007) employs a
fragment shader to update the AABB streams, and
(Kim et al., 2006) uses a fragment shader to compute
the mass properties of rigid bodies in a buoyancy
simulation. Nevertheless, none of these papers cover
interactions between objects.
In this paper we study how to implement a fully
GPU-based rigid body simulator, by programming
shaders for every phase of the simulation. We
analyze the pros and cons of different approaches,
and point out the bottlenecks we have detected. We
also apply the developed techniques to two case
studies, comparing them with analogous versions
running on the CPU.
2 THE SIMULATION LOOP
The animation in a rigid body simulation is achieved
through a main loop that updates the information
related to every object once a cycle or step has been
completed. The size of the step must be chosen
carefully for stability reasons, and the level of realism
of the simulation directly depends on it.
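As a simple illustration of this structure, the sketch below (C++, with hypothetical names; it is not the paper's code) shows a driver that advances the simulation with a fixed step size dt, the parameter on which stability and realism depend.

```cpp
#include <functional>

// Hypothetical driver for the main loop described above. The per-step
// work (collision computation and dynamics update, detailed below) is
// passed in as a callable so the loop itself stays generic.
void simulate(double totalTime, double dt,
              const std::function<void(double)>& advanceStep) {
    for (double t = 0.0; t < totalTime; t += dt) {
        advanceStep(dt);   // update every object by one fixed-size step
    }
}
```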
In order to complete a step, the dynamics of
every object (position of its center of mass x(t),
orientation r(t), linear velocity v(t), and angular
velocity ω(t)) must be updated by using numerical
methods to solve ordinary differential equations.
Figure 1 shows both the configuration Y(t) of the
state of an object and its time derivative in a 2D
scenario. In this case, the vectors r(t) and ω(t)
can be simplified to single scalars. Velocities change
according to the action of forces, since there exist
simple relations between their time derivatives and
the applied forces. The torque generated by a force
F(t) is defined as τ(t) = (p − x(t)) × F(t),
where p is the location at which F(t) acts. Again,
the vector τ(t) can be simplified to a single scalar
in a 2D scenario. The mass M and the moment of
inertia I are two scalars expressing the resistance of
a body to linear and angular motion, respectively.
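To make these relations concrete, the following sketch (C++, our own illustration; the simulator described in this paper performs the equivalent work in shaders) advances the 2D configuration described above by one simple Euler step, using dv/dt = F/M and dω/dt = τ/I, with the torque τ = (p − x) × F reduced to a scalar.

```cpp
// 2D rigid body state Y(t): center of mass x, orientation r (angle),
// linear velocity v and angular velocity w, plus constants M and I.
struct RigidBody2D {
    float x[2];   // position of the center of mass
    float r;      // orientation (a single scalar in 2D)
    float v[2];   // linear velocity
    float w;      // angular velocity (a single scalar in 2D)
    float M;      // mass
    float I;      // moment of inertia
};

// In 2D the cross product (p - x) x F reduces to a single scalar torque.
float torque2D(const RigidBody2D& b, const float p[2], const float F[2]) {
    float rx = p[0] - b.x[0];
    float ry = p[1] - b.x[1];
    return rx * F[1] - ry * F[0];
}

// One semi-implicit Euler step of size dt under a force F applied at p.
// A production simulator would typically use a more elaborate integrator.
void eulerStep(RigidBody2D& b, const float F[2], const float p[2], float dt) {
    float tau = torque2D(b, p, F);
    b.v[0] += dt * F[0] / b.M;   // dv/dt = F / M
    b.v[1] += dt * F[1] / b.M;
    b.w    += dt * tau  / b.I;   // dw/dt = tau / I
    b.x[0] += dt * b.v[0];       // dx/dt = v
    b.x[1] += dt * b.v[1];
    b.r    += dt * b.w;          // dr/dt = w
}
```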
The collision computation is the main task
involved in each step, since collisions give rise to the
forces that generate motion. It is made up of three
sequential stages, which will be presented in the
following subsections. Roughly speaking, rigid body
simulation can be considered as a large catalogue of
subroutines, some of which are carefully chosen to
fill each of these stages in order to build systems that
efficiently solve the specific scenes they drive. Here
we show how some of these subroutines can be
implemented on the GPU, analyzing their pros and cons
with respect to other approaches. Since the subroutines
can be independently incorporated into the whole
simulator, they are interchangeable parts; therefore,
systems alternating CPU and GPU computations
are possible. Nevertheless, such hybrid simulators
would require additional tasks to switch between
processors (e.g. transmitting data between CPU and
GPU, binding textures to shaders, and assigning
values to uniform variables) that could slow down
the simulation. Thus we will only consider fully
GPU-based implementations in the sequel.
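To illustrate how such interchangeable subroutines fit together, the sketch below (C++, with generic stage names of our own; the paper's actual stages are presented in the following subsections) models the collision computation as three sequential slots, each of which could hold either a CPU or a GPU implementation.

```cpp
#include <functional>

// Placeholder data flowing between stages; in a GPU implementation
// these would typically live in textures or buffers.
struct Scene { /* all simulated bodies */ };
struct CandidatePairs { /* potentially colliding pairs */ };
struct Contacts { /* exact contact information */ };

// Three sequential stages of the collision computation, modeled as
// interchangeable subroutines. Mixing CPU and GPU versions implies the
// extra transfers and bindings mentioned above.
struct CollisionPipeline {
    std::function<CandidatePairs(const Scene&)> broadPhase;
    std::function<Contacts(const Scene&, const CandidatePairs&)> narrowPhase;
    std::function<void(Scene&, const Contacts&)> response;

    void run(Scene& scene) const {
        CandidatePairs pairs = broadPhase(scene);     // stage 1
        Contacts contacts = narrowPhase(scene, pairs); // stage 2
        response(scene, contacts);                     // stage 3
    }
};
```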