For each point $\vec{z}_i$, a set of random samples $S_i = \{\vec{z}_i^{(1)}, \ldots, \vec{z}_i^{(m)}\}$ is generated in the neighborhood of that point.
The stochastic sampling function used is a uniform random walk around the initial point:

$$\vec{z}_i^{(n)} = f(\vec{z}_i, \vec{n}_i) = \vec{z}_i + \vec{n}_i, \qquad \vec{n}_i \sim \mathcal{U}^3(-s, s) \tag{2}$$
where $\vec{n}_i$ is a random vector drawn from a 3-dimensional uniform distribution with minimum $-s$ and maximum $s$ in each coordinate. The parameter $s$ is chosen to be directly proportional to the prior reprojection error of the point being sampled. In this way the optimization behaves adaptively: points with a large prior error are sampled over a wider region, which helps avoid local minima and handles points far from the optimum well, while points close to the optimum are sampled more finely. The GPU implementation uses a fragment shader. Its inputs are the 3D point to be optimized and the texture with the random numbers; its output is a texture containing the coordinates of all hypotheses. The only data transferred are the 3D point coordinates, because the random numbers are transferred in a preprocessing stage. The generated hypotheses do not need to be downloaded to main memory, because they are only used by the shader that evaluates the samples.
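The sampling step of Equation 2 can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the authors' shader: the proportionality constant `alpha` that links $s$ to the prior reprojection error is hypothetical (the text only states that $s$ is proportional to that error), and the pre-generated table of numbers in $[-1, 1]$ stands in for the random-number texture uploaded during preprocessing.

```python
import random


def generate_random_table(count, seed=0):
    """Pre-generate `count` triples of uniform numbers in [-1, 1].

    Stands in for the random-number texture that is uploaded once in a
    preprocessing stage (assumption: the paper does not give its layout).
    """
    rng = random.Random(seed)
    return [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            for _ in range(count)]


def sample_hypotheses(z, prior_error, table, alpha=1.0):
    """Equation 2: z^(n) = z + n with n ~ U^3(-s, s).

    s = alpha * prior_error, where alpha is a hypothetical proportionality
    constant. Scaling a number in [-1, 1] by s gives a number in [-s, s].
    """
    s = alpha * prior_error
    return [(z[0] + s * u, z[1] + s * v, z[2] + s * w) for (u, v, w) in table]
```

Because the table is generated once and only rescaled by $s$ for each point, sampling a new point costs one multiply-add per coordinate, mirroring the split between the one-time upload of random numbers and the per-point upload of the 3D point.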
3.4 Evaluating Samples
Every set $S_i$, for every point $\vec{z}_i$, is evaluated in this stage. The objective function is the residual of Equation 1 applied to every 3D point for every available frame:
$$\arg\min_{j} \sum_{k=1}^{t} r\left(\Pi\left(R_k \vec{z}_i^{(j)} + \vec{t}_k\right) - \vec{y}_i^{k}\right) \tag{3}$$
Equation 3 satisfies the independence required by stream processing, since each hypothesis is evaluated independently of the others.
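To make Equation 3 concrete, the sketch below scores each hypothesis by summing the residual over the $t$ frames and keeps the index with the lowest cost. It assumes a pinhole projection to normalized image coordinates for $\Pi$ and the squared Euclidean norm for $r$; both are stand-ins for the definitions of Equation 1, and the helper names (`project`, `transform`, `residual`, `hypothesis_cost`, `best_hypothesis`) are hypothetical.

```python
def project(p):
    """Hypothetical Pi: pinhole projection to normalized coordinates."""
    x, y, z = p
    return (x / z, y / z)


def transform(R, t, p):
    """Compute R_k p + t_k for a 3x3 rotation R (list of rows) and translation t."""
    return tuple(sum(R[row][col] * p[col] for col in range(3)) + t[row]
                 for row in range(3))


def residual(e):
    """Assumed r(.): squared Euclidean norm of a 2D reprojection error."""
    return e[0] ** 2 + e[1] ** 2


def hypothesis_cost(z_j, poses, observations):
    """Sum over frames k of r(Pi(R_k z_j + t_k) - y_i^k)."""
    total = 0.0
    for (R, t), y in zip(poses, observations):
        u, v = project(transform(R, t, z_j))
        total += residual((u - y[0], v - y[1]))
    return total


def best_hypothesis(hypotheses, poses, observations):
    """argmin over j; each cost depends on its own hypothesis only."""
    costs = [hypothesis_cost(z, poses, observations) for z in hypotheses]
    j = min(range(len(costs)), key=costs.__getitem__)
    return j, costs[j]
```

Each entry of `costs` is computed without reading any other hypothesis, which is the independence property that lets every hypothesis map to its own fragment.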
Hypotheses are evaluated using a different shader program. This shader runs once for each projection $\vec{y}_i^{k}$ using texture ping-pong (Pharr, 2005), which avoids loops inside the shader. The only data that must be transferred are the camera pose and the projection of the 3D point for each frame. This shader program must therefore be executed $t$ times for each 3D point.
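The ping-pong scheme can be emulated with two buffers, as in the sketch below. Each outer iteration plays the role of one render pass over one projection; the inner loop stands in for the fragments, which the GPU would process in parallel. The callback `frame_cost(z, frame)` is hypothetical and represents the per-frame residual term computed by the shader.

```python
def ping_pong_accumulate(hypotheses, frames, frame_cost):
    """Accumulate per-hypothesis costs over one render pass per frame.

    Two lists emulate the two textures of the ping-pong scheme: each pass
    reads the running totals from one buffer and writes the updated totals
    to the other, then the roles are swapped. frame_cost(z, frame) is a
    hypothetical callback standing in for the per-projection shader.
    """
    read_buf = [0.0] * len(hypotheses)
    write_buf = [0.0] * len(hypotheses)
    for frame in frames:                            # one pass per projection
        for idx, z in enumerate(hypotheses):        # one fragment per hypothesis
            write_buf[idx] = read_buf[idx] + frame_cost(z, frame)
        read_buf, write_buf = write_buf, read_buf   # swap the two textures
    return read_buf                                 # holds the latest totals
```

The per-fragment work never iterates over frames: the loop over the $t$ projections lives on the host side as a sequence of passes, which is what keeps the shader itself loop-free.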
When all the passes have been rendered, the output texture contains the matrix of all hypotheses, each weighted by its accumulated residual. There are then two ways to proceed. The first is to download the entire texture to main memory and search for the best candidate on the CPU. The second is to search directly on the GPU. Experimentally, we found that the second option is faster when the texture is large enough. This search is performed in parallel using reduction techniques (Pharr, 2005).
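A reduction for the best candidate can be sketched as a pairwise tree over (cost, index) pairs. This is a generic minimum reduction, not necessarily the exact layout used by the authors: each pass halves the buffer, and every output element compares only two inputs, so all comparisons within one pass are independent.

```python
def reduce_argmin(costs):
    """Find the minimum cost and its index with a pairwise tree reduction.

    Each pass halves the buffer; every output element compares two inputs,
    so on a GPU the comparisons of one pass can run in parallel.
    Returns (index, cost, number_of_passes).
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    sentinel = (float("inf"), len(costs))   # loses every tie against a real entry
    buf = [(c, i) for i, c in enumerate(costs)]
    passes = 0
    while len(buf) > 1:
        if len(buf) % 2:                    # pad odd sizes so pairs line up
            buf.append(sentinel)
        buf = [min(buf[k], buf[k + 1]) for k in range(0, len(buf), 2)]
        passes += 1
    cost, index = buf[0]
    return index, cost, passes
```

For the $256 \times 256$ viewport used in Section 4 ($2^{16}$ hypotheses), this flat version needs 16 passes; a variant that combines $2 \times 2$ texels per pass would need 8. The text does not state which layout the authors use.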
4 EXPERIMENTAL RESULTS
Both the precision and the performance of the proposed method have been measured in order to validate it. All tests are executed on a real video recorded at 320 × 240 using a standard webcam. Results are compared with the implementation of the Levenberg-Marquardt algorithm provided by (Lourakis, 2004). In our setup, the GPU optimizer runs with a viewport of 256 × 256, giving a total of $256^2 = 2^{16}$ hypotheses per point. The maximum number of iterations allowed for the Levenberg-Marquardt algorithm is 200.
4.1 Precision
Various optimizations of triangulated 3D points have been executed to measure the precision of the GPU optimizer. In each run, 25 different points are reconstructed using 15 consecutive frames tracked by the algorithm described in (Eskudero et al., 2009). Figure 1 shows the mean reprojection residual on a logarithmic scale. This test shows that the GPU optimizer obtains, on average, results 1.4 times better than Levenberg-Marquardt, demonstrating that the GPU optimizer is at least as precise as Levenberg-Marquardt.
Figure 1: Residual error on real images.
4.2 Performance
The PC used for the performance tests is an Intel Core 2 Duo E8400 at 3 GHz with 4 GB of RAM and an NVIDIA GeForce GTX 260 with 896 MB of memory.
The following tests compare the performance of the GPU optimizer and Levenberg-Marquardt. In Figure 2, 15 points are used and the number of frames is increased at each time step; Figure 3 shows a test with 10 frames in which the number of points is increased at each time step. Note that both figures use logarithmic scales.
Figure 2 shows that the GPU optimizer runs approximately 30 times faster than Levenberg-Marquardt as the number of frames increases, and is able to run at 30 fps even when optimizing 15 points over 60 frames.
The next tests analyze in more depth the time spent by the GPU optimizer in each of its phases. Figure 4 shows
GPU OPTIMIZER: A 3D RECONSTRUCTION ON THE GPU USING MONTE CARLO SIMULATIONS - How to Get
Real Time without Sacrificing Precision
445