powerful machine learning approach,
• to define a GPGPU-based testing stage of the proposed classification method, integrated into the final rendering,
• to analyze the performance of our method by comparing five different implementations with different public data sets on different hardware.
2 ADABOOST CLASSIFIER
In this paper, we focus on the Discrete version of Adaboost, which has shown robust results in real applications (Friedman et al., 1998). Given a set of N training samples (x_1, y_1), .., (x_N, y_N), with x_i a vector-valued feature and y_i = −1 or 1, we define F(x) = ∑_{m=1}^{M} c_m f_m(x), where each f_m(x) is a classifier producing values ±1 and the c_m are constants; the corresponding prediction is sign(F(x)). The Adaboost procedure trains the classifiers f_m(x) on weighted versions of the training sample, giving higher weights to cases that are currently misclassified. This is done for a sequence of weighted samples, and the final classifier is then defined as a linear combination of the classifiers from each stage. For a good generalization of F(x), each f_m(x) is only required to produce a classification prediction just better than random (Friedman et al., 1998). Thus, the most common "weak classifier" f_m is the "decision stump". Stumps are single-split trees with only two terminal nodes. If a stump obtains an accuracy below 0.5, we simply flip its polarity, ensuring an accuracy greater than (or equal to) 0.5. Then, for each f_m(x) we only need to compute a threshold value and a polarity to take a binary decision, selecting the pair that minimizes the error based on the assigned weights.
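To make the stump fitting concrete, the following C sketch selects the threshold and polarity that minimize the weighted error over a single feature. This is our illustration rather than the paper's training code; the exhaustive scan over sample values as candidate thresholds and the names Stump and fit_stump are assumptions.

/* Hedged sketch: fit one decision stump on a single feature by scanning
 * candidate thresholds; weights w[i] sum to 1, labels y[i] are -1/+1. */
typedef struct { float threshold; int polarity; float error; } Stump;

Stump fit_stump(const float *x, const int *y, const float *w, int n) {
    Stump best = { 0.0f, 1, 1.0f };
    for (int t = 0; t < n; ++t) {              /* candidate threshold = x[t] */
        float err = 0.0f;
        for (int i = 0; i < n; ++i) {
            int pred = (x[i] < x[t]) ? 1 : -1; /* stump with polarity +1 */
            if (pred != y[i]) err += w[i];     /* accumulate weighted error */
        }
        int pol = 1;
        if (err > 0.5f) {                      /* worse than random: flip */
            err = 1.0f - err;                  /* the polarity of the stump */
            pol = -1;
        }
        if (err < best.error) {
            best.threshold = x[t]; best.polarity = pol; best.error = err;
        }
    }
    return best;
}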
In Algorithm 1, we show the testing of the final decision function F(x) = ∑_{m=1}^{M} c_m f_m(x) using the Discrete Adaboost algorithm with the Decision Stump "weak classifier". Each Decision Stump f_m fits a threshold T_m and a polarity P_m over the selected m-th feature. At testing time, x_m corresponds to the value of the feature selected by f_m(x) on a test sample x. Note that the value c_m is subtracted from F(x) if the hypothesis f_m(x) is not satisfied on the test sample; otherwise, the positive value c_m is accumulated. Finally, the decision on x is obtained by sign(F(x)).
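A direct scalar translation of this testing procedure into C could read as follows. This is a minimal sketch under our naming assumptions: feat[m] holds the index of the feature selected by f_m, and c, P, T hold the trained c_m, P_m, and T_m.

/* Hedged sketch of Algorithm 1: evaluate the strong classifier on one
 * test sample x; feat[m] indexes the feature selected by f_m. */
int adaboost_test(const float *x, const int *feat, const float *c,
                  const float *P, const float *T, int M) {
    float F = 0.0f;
    for (int m = 0; m < M; ++m) {
        float xm = x[feat[m]];
        /* c_m is added when the stump hypothesis holds, subtracted otherwise */
        F += (P[m] * xm < P[m] * T[m]) ? c[m] : -c[m];
    }
    return (F >= 0.0f) ? 1 : -1;   /* sign(F(x)) */
}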
We propose a new and equivalent representation of c_m and |x| that facilitates the parallelization of the testing. We define the matrix V_{f_m(x)} of size 3 × (|x| · M), where |x| corresponds to the dimensionality of the feature space. The first row of V_{f_m(x)} codifies the values c_m for the corresponding features that have been considered during training. In this sense, each position i of the first row of V_{f_m(x)} contains the value c_m for the feature mod(i, |x|) if mod(i, |x|) ≠ 0, or |x| otherwise. The next value of c_m for that feature is found at position i + |x|. The positions corresponding to features not considered during training are set to zero. For column i, the second and third rows of V_{f_m(x)} contain the values of P_m and T_m for the corresponding Decision Stump. Thus, each "weak classifier" is codified in the channels of a 1D texture.
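To illustrate this layout, a sketch of how the table could be filled on the host is given below; build_V, the row-major storage, and 0-based feature indices are our assumptions, not the paper's code. Row 0 receives c_m, row 1 P_m, and row 2 T_m, at the column slots reserved for the feature each stump selects.

/* Hedged sketch: pack the trained stumps into the 3 x (d*M) table V,
 * stored row-major; d = |x| is the feature-space dimensionality.
 * Columns for features never selected during training stay zero. */
#include <stdlib.h>
#include <string.h>

void build_V(float *V, int d, int M,
             const int *feat, const float *c,
             const float *P, const float *T) {
    int cols = d * M;
    memset(V, 0, 3 * cols * sizeof(float));
    int *next = calloc(d, sizeof(int));      /* per-feature slot counter */
    for (int m = 0; m < M; ++m) {
        int i = feat[m] + next[feat[m]] * d; /* next slot for this feature,
                                                spaced |x| columns apart */
        V[0 * cols + i] = c[m];              /* row 0: c_m */
        V[1 * cols + i] = P[m];              /* row 1: P_m */
        V[2 * cols + i] = T[m];              /* row 2: T_m */
        next[feat[m]]++;
    }
    free(next);
}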
As our main goal is real-time testing, we consider two main possibilities: a GLSL-programmed method on the fragment shader and an OpenCL/CUDA implementation using the OpenCL/CUDA-GL integration. We choose OpenCL for portability reasons. Using GLSL, the gradient calculation is usually computed on the CPU because it is faster there, and the results are then sent to the GPU as shown in Figure 1. The testing and visualization stages can be computed in the fragment shader to obtain good speedups. Through OpenCL, in contrast to GLSL, we can control almost all the hardware, so we can solve the gradient problem faster on the GPU using an adaptation of the Micikevicius algorithm (Micikevicius, 2009). Thus, the gradient calculation and the classification steps can both be computed on the GPU, reducing the PCIe transfers and computing each step faster. On the OpenGL side, we use a 3D texture map to visualize the models. The integration of OpenCL and OpenGL allows us to avoid sending the OpenCL-labeled voxel model back to the host and to visualize it directly. The OpenGL layer loads the data to the graphics card; OpenCL then obtains ownership of the data in global memory, processes it, and returns ownership to OpenGL when finished.
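This hand-off can be expressed with the standard OpenCL/OpenGL interoperability calls of that API generation. The following C sketch is our illustration under the assumption that the cl_context was created with GL sharing enabled; the callback run stands in for the actual kernel launches.

/* Hedged sketch: share a GL 3D texture with OpenCL, process it, and
 * return ownership to OpenGL for rendering (OpenCL 1.1-era API). */
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/gl.h>

void process_shared_volume(cl_context ctx, cl_command_queue q, GLuint tex3d,
                           void (*run)(cl_command_queue, cl_mem)) {
    cl_int err;
    cl_mem vol = clCreateFromGLTexture3D(ctx, CL_MEM_READ_WRITE,
                                         GL_TEXTURE_3D, 0, tex3d, &err);
    glFinish();                                 /* ensure GL is done with it */
    clEnqueueAcquireGLObjects(q, 1, &vol, 0, NULL, NULL);
    run(q, vol);                                /* gradient + classification */
    clEnqueueReleaseGLObjects(q, 1, &vol, 0, NULL, NULL);
    clFinish(q);                                /* hand ownership back to GL */
    clReleaseMemObject(vol);
}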
3 GPGPU IMPLEMENTATION:
INTRODUCING WORK GROUP
SHARING
As shown in Figure 1, we propose two OpenCL kernels: the gradient kernel and the classification (testing) kernel.
Algorithm 1: Discrete Adaboost testing algorithm.
1: Given a test sample x
2: F(x) = 0
3: Repeat for m = 1, 2, .., M:
   (a) F(x) = F(x) + c_m (P_m · x_m < P_m · T_m);
4: Output sign(F(x))
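As a hedged illustration of how the classification kernel can consume the V_{f_m(x)} table, consider the following OpenCL C sketch. It is our reconstruction, not the paper's kernel, and assumes one work-item per voxel, a precomputed per-voxel feature vector feats, and the table stored row-major in global memory rather than in a texture.

/* Hedged sketch of the classification (testing) kernel: one work-item
 * labels one voxel. feats holds nvox feature vectors of size d;
 * V is the 3 x (d*M) table from Section 2, stored row-major. */
__kernel void classify(__global const float *feats,
                       __global const float *V,
                       const int d, const int M,
                       __global float *labels) {
    int v = get_global_id(0);
    int cols = d * M;
    float F = 0.0f;
    for (int i = 0; i < cols; ++i) {
        float c = V[i];                        /* row 0: c_m (0 if unused) */
        if (c == 0.0f) continue;               /* slot not used in training */
        float P = V[cols + i];                 /* row 1: polarity P_m */
        float T = V[2 * cols + i];             /* row 2: threshold T_m */
        float xm = feats[v * d + (i % d)];     /* feature mod(i, |x|) */
        F += (P * xm < P * T) ? c : -c;        /* accumulate +/- c_m */
    }
    labels[v] = (F >= 0.0f) ? 1.0f : -1.0f;    /* sign(F(x)) */
}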