In the past, up to a few years ago, hand-crafted feature-based methods were the state of the art in object detection. Detectors like Viola and Jones (Viola et al., 2003), HOG (Dalal and Triggs, 2005), ICF (Dollár et al., 2009), ACF (Dollár et al., 2014) and DPM (Felzenszwalb et al., 2008) are some examples of detectors that use these kinds of features.
Viola and Jones and ICF calculate an integral intensity image, and use Haar-like wavelets to generate possible feature values. HOG, ACF, ICF and DPM make use of so-called HOG-like features: multiple histograms, each representing a small part of the image, are calculated on the image gradient, and each bin in the image then represents a separate feature layer.
The DPM detector learns a detector for different parts of the object, which makes it more invariant to pose changes. The calculated features are then used to train a classifier using SVM or AdaBoost. To cover the entire image, a sliding window approach is used to evaluate all possible detection windows in the image at different scales.
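The integral image mentioned above can be illustrated with a short sketch (illustrative Python, not taken from any of the cited implementations); it is what allows the rectangle sums behind Haar-like features to be computed in constant time per rectangle:

```python
def integral_image(img):
    """Cumulative-sum table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the inclusive rectangle (x0,y0)-(x1,y1), in O(1)."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total
```

A Haar wavelet response is then just a signed combination of a few such rectangle sums, independent of the rectangle size.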
In this paper we choose to focus further on the ACF person detector for a few reasons: ACF is already quite fast on CPU, which means that it is often used as a person detector on embedded platforms. Porting ACF to GPU is something that, to the best of our knowledge, has not been done before. The authors in (Obukhov, 2011) explain how the Viola and Jones face detection algorithm, which is in some ways similar to ACF, can be ported to GPU.
The GPU implementation is an extension of our own CPU implementation of ACF, which is already faster than Dollár's Matlab implementation.
3 ACF PERSON DETECTOR
To be able to follow along with our GPU implementation, we will first give a brief overview of the ACF algorithm itself.
The ACF person detector uses an AdaBoost classifier that classifies image patches based on "ACF features"; the entire image is searched using a sliding window approach.
In total the ACF features consist of ten channels: LUV color/intensity information, gradient magnitude, and Histograms of Oriented Gradients (HOG). They are calculated as follows: RGB color information coming from an image source is converted to the LUV color space, a Gaussian blur is applied, and the resulting luminosity (L) and chroma values (U and V) are used for the first three channels. The gradients (in both directions) of the image are calculated from the luminosity channel. The magnitude of the gradient, after again applying a Gaussian blur, is the fourth channel.
channel. The six remaining channels each represent a
different bin (containing a set of orientations) in the
gradient orientation histogram. A separate histogram
is calculated for each patch of n × n pixels (often 4 × 4) in the gradient images; this means that the resulting feature channels will be downscaled by a factor of n (known as the shrinking factor). To make sure that all channels have the same dimensions, the LUV and gradient magnitude channels are also downscaled by the shrinking factor. Each gradient magnitude in the n × n patch for which a histogram is calculated is placed into the two neighboring bins using linear interpolation according to its orientation.
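As an illustration of these orientation-histogram channels, the following sketch (hypothetical function and constant names, pure Python, not the actual implementation) aggregates gradient magnitudes into six orientation bins per n × n cell, splitting each magnitude between its two neighboring bins:

```python
import math

NUM_BINS = 6   # six orientation channels, as in ACF
SHRINK = 4     # n x n aggregation cell (the "shrinking factor")

def hog_channels(dx, dy):
    """Per-cell orientation histograms, with each gradient magnitude
    linearly interpolated into its two neighboring bins.
    dx, dy: 2-D lists of gradients; dimensions are multiples of SHRINK."""
    h, w = len(dx), len(dx[0])
    ch, cw = h // SHRINK, w // SHRINK
    hist = [[[0.0] * NUM_BINS for _ in range(cw)] for _ in range(ch)]
    for y in range(h):
        for x in range(w):
            mag = math.hypot(dx[y][x], dy[y][x])
            theta = math.atan2(dy[y][x], dx[y][x]) % math.pi  # [0, pi)
            pos = theta / math.pi * NUM_BINS  # continuous bin position
            b0 = int(pos) % NUM_BINS
            b1 = (b0 + 1) % NUM_BINS
            frac = pos - int(pos)
            cell = hist[y // SHRINK][x // SHRINK]
            cell[b0] += mag * (1.0 - frac)  # split the magnitude between
            cell[b1] += mag * frac          # the two neighboring bins
    return hist
```

Note that the downscaling by the shrinking factor falls out of the per-cell aggregation: a h × w gradient image yields (h/n) × (w/n) histogram cells.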
Using these features, a classifier can be trained to detect objects like people. In our implementation we are only interested in speeding up the evaluation phase, as it is the only part that needs to run in real time, and also the only part that will run on embedded hardware. For this reason we will only explain how evaluation of an ACF model is performed, and omit the details of the training phase.
For classification ACF uses a variation of the AdaBoost (Freund and Schapire, 1995) algorithm. A series of weak classifiers (decision trees) is evaluated and combined into one strong classifier. Every decision tree adds or subtracts a certain value (determined during training) to a global sum that represents the detection score for a given window, as shown in Equation 1.
H_N(x) = ∑_{n=1}^{N} h_n(x)    (1)
Decision trees are evaluated sequentially; if at any point N the global score H_N drops below a certain cutoff threshold, the evaluation for that particular window is stopped. All decision trees are evaluated only for windows whose score never goes below this threshold. Stopping the evaluation early means that far fewer decision trees have to be evaluated, making the evaluation much faster. Only windows with a high score (where the object is likely present) are evaluated fully. After evaluating each window in this fashion using the sliding window approach, Non-Maximum Suppression (NMS) is applied, which gives us our final detection boxes.
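This early-stopping evaluation of a single window can be sketched as follows; the function and its return convention are illustrative only, not the actual implementation:

```python
def evaluate_window(trees, cutoff):
    """Evaluate weak classifiers sequentially; stop as soon as the
    running score H_N drops below the cutoff threshold.
    trees: callables returning each tree's contribution h_n(x).
    Returns (score, accepted, trees_evaluated)."""
    score = 0.0
    for n, tree in enumerate(trees, start=1):
        score += tree()
        if score < cutoff:
            return score, False, n   # rejected early after n trees
    return score, True, len(trees)   # survived all trees
```

In practice most windows contain no person and are rejected after only a handful of trees, which is where the speedup comes from.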
4 GPU IMPLEMENTATION
We can divide our GPU implementation of the ACF detector into two steps: feature calculation and model evaluation. In this section we will explain both of them. In preliminary tests we saw that for the