6 CONCLUSIONS
In this paper, we presented our experiences with implementing an
object detection algorithm in OpenCL. We discussed the opportunities
we exploited by parallelizing parts of the algorithm on GPUs. Using a
CPU implementation as a reference, we observed that the construction
of the feature pyramid for images with a resolution of 600x480 dropped
from circa 0.60 seconds to 0.15 seconds, a speedup of roughly a factor
of four.
During this implementation we encountered several challenges. The
most important one was handling simultaneous writes to the same memory
location, which called for an approach that avoids such conflicts and
allows more parallel execution.
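One common way to avoid such write conflicts is to turn a scatter into a gather, so that every output element is written by exactly one work-item. The plain C sketch below illustrates the idea on the CPU for a per-cell histogram. It is not the kernel used in this work, and the names `gather_cell_histogram`, `bin_of_pixel`, `cell_size` and `num_bins` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Gather-style accumulation: the body for one (cell, bin) pair reads many
 * pixels but writes exactly one output element. Mapped to a GPU, each pair
 * could become one work-item, so no two work-items would write the same
 * address. All names here are illustrative assumptions.
 *
 * bin_of_pixel: width * height entries, each in [0, num_bins)
 * hist:         (width / cell_size) * (height / cell_size) * num_bins entries
 */
void gather_cell_histogram(const int *bin_of_pixel, size_t width, size_t height,
                           size_t cell_size, size_t num_bins, int *hist)
{
    assert(cell_size > 0 && width % cell_size == 0 && height % cell_size == 0);
    size_t cells_x = width / cell_size;
    size_t cells_y = height / cell_size;

    for (size_t cy = 0; cy < cells_y; ++cy) {
        for (size_t cx = 0; cx < cells_x; ++cx) {
            for (size_t b = 0; b < num_bins; ++b) {
                /* Everything below is what a single work-item would do. */
                int count = 0;
                for (size_t y = cy * cell_size; y < (cy + 1) * cell_size; ++y)
                    for (size_t x = cx * cell_size; x < (cx + 1) * cell_size; ++x)
                        if (bin_of_pixel[y * width + x] == (int)b)
                            ++count;
                /* The only write: one private output slot per (cell, bin). */
                hist[(cy * cells_x + cx) * num_bins + b] = count;
            }
        }
    }
}
```

The trade-off in this sketch is that every pixel is read once per bin instead of once overall, which exchanges extra reads for conflict-free writes.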
We also discussed the ease of use of OpenCL. Its flexibility comes at
the cost of many function calls before the actual kernels can be
executed, but this setup code can be reused in other projects, so the
major advantages of heterogeneity and scalability outweigh the extra
work. Since it is a novel standard, the available literature is
limited but growing. Overall, we can state that the optimization game
is worth the OpenCL candle.
7 FUTURE WORK
In the future, we will extend our implementation with the use of
texture memory, which allows a more random memory access pattern.
Recent preliminary experiments indicate that we could reach an access
speed three to four times the current one. We will also integrate the
use of vector types, which allow an operation to be executed on
multiple elements at once. This requires padding the memory so that
its size is a multiple of the vector size. Finally, we will implement
the search for models in OpenCL. This part of the algorithm consists
mostly of convolutions with the model. Running it on the device
removes the need to transfer the complete feature pyramid to host
memory; only the coordinates of the detections have to be transferred.
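The padding requirement for vector operations can be illustrated with a minimal CPU-side sketch in plain C. It assumes a vector width of four elements, as for a `float4`. The helper names `padded_length` and `scale_in_groups_of_4` are hypothetical and not part of the implementation described here.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of vec_width (vec_width must be > 0). */
size_t padded_length(size_t n, size_t vec_width)
{
    assert(vec_width > 0);
    return (n + vec_width - 1) / vec_width * vec_width;
}

/*
 * Scale a padded buffer four elements at a time, mimicking what a single
 * four-wide vector operation would do per step. Because padded_n is a
 * multiple of 4, no scalar tail loop is needed.
 */
void scale_in_groups_of_4(float *data, size_t padded_n, float factor)
{
    assert(padded_n % 4 == 0);
    for (size_t i = 0; i < padded_n; i += 4) {
        data[i + 0] *= factor;
        data[i + 1] *= factor;
        data[i + 2] *= factor;
        data[i + 3] *= factor;
    }
}
```

For example, a row of 600 elements with a hypothetical vector width of 16 would be padded to `padded_length(600, 16)`, which is 608 elements. The padding elements have to be allocated and initialized so that processing them is harmless.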
ACKNOWLEDGEMENTS
This work is supported by the Institute for the Promotion of
Innovation through Science and Technology in Flanders (IWT) via the
Tetra project S.O.S. OpenCL - Multicore cooking.
IS THE GAME WORTH THE CANDLE? - Evaluation of OpenCL for Object Detection Algorithm Optimization