nearly linear speed-up when the levels that each GPU
will process are carefully determined.
4 EXPERIMENTAL RESULTS
We tested both the CPU and GPU implementations
on the same desktop PC, which contains an Intel
Core i5-2500K processor, three GTX 580 GPUs and
8 GB of RAM. Figure 3 shows the average number of
frames processed per second by each implementation
on video streams of various resolutions. These
measurements include the time required for
preprocessing and for memory copies between the
host and the device. The multi-threaded CPU
implementation uses OpenMP to distribute the
processing across cores.
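The measurement described above can be sketched as a simple timing loop. This is a minimal illustration, not the harness used for the reported numbers: `average_fps` and `process_frame` are hypothetical names, and `process_frame` is assumed to cover the whole per-frame pipeline (preprocessing, host-device copies and detection), so that those costs are included in the result.

```python
import time


def average_fps(frames, process_frame):
    """Return the average number of frames processed per second.

    `process_frame` is a hypothetical callable assumed to perform the
    full per-frame work: preprocessing, memory transfers and detection.
    """
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very coarse timers.
    return count / elapsed if elapsed > 0 else float("inf")
```

Timing the whole loop, rather than only the detection kernel, is what makes such a figure comparable between CPU and GPU implementations.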
[Bar chart: frames per second (y-axis) by input resolution (x-axis)]

                 640x480  720x540  1280x720  1920x1080
CPU (1 thread)        14       11         4          2
CPU (4 threads)       42       34        14          6
1 GPU                171      140        64         35
2 GPUs               320      264       121         68
3 GPUs               445      380       177         99

Figure 3: Frame rates of GPU and CPU implementations on
various input resolutions.
As can be seen from Figure 3, even the single-GPU
implementation outperforms the single-threaded
and multi-threaded CPU implementations by factors
of 12-18x and 4-6x, respectively. The gap between
the GPU and CPU implementations widens as the
resolution increases, showing that a GPU, as a
massively parallel processor, is better suited to
processing high-resolution video than a CPU. These
results also show that, in contrast to the CPU, the
performance of the GPU-based implementation
scales nearly linearly with the number of GPUs.
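The speed-up factors quoted above can be reproduced from the frame rates printed as data labels in Figure 3. The sketch below is only a worked calculation. The dictionary layout and the `speedup` helper are illustrative names, and because the labels are rounded to whole frames per second, the ratios are approximate.

```python
# Frame rates (frames per second) as printed on the bars of Figure 3,
# ordered by resolution: 640x480, 720x540, 1280x720, 1920x1080.
RESOLUTIONS = ["640x480", "720x540", "1280x720", "1920x1080"]
FPS = {
    "CPU (1 thread)": [14, 11, 4, 2],
    "CPU (4 threads)": [42, 34, 14, 6],
    "1 GPU": [171, 140, 64, 35],
    "2 GPUs": [320, 264, 121, 68],
    "3 GPUs": [445, 380, 177, 99],
}


def speedup(fast, slow):
    """Per-resolution ratio of the frame rates of two implementations."""
    return [f / s for f, s in zip(FPS[fast], FPS[slow])]


gpu_vs_cpu1 = speedup("1 GPU", "CPU (1 thread)")
gpu_vs_cpu4 = speedup("1 GPU", "CPU (4 threads)")
scaling_2 = speedup("2 GPUs", "1 GPU")
scaling_3 = speedup("3 GPUs", "1 GPU")

for i, res in enumerate(RESOLUTIONS):
    print(
        f"{res:>9}: 1 GPU vs 1 thread {gpu_vs_cpu1[i]:5.1f}x, "
        f"vs 4 threads {gpu_vs_cpu4[i]:4.1f}x, "
        f"2 GPUs {scaling_2[i]:.2f}x, 3 GPUs {scaling_3[i]:.2f}x"
    )
```

With these labels, two GPUs give roughly 1.9x and three GPUs roughly 2.6x to 2.8x the single-GPU frame rate, with the ratio improving slightly at higher resolutions.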
We also tested both our CPU and GPU implementations
on the CMU+MIT frontal face test set
(Rowley et al., 1998) to validate our results. Both
implementations achieve a detection rate of 90.8%
with 32 false positives, showing that our GPU
implementation produces exactly the same results
as the CPU implementation.
5 CONCLUSIONS
We have presented an efficient CUDA implementation
of a boosting-based, real-time object detection
system that uses MCT-based classifiers. We
have shown that even our single-GPU implementation
outperforms both the single-threaded and
multi-threaded CPU implementations running on a
high-end CPU. We have pointed out that, because of
their massively parallel architecture, GPUs are more
suitable than CPUs for working with high-resolution
videos. Our implementation, with its ability to detect
objects in high-resolution video streams in real time,
can easily be used in modern multimedia,
entertainment and surveillance systems.
ACKNOWLEDGEMENTS
This work is supported by ITU-BAP and TUBITAK
under the grants 34120 and 109E268, respectively.
REFERENCES
Fröba, B. and Ernst, A. (2004). Face detection with the
modified census transform. In Proceedings of the
Sixth IEEE International Conference on Automatic
Face and Gesture Recognition, FGR ’04, pages 91–96,
Washington, DC, USA. IEEE Computer Society.
Harvey, J. P. (2009). GPU acceleration of object
classification algorithms using NVIDIA CUDA. Master’s
thesis, Rochester Institute of Technology.
Hefenbrock, D., Oberg, J., Thanh, N. T. N., Kastner, R.,
and Baden, S. B. (2010). Accelerating Viola-Jones
face detection to FPGA-level using GPUs. In
Proceedings of the 2010 18th IEEE Annual International
Symposium on Field-Programmable Custom Computing
Machines, FCCM ’10, pages 11–18, Washington, DC,
USA. IEEE Computer Society.
Herout, A., Jošth, R., Juránek, R., Havel, J., Hradiš, M.,
and Zemčík, P. (2010). Real-time object detection on
CUDA. Journal of Real-Time Image Processing,
2010(1):1–12.
Obukhov, A. (2004). Haar classifiers for object detection
with CUDA. In Fernando, R., editor, GPU Gems:
Programming Techniques, Tips and Tricks for Real-Time
Graphics, chapter 33, pages 517–544. Addison
Wesley.
Rowley, H., Baluja, S., and Kanade, T. (1998). Neural
network-based face detection. IEEE Trans. Pattern
Anal. Mach. Intell., 20(1):23–38.
Sharma, B., Thota, R., Vydyanathan, N., and Kale, A.
(2009). Towards a robust, real-time face processing
system using CUDA-enabled GPUs. In High
Performance Computing (HiPC), 2009 International
Conference on, pages 368–377. IEEE.
Viola, P. and Jones, M. J. (2004). Robust real-time face
detection. Int. J. Comput. Vision, 57:137–154.
VISAPP 2012 - International Conference on Computer Vision Theory and Applications