Table 1: Comparison between the scene-specific detector (center column) and the original detector using exhaustive search (left column). To enable comparability, the scene-specific detector is also applied at scale 1.0 and above (right column).

                          exhaust.    scene       scene
    scales used           all ≥ 1     relevant    rel. ≥ 1
    nr. det. windows      48016       3672        2409
    nr. scales            33          29          16
    time preproc. (sec.)  4.82        3.64        1.62
    time analysis (sec.)  10.66       4.26        1.97
    time total (sec.)     15.48       7.90        3.59
It has to be noted that the original detector scans the image at all possible scales greater than 1.0 with an increment of 5%. The proposed scene-specific detector analyzes all relevant scales (with the same scale increment), which may include scales smaller than 1.0. To enable a direct comparison, Table 1 also gives results for the proposed scene-specific detector with the same minimum scale of 1.0. Using the scene scale estimation, the number of detection windows and the number of processed scales can be reduced significantly, resulting in an average computational speed-up by a factor of 4. The increased run-time performance is mainly due to the reduced number of scales and locations at which the feature descriptors have to be computed. Since the base implementation already caches and reuses previously computed descriptors, the 20-fold reduction in the number of detection windows only leads to a 5-fold reduction in analysis time.
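The speed-up above comes from scanning only the scales consistent with the scene scale estimate. As a minimal illustrative sketch (assuming a 128-pixel detection window and a relative tolerance band around the expected person height; both values are hypothetical, not taken from the paper), the relevant subset of the standard 5% geometric scale grid could be selected as follows:

```python
import math

def relevant_scales(h_expected, win_height=128.0, increment=1.05, tol=0.4):
    """Return the detector scales on the 5% geometric grid whose
    window height lies within +/- tol of the person height predicted
    by the scene scale model.  Unlike exhaustive search, scales below
    1.0 are allowed when the scene predicts small people."""
    s_star = h_expected / win_height            # ideal scale for this region
    lo, hi = s_star * (1.0 - tol), s_star * (1.0 + tol)
    # index range on the grid s = increment**k (k may be negative)
    k_lo = math.ceil(math.log(lo) / math.log(increment))
    k_hi = math.floor(math.log(hi) / math.log(increment))
    return [increment ** k for k in range(k_lo, k_hi + 1)]
```

With an expected height equal to the window height, this yields 17 scales around 1.0 instead of the full exhaustive range, mirroring the reduction in processed scales reported in Table 1.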
4 CONCLUSIONS
A robust approach for automatically adapting a detector to an unknown planar scene has been described. Experiments on a variety of datasets demonstrate that scene-specific detection gives a speed-up by a factor of 4 and a significant improvement in precision and recall compared to an existing person detector. The Matlab implementation of the scene scale estimation and the code changes to the original person detector in C++ (Dalal and Triggs, 2005) are made available for download at http://scovis.joanneum.at/sceneadaptation. One open issue is the number of observations needed for a robust scene scale estimation. Although theoretically only a few (3) good detections are required for a planar scene model, the estimate becomes more reliable the more detections are available. In our experiments, promising results were obtained using a few hundred detections. If many observations are available, it is preferable to sample the most probable detections (according to the detector's confidence score) with a large coverage of the image area. Given the low computational complexity of the scene scale estimation, an incremental application of the approach is proposed.
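The robust estimation from noisy detections can be sketched with a RANSAC loop (Fischler and Bolles, 1981), under the simplifying assumption that on a planar ground plane a detection's pixel height varies linearly with its foot row. This is an illustrative sketch, not the paper's reference implementation; the function name, minimal sample size of two, and tolerance are assumptions:

```python
import random

def fit_scene_scale(detections, n_iter=500, inlier_tol=0.15, seed=0):
    """Robustly fit h(y) = a*y + b from (foot_row, pixel_height) pairs.

    `detections` should be high-confidence detections spread over the
    image; RANSAC discards spurious ones.  Hypothetical sketch."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (y1, h1), (y2, h2) = rng.sample(detections, 2)
        if y1 == y2:
            continue                      # degenerate minimal sample
        a = (h2 - h1) / (y2 - y1)
        b = h1 - a * y1
        # relative height error as the inlier criterion
        inliers = [(y, h) for (y, h) in detections
                   if abs((a * y + b) - h) <= inlier_tol * h]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # least-squares refit on the consensus set
    n = len(best_inliers)
    sy = sum(y for y, _ in best_inliers)
    sh = sum(h for _, h in best_inliers)
    syy = sum(y * y for y, _ in best_inliers)
    syh = sum(y * h for y, h in best_inliers)
    a = (n * syh - sy * sh) / (n * syy - sy * sy)
    return a, (sh - a * sy) / n
```

Because the refit uses all consensus detections, the estimate improves as more detections accumulate, which matches the observation above that a few hundred detections give more reliable results than the theoretical minimum.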
ACKNOWLEDGEMENTS
The authors would like to thank their colleagues Georg Thallinger, Helmut Neuschmied and Werner Haas.
The research leading to these results has received
funding from the European Community’s Seventh
Framework Programme FP7/2007-2013 - Challenge
2 - Cognitive Systems, Interaction, Robotics - under
grant agreement no. 216465 (SCOVIS).
REFERENCES
Breitenstein, M. D., Sommerlade, E., Leibe, B., van Gool,
L., and Reid, I. (2008). Probabilistic parameter se-
lection for learning scene structure from video. In
BMVC.
Dalal, N. and Triggs, B. (2005). Histograms of oriented
gradients for human detection. In CVPR.
Dollár, P., Wojek, C., Schiele, B., and Perona, P. (2009).
Pedestrian detection: A benchmark. In CVPR.
Fischler, M. A. and Bolles, R. C. (1981). Random sample
consensus: a paradigm for model fitting with appli-
cations to image analysis and automated cartography.
Communications of the ACM.
Greenhill, D., Renno, J., Orwell, J., and Jones, G. A. (2008).
Occlusion analysis: Learning and utilising depth maps
in object tracking. Image Vision Computing.
Hoiem, D., Efros, A. A., and Hebert, M. (2006). Putting
objects in perspective. In CVPR.
Renno, J. R., Orwell, J., and Jones, G. A. (2002). Learning
surveillance tracking models for the self-calibrated
ground plane. In BMVC.
Stalder, S., Grabner, H., and van Gool, L. (2009). Explor-
ing context to learn scene specific object detectors. In
Performance Evaluation of Tracking and Surveillance
workshop at CVPR.
UK Home Office (2008). i-LIDS multiple camera tracking
scenario definition.
Zhu, L., Zhou, J., Song, J., Yan, Z., and Gu, Q. (2008).
A practical algorithm for learning scene information
from monocular video. Optics Express.
VISAPP 2010 - International Conference on Computer Vision Theory and Applications