Figure 4: Global precision-recall curves.
5.2 Global Results
In the previous section, we presented the results of our
system for each region separately. In this section, we
are interested in its global performance. We compute
precision-recall curves on the whole image.
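As an illustration of how such a curve can be obtained, here is a minimal Python sketch. It assumes that each detection has already been matched against the annotations upstream, so that it arrives as a `(score, is_true_positive)` pair, and that the number of annotated pedestrians is known. The function name and the input layout are hypothetical and are not taken from the system described in this article.

```python
from typing import List, Tuple


def precision_recall_curve(
    detections: List[Tuple[float, bool]], n_ground_truth: int
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) triples, one per distinct score.

    detections: (score, is_true_positive) pairs, already matched to the
        annotations (the matching criterion is assumed, not shown here).
    n_ground_truth: number of annotated pedestrians, used as the recall
        denominator so that missed pedestrians are accounted for.
    """
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")

    # Sweep the threshold from the highest score down to the lowest.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    curve = []
    tp = fp = 0
    for i, (score, is_tp) in enumerate(ranked):
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Emit one point per distinct score (after the last tied detection).
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        curve.append((score, tp / (tp + fp), tp / n_ground_truth))
    return curve


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, False), (0.7, True), (0.4, True)]
    for threshold, precision, recall in precision_recall_curve(toy, 4):
        print(f"T={threshold:.1f}  P={precision:.2f}  R={recall:.2f}")
```

Each point corresponds to keeping every detection whose score is at least the given threshold.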
Our results are shown in Figure 4. Both contextualized
detectors (blue and green curves) perform better than
the generic one (red curve). As expected, the
contextualized and region driven detector performs
better than the solely contextualized detector. The
distortion of the blue curve is easily explained by the
performance of the classifier in the fourth region
(id 3).
All these experiments tend to show that a global
classifier, even a contextualized one, is not optimal
in our application. A region driven classifier can
achieve better performance.
5.3 Automatic Detection Threshold
Estimation
Once all classifiers have been trained, we need to tune
them to achieve the best performance. In a
video-surveillance system, as the scene is stable, a
false positive detection is very likely to reappear in
the following frames. It is therefore essential to
filter out false positive detections. This mostly
consists in adjusting each detection threshold
independently.
The optimal threshold is the one that maximizes the
F-measure, which is a trade-off between precision and
recall. In the case of a generic learning algorithm, we
simply use an annotated testing dataset to estimate
this threshold. In a contextualized approach, we do not
have such a groundtruth dataset. So we reuse the oracle
to collect new examples and create an estimated
groundtruth. This time, we replace the generic
classifier of the oracle by a contextualized one. With
this estimated groundtruth, it is possible to compute
the recall and the precision for a given threshold and
to deduce the F-measure. A 1-D maximization is then
performed to find the best threshold.
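The threshold selection can be sketched as follows, using the usual definition F = 2PR / (P + R). The text does not specify how the 1-D maximization is carried out, so this sketch assumes an exhaustive search over the observed detection scores. The function names and the `(score, is_true_positive)` input layout are hypothetical, with the labels taken from the estimated groundtruth.

```python
from typing import List, Optional, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    detections: List[Tuple[float, bool]], n_ground_truth: int
) -> Tuple[Optional[float], float]:
    """Return (threshold, F-measure) maximizing F over the observed scores.

    Assumption: exhaustive 1-D search, one candidate threshold per distinct
    score; a detection is kept when its score is >= the threshold.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    best_t, best_f = None, 0.0
    tp = fp = 0
    for i, (score, is_tp) in enumerate(ranked):
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Evaluate only once all detections tied at this score are counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        f = f_measure(tp / (tp + fp), tp / n_ground_truth)
        if f > best_f:
            best_t, best_f = score, f
    return best_t, best_f


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, True), (0.7, False),
           (0.6, True), (0.5, False), (0.4, False)]
    print(best_threshold(toy, n_ground_truth=3))
```

Because the labels come from an imperfect oracle, the threshold found this way is only an estimate of the one that an annotated testing dataset would give.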
Table 1: Comparison between the detectors' (global and
spatial) performances at their optimal thresholds θ_opt^pr
and at their estimated ones θ_opt^auto (T: threshold,
P: precision, R: recall, F: F-measure).

   Global                Region 0              Region 1              Region 2              Region 3
   θ_opt^pr   θ_opt^auto θ_opt^pr   θ_opt^auto θ_opt^pr   θ_opt^auto θ_opt^pr   θ_opt^auto θ_opt^pr   θ_opt^auto
T  24.6       21.9       11.5       16.2       1.4        17.9       3.1        32.5       28.3       27.6
P  0.87       0.73       0.88       0.99       0.99       1.0        0.97       1.0        0.88       0.86
R  0.69       0.72       0.75       0.60       0.98       0.89       0.96       0.88       0.73       0.74
F  0.77       0.72       0.81       0.74       0.98       0.94       0.96       0.94       0.80       0.80
To evaluate the accuracy of the estimated thresholds
θ_opt^auto, we compare them to the optimal thresholds
θ_opt^pr. Results are shown in Table 1. As the
contextualized oracle is not perfect, there are some
mislabeled examples in the estimated groundtruth, so
the estimated thresholds can be slightly different from
the optimal ones. They nevertheless usually achieve
similar F-measures.
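To illustrate how independently tuned thresholds could be applied at detection time, here is a minimal Python sketch that uses the estimated per-region thresholds reported in Table 1. The `Candidate` type, the function name, and the `>=` comparison are assumptions made for illustration. Assigning a candidate to a region is assumed to be done upstream and is not shown.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical container for one candidate detection."""
    region_id: int   # index of the region the candidate falls in (0 to 3)
    score: float     # output score of that region's classifier


# Estimated thresholds for Region 0 to Region 3, as reported in Table 1.
ESTIMATED_THRESHOLDS: Dict[int, float] = {0: 16.2, 1: 17.9, 2: 32.5, 3: 27.6}


def filter_detections(
    candidates: List[Candidate], thresholds: Dict[int, float]
) -> List[Candidate]:
    """Keep candidates whose score reaches the threshold of their own region."""
    return [c for c in candidates if c.score >= thresholds[c.region_id]]


if __name__ == "__main__":
    candidates = [Candidate(0, 20.0), Candidate(2, 20.0), Candidate(3, 27.6)]
    for kept in filter_detections(candidates, ESTIMATED_THRESHOLDS):
        print(kept)
```

The same score of 20.0 is accepted in region 0 but rejected in region 2, which is the practical effect of tuning each threshold independently.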
6 CONCLUSIONS
In this article, we proposed a system to automatically
build a contextualized pedestrian detector for
video-surveillance applications. First, an oracle with
a high precision gathers scene specific pedestrian
examples. This dataset and the geometry of the scene
are then used to design 3D regions where pedestrians
share similar appearance characteristics. The idea is
to externalize the classifier complexity. Finally, a
single detector, composed of the classifiers trained
for each region and set to their optimal working
points, is run.