performed twice to explore hard negative examples.
In the first iteration, cropped training images with a
fixed size are used to train initial regional SVMs,
and then hard negative examples are mined by
running the detector on the original, uncropped
negative training images rather than on the cropped
negative examples (Felzenszwalb et al., 2010). The hard negative
examples are added to the negative training set to
conduct the second round of training. The process of
detecting objects using our proposed algorithm
consists of four steps (given a test image): 1)
pyramid feature extraction; 2) predictions by
regional SVMs; 3) confidences estimated by the
spatial model; 4) non-maximum suppression.
Non-maximum suppression handles the case in which
a single object instance yields multiple overlapping
detections: among bounding boxes that overlap by at
least 50%, only the one with the highest confidence
is reported.
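As an illustration only, the suppression rule described above can be sketched as a greedy procedure. The sketch assumes that overlap is measured as intersection-over-union, which the text does not specify; the function names, the corner-coordinate box format and the example boxes are hypothetical and are not taken from our implementation.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (assumed overlap measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(
    boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Return indices of the boxes that survive suppression.

    Boxes are visited in order of decreasing confidence; a box is kept
    only if it overlaps every already-kept box by less than `threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(overlap_ratio(boxes[i], boxes[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Two heavily overlapping detections of one object plus a distant one.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(non_maximum_suppression(boxes, scores))
```

In the example, the second box overlaps the first by more than 50% and has a lower confidence, so only the first and third boxes are reported.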
Fig. 4 compares several images processed by our
object detection system with and without the spatial
model. In these examples the spatial model clearly
improves the detection results. The
performance on the entire dataset is assessed by the
precision, recall and F-measure of the test results.
Table 3 compares our results with those of the
Neighborhood Suppression Algorithm (NSA) and the
Repeated Part Elimination Algorithm (RPEA) of
Agarwal et al. (2004), which were also evaluated on
the multi-scale test images of the UIUC Database.
Table 3 shows that our algorithm achieves an
F-measure almost 20 percentage points higher than
those of the NSA and RPEA. Note that Table 3 lists
the best F-measure reported for these two algorithms
in (Agarwal et al., 2004).
Figure 5: Precision-recall curve on the Caltech Airplanes
Dataset.
Table 3: Performance on the multi-scale test images of the
UIUC Image Database for Car Detection.
            NSA      RPEA     Ours
Recall      38.85%   39.57%   66.91%
Precision   49.09%   49.55%   60.00%
F-measure   43.37%   44.00%   63.27%
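The F-measure row of Table 3 is consistent with the harmonic mean of precision and recall. The short check below is a sketch that assumes this standard (balanced, F1) definition; the function name `f_measure` and the dictionary layout are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs copied from Table 3.
table3 = {
    "NSA": (49.09, 38.85),
    "RPEA": (49.55, 39.57),
    "Ours": (60.00, 66.91),
}

if __name__ == "__main__":
    for name, (precision, recall) in table3.items():
        print(f"{name}: F-measure = {f_measure(precision, recall):.2f}%")
```

Under this definition the three values round to 43.37%, 44.00% and 63.27%, matching the table, and the gap between our algorithm and RPEA is about 19 percentage points.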
We also evaluate our proposed algorithm on the
Caltech Airplanes dataset consisting of 1074 images,
which are divided into a training set (500 images),
a validation set (74 images) and a test set (500
images). The training process is similar to that
applied on the Car Dataset and the performance is
evaluated by the precision-recall curve (Everingham
and Zisserman, 2007) as shown in Fig. 5 (some of
the detection results are shown in Fig. 6). The
methods compared are the SVM method that employs
the HOG feature and the part-based algorithm
(Felzenszwalb et al., 2010). Our algorithm achieves
the highest average precision (AP) (Everingham and
Zisserman, 2007) among the compared methods.
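For reference, the average precision summarising a precision-recall curve can be computed as sketched below. The sketch assumes that the cited definition is the 11-point interpolated AP of the PASCAL VOC 2007 challenge; the function name and the example curve are hypothetical and do not correspond to the curves in Fig. 5.

```python
from typing import Sequence


def average_precision_11pt(
    recalls: Sequence[float], precisions: Sequence[float]
) -> float:
    """11-point interpolated average precision (assumed definition).

    For each recall level t in {0.0, 0.1, ..., 1.0}, take the maximum
    precision observed at any recall >= t (0 if there is none), then
    average the eleven values.
    """
    total = 0.0
    for k in range(11):
        t = k / 10
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11


if __name__ == "__main__":
    # Toy curve: precision 1.0 up to recall 0.5, then 0.5 at full recall.
    recalls = [0.5, 1.0]
    precisions = [1.0, 0.5]
    print(f"AP = {average_precision_11pt(recalls, precisions):.4f}")
```

The interpolation step (taking the maximum precision at or beyond each recall level) smooths the sawtooth shape that raw precision-recall curves typically have.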
5 CONCLUSIONS
This paper presents regional SVM classifiers with a
spatial model that describe the 3D spatial
relationship of features (axes x, y and z in Fig. 1),
which the conventional SVM ignores. The regional SVM
classifiers encode the spatial relationship along axis
z, and the spatial model incorporates the spatial
relationship along axes x and y. We evaluate the
regional SVM classifiers with the spatial model
using diverse features on several object categories.
The experiments show that the regional SVM
classifiers improve on the conventional SVM
classifier and that the spatial model improves the
performance of the object detection system. On the
benchmark datasets, our system outperforms the
other object detection algorithms considered.
ACKNOWLEDGEMENTS
This work was financially supported by the Basic
Science Research Program through the National
Research Foundation of Korea (NRF) funded by the
Ministry of Education, Science and Technology