10% of the images were selected randomly, with the remainder used for testing. This was repeated 20 times.
For the 50% training data experiments, stratified 5x2
fold cross validation was used. Each cross validation
selected 50% of the dataset for training and tested the
classifiers on the remaining 50%; the test and training
sets were then exchanged and the classifiers retrained
and retested. This process was repeated 5 times. Fi-
nally, for the 90% training data situation, stratified
1x10 fold cross validation was performed, with the
dataset divided into ten randomly selected, equally
sized subsets, with each subset being used in turn for
testing after the classifiers were trained on the remain-
ing nine subsets. For offline random forests, we train
detectors for bikes, cars and persons on 100 positive
and 100 negative images (of which 50 are drawn from
the other object class and 50 from the background),
and test on a similarly distributed set.
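For clarity, the three splitting protocols can be expressed in scikit-learn terms. The following is a minimal sketch only; the array names X and y are illustrative, and our own implementation does not rely on this library.

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, StratifiedKFold

# 10% training data: 20 repetitions, each drawing a random stratified 10% for
# training and keeping the remaining 90% for testing.
splits_10 = StratifiedShuffleSplit(n_splits=20, train_size=0.10, random_state=0)

# 50% training data: stratified 5x2-fold cross validation -- five repetitions of
# a 2-fold split, where the two folds exchange the roles of train and test set.
def five_by_two_splits(X, y):
    for rep in range(5):
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
        yield from cv.split(X, y)

# 90% training data: a single stratified 10-fold cross validation; each fold is
# used once for testing after training on the remaining nine folds.
splits_90 = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)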
6 EXPERIMENTAL RESULTS
GRAZ02 images contain only one object category per
image so the recognition task can be seen as a bi-
nary classification problem: bikes vs. background,
people vs. background, and cars vs. background.
The well-known Area Under the ROC Curve (AUC) statistic is used to measure classifier performance in these object recognition experiments. The AUC is independent of the decision threshold: it summarizes not the accuracy at a single operating point, but how the true positive and false positive rates change as the threshold gradually increases from 0.0 to 1.0. A perfect classifier has an AUC of 1.0, while a random classifier has an AUC of 0.5.
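For a single run, the AUC can be computed directly from the classifier's confidence scores; the snippet below is only an illustration with made-up labels and scores, assuming scikit-learn's roc_auc_score.

from sklearn.metrics import roc_auc_score

# 1 = object class, 0 = background; scores are the classifier's confidences.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.2, 0.8]

auc = roc_auc_score(y_true, y_score)  # 1.0 = perfect, 0.5 = random guessing
print(f"AUC = {auc:.2f}")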
6.1 Mean AUC Performance
Tables 2, 3, and 4 give the mean AUC values across
all runs to 2 decimal places for each of the classifier
and training data amount combinations, for the bikes,
cars, and people datasets respectively. For on-line RF
we report the results for different depths of the tree.
As can be seen, our algorithm consistently performs significantly better than the offline RF; the differences in performance average 1.2 ± 15%. In addition, our approach achieves a number of desirable properties: (1) it is incremental, in the sense that new categories can be added incrementally while making use of already acquired knowledge, and the model continuously improves as more features and training data are explored. Even when the process runs for a long time and a large number of features have been processed and evaluated, only a small number of features is needed for each update. (2) It is adaptable, in the sense that both the selection of features and the learning itself (we do not freeze the learning) can change over time. Note that this kind of adaptivity is not possible in standard random forests or other batch learning classifiers. Such a capability for on-line adaptation takes us closer to the goal of a more versatile, robust and adaptable recognition system. The improvement when varying the tree depth is relatively small. This makes
intuitive sense: when an image is characterized by
high geometric variability, it is difficult to find useful
global features.
6.2 A Bag of Covariance vs. Histograms
Another objective of the experiments was to determine whether a bag of covariance matrices can improve recognition performance compared with histogram-based methods. Covariance features are faster to compute than histograms since the dimensionality of the feature space is smaller. The search time for an object in a 24-bit color image of size 640 × 480 is 8.5 s with a C++ implementation, which yields near real-time performance.
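The compactness of the covariance representation follows from its construction: a d-dimensional per-pixel feature vector yields a d × d region descriptor with only d(d+1)/2 unique values, whereas a joint histogram grows exponentially with d. The sketch below illustrates this; the particular feature channels are hypothetical and not necessarily the ones used in our system.

import numpy as np

def region_covariance(features):
    """Covariance descriptor of an image region.

    features: (N, d) array holding one d-dimensional feature vector per pixel,
              e.g. (x, y, intensity, |Ix|, |Iy|), for the N pixels of the region.
    Returns the d x d covariance matrix of those vectors.
    """
    return np.cov(features, rowvar=False)

# Example: 5-dimensional features over a 16x16 region -> a 5x5 descriptor.
region_features = np.random.rand(16 * 16, 5)
C = region_covariance(region_features)
print(C.shape)  # (5, 5)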
The main computational effort is spent on updating the base classifiers. In order to decrease the computation time we use a method similar to (Wu, 2003). Assuming all feature pools are the same, F_1 = F_2 = ··· = F_M, we can update all corresponding base classifiers only once. This speeds up the process considerably while only slightly decreasing the performance.
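The effect of the shared feature pool can be sketched as follows; the classifier and feature interfaces below are hypothetical and only meant to show why sharing the pool removes redundant evaluations.

def update_shared_pool(base_classifiers, shared_pool, sample, label):
    """On-line update when all base classifiers share one feature pool.

    With distinct pools, every base classifier would evaluate its own features
    on the sample; with F_1 = F_2 = ... = F_M each feature is evaluated once
    and the response is reused for every corresponding base classifier.
    """
    for feature in shared_pool:
        response = feature.evaluate(sample)       # computed once per feature
        for clf in base_classifiers:
            clf.update(feature, response, label)  # reused by all classifiers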
We noted that the standard deviation varies between ±2.0 and ±3.2, which is quite high. The reason is that the images in the dataset vary greatly in their level of difficulty, so the performance of any single run depends on the composition of the training set.
6.3 On-line Recognition Learning vs.
Offline Learning
The method we propose is able to learn entirely in on-line mode, and since we do not freeze the learning, it can adapt to changing situations. There are two reasons why this
choice of incremental learning of object recognition
may be useful. First of all, a machine has a com-
petitive advantage if it can immediately use all the
training data collected so far, rather than wait for a
complete training set. Second, search efficiency can improve significantly due to the use of the covariance descriptor, which is considered to represent the object's shape more closely.