performance.
We are now able to characterize an incremental
learning method with respect to its ordering sensitivity,
both regarding samples from all classes and regarding
the order of classes. What remains is to determine
the best sequence for training. The following section
proposes a criterion for choosing a good sequence of
classes that depends only on the data. So far, we have
not found a comparable criterion for determining
well-suited within-class sequences.
3.2 How to Define a Curriculum?
The task is now to find, among all permutations of
examples, the one that yields the best performance.
Building on the class ordering sensitivity of an
incremental learning method defined in Section 2,
we propose a way to obtain good training sequences
with respect to this property.
For this purpose, we considered different measures
for separating classes, e.g. distance measures
such as the Kullback-Leibler divergence or the
Bhattacharyya distance. Both, however, assume
knowledge of the underlying class-generating
distributions, and we do not want to make any such
assumptions. Hence, we looked for criteria that are
independent of any assumption about the data distribution.
We found the Bayes error to be a general measure
for the separability of classes that can be estimated
without any knowledge of the data distribution. It
defines the best achievable error rate of any
classification procedure. Bounds on the Bayes error
can be estimated using the k-nearest-neighbour
classifier, as shown by (Fukunaga, 1972, chap. 6).
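As a rough sketch (not the procedure of the cited work), the Cover-Hart relation between the Bayes error e* and the asymptotic nearest-neighbour error e_nn can be turned into a simple leave-one-out estimate; the function name and the loose bounds e_nn/2 <= e* <= e_nn are illustrative simplifications:

```python
def bayes_error_bounds(samples, labels):
    """Estimate the leave-one-out 1-NN error rate e_nn and derive
    bounds on the Bayes error e* via e_nn/2 <= e* <= e_nn
    (a loose form of the Cover-Hart inequalities)."""
    n = len(samples)
    errors = 0
    for i in range(n):
        # find the nearest neighbour of sample i among all other samples
        best_j = min((j for j in range(n) if j != i),
                     key=lambda j: sum((a - b) ** 2
                                       for a, b in zip(samples[i], samples[j])))
        if labels[best_j] != labels[i]:
            errors += 1
    e_nn = errors / n
    return e_nn / 2, e_nn  # (lower bound, upper bound)
```

For two well-separated classes both bounds collapse to zero, while overlapping classes yield a non-trivial interval.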
In experiments that we do not report in detail, we
compared the error rates of different class sequences
with the corresponding bounds on the Bayes error.
Our empirical results indicate a strong relation
between the Bayes error and the performance of an
incremental learning method given a particular
training sequence. We obtained good results in terms
of error rates using the following two rules:
• Choose class combinations for initialisation with
the lowest bounds on the Bayes error.
• Choose each remaining class with the lowest bound
on the Bayes error with respect to the complete set
of classes learned so far.
To estimate good class sequences according to these
rules, which are based only on the data, we use a
greedy search procedure: we estimate bounds on the
Bayes error for all possible combinations of classes
and search for the sequences with the overall best
performance. For a description of the whole
procedure see (Wenzel and Förstner, 2009).
The resulting list L consists of pairs of
sets {initialisation classes, single classes to add},
where the number of initialisation classes ranges
from 2 to any number M of classes that is still suited
to initialise the learning method.
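The two rules can be sketched as a minimal greedy search; here `bayes_bound` is a hypothetical user-supplied helper that returns an estimated upper bound on the Bayes error for a given subset of classes (e.g. from a k-NN estimate), and is not the original implementation:

```python
from itertools import combinations

def greedy_class_sequence(classes, bayes_bound, init_size=2):
    """Greedy search sketch: pick the initialisation combination
    with the lowest Bayes-error bound, then repeatedly add the
    class whose bound w.r.t. all classes learned so far is lowest.
    `bayes_bound` is an assumed separability estimator."""
    # Rule 1: best initialisation combination of the given size.
    init = min(combinations(classes, init_size), key=bayes_bound)
    learned = list(init)
    remaining = [c for c in classes if c not in learned]
    # Rule 2: extend the sequence one class at a time.
    while remaining:
        nxt = min(remaining,
                  key=lambda c: bayes_bound(tuple(learned) + (c,)))
        learned.append(nxt)
        remaining.remove(nxt)
    return learned
```

Enumerating all combinations makes the initialisation step combinatorial in the number of classes, which is presumably why the number of initialisation classes is kept small in practice.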
Experiment 3: Confirming the Best Class Sequence
Determined from Bounds on the Bayes Error.
We used the greedy search to find good sequences for
our test dataset of handwritten digits. The resulting
lists of good class sequences are shown in detail
in (Wenzel and Förstner, 2009).
To verify this way of obtaining good class sequences,
we show the results of one selected experiment in
Figure 4. The experiment is set up as described in
Section 3.1. Again, we trained the learning method
with different randomly chosen sequences of classes,
starting from a fixed set of classes based on the
results of the search for good sequences; here we
used classes 1, 3, 5, 6 for initialisation. In addition
to the randomly chosen sequences, we performed one
further trial in which we continued with classes
4, 2, 9, 7, 8, 10, one of the 11 suitable sequences
found. The error rates of this trial are shown as the
red line in Figure 4. Further results for other
sequences found by the greedy search can be found
in (Wenzel and Förstner, 2009).
In fact, we almost always obtain the best performance
for the class sequences given by the bounds on the
Bayes error.
First, we observe that the error rates of all trials
during initialisation lie in the lower range of
those from Experiment 2, shown in Figure 3.
Second, after initialisation, i.e. when adding
samples from new classes, the error rates of the
predefined class sequences are almost always below
those of the random trials, shown as gray lines.
Hence, with the estimated bounds on the Bayes error
we have found an appropriate indicator for
identifying good class sequences for training an
incremental learning method.
Figure 4: Results of testing a predefined class ordering. Ex-
periment with a fixed set of classes for initialisation.
THE ROLE OF SEQUENCES FOR INCREMENTAL LEARNING
437