in a unified framework, continuous online learning of qualitative object properties and spatial relations in a setting with no negative examples, where every sample is labelled with multiple concept labels, with a view toward using this as a basis for further learning and for facilitating unlearning as well.
To facilitate this type of learning, our method models the values of the features associated with individual concepts by estimating the probability density function that generated them. To that end, we employ kernel density estimates (KDE), which approximate the underlying density with a mixture of kernels.
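In standard notation (ours, not quoted from this paper), a KDE built from observed samples x_1, ..., x_n with per-kernel bandwidths h_i has the form

\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i} \, K\!\left( \frac{x - x_i}{h_i} \right),

where K is the kernel, a Gaussian in our case.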
Since the models have to be estimated continuously from the arriving data, we construct a kernel for each incoming data point and use it to update the corresponding distributions. This boils down to estimating the single parameter of the kernel: its bandwidth. We propose a method for estimating the bandwidth of each incoming kernel, which is the second contribution of this paper. A number of bandwidth selection methods have been proposed previously that aim to minimize the asymptotic mean integrated squared error (AMISE) between the unknown original distribution and its approximation based on a set of observed samples (e.g., Wand and Jones (Wand and Jones, 1995)). However, these approaches are not directly applicable in incremental settings. Various incremental approaches have therefore been proposed, which usually incorporate some constraints or prior knowledge about the relation between consecutive samples (Elgammal et al., 2002; Han et al., 2004), such as temporal coherence of the incoming data (Arandjelovic and Cipolla, 2005; Song and Wang, 2005). Szewczyk (Szewczyk, 2005) places a Dirichlet process prior on the components and a Gamma density prior on the bandwidth of the incoming data. A drawback of this approach, however, is that the parameters of the priors need to be specified for each given problem. Here, we propose an incremental bandwidth selection approach that does not assume any temporal coherence and does not require setting a large number of parameters.
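To make the setting concrete, the following is a minimal sketch of such an online KDE in Python, adding one Gaussian kernel per incoming sample. Silverman's rule of thumb stands in for the bandwidth estimator; it is not the selection method proposed in this paper.

import numpy as np

class OnlineKDE1D:
    # Minimal sketch of an online 1-D kernel density estimate: one
    # Gaussian kernel is created for every incoming sample and added
    # to the mixture. The bandwidth rule below is a placeholder.

    def __init__(self):
        self.centres = []   # kernel centres = observed samples
        self.widths = []    # per-kernel bandwidths

    def update(self, x):
        self.centres.append(x)
        n = len(self.centres)
        sigma = np.std(self.centres) if n > 1 else 1.0
        h = 1.06 * max(sigma, 1e-6) * n ** (-0.2)  # Silverman's rule
        self.widths.append(h)

    def pdf(self, x):
        # Equally weighted mixture of Gaussian kernels.
        c = np.asarray(self.centres)
        w = np.asarray(self.widths)
        z = (x - c) / w
        return float(np.mean(np.exp(-0.5 * z ** 2) / (w * np.sqrt(2 * np.pi))))

Each update adds one kernel, so the mixture grows with the data; the question the paper addresses is how to choose the bandwidth of each such kernel incrementally.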
The paper is organised as follows. First, we introduce the main incremental learning algorithm. Then we explain the algorithm for incrementally updating the KDE representations. In Section 4 we present the evaluation of the proposed methods. Finally, we summarize and outline work in progress.
2 MAIN INCREMENTAL LEARNING ALGORITHM
The main task of the incremental algorithm is to assign associations between extracted visual features and the corresponding visual concepts. Since our system is based on positive examples only (we do not have negative examples for the concepts being learned), and each input instance can be labelled with several concept labels, the algorithm cannot exploit discriminative information and can rely only on reconstructive representations of the observed visual features. Each visual concept is associated with the visual feature that best models the corresponding images according to two criteria: consistency and specificity. The algorithm must determine which of the automatically extracted visual features are consistent over all images representing the same visual concept and, at the same time, specific to that visual concept only. From a set of one-dimensional features (e.g., median hue value, area of the segmented region, coordinates of the object center, distance between two objects), the learning algorithm thus selects the feature whose values are most consistent and most specific over all images representing the same visual concept (e.g., all images of large objects, of circular objects, or of pairs of objects far apart). Note that this process has to be performed incrementally, considering only the current image (or a very recent set of images) and the learned representations; previously processed images cannot be re-analysed.
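As an illustration only, a selection step of this kind could score each candidate feature by how tightly its values cluster for the concept (consistency) and how rarely values observed under other concepts fall inside that cluster (specificity). The particular scores below (inverse spread, 2-sigma overlap) are our assumptions, not the criteria used in the paper.

import numpy as np

def select_feature(concept_values, other_values):
    # concept_values: dict mapping feature name -> np.array of values
    #                 observed together with the concept label
    # other_values:   dict mapping feature name -> np.array of values
    #                 observed under any other label
    best, best_score = None, -np.inf
    for name, vals in concept_values.items():
        mu, sigma = vals.mean(), vals.std() + 1e-9
        consistency = 1.0 / sigma              # tighter cluster -> higher
        # Specificity: fraction of other-label samples falling outside
        # the concept's 2-sigma interval (an assumed proxy).
        others = other_values[name]
        specificity = np.mean(np.abs(others - mu) > 2.0 * sigma)
        score = consistency * specificity
        if score > best_score:
            best, best_score = name, score
    return best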
Therefore, at any given time, each concept is associated with one visual feature, i.e., with the representation built from previously observed values of this feature. A kernel density estimate (KDE) is used to model the underlying distribution that generated these values. The KDE models, in our case Gaussian mixture models, are updated at every step from the current model and the new samples, using the algorithm presented in the next section. However, after new samples have been observed, it may turn out that some other feature would fit the particular concept better. The system enables such switching between features by keeping simplified representations of all features. Assuming that the data to be modeled is roughly normally distributed, the proposed algorithm keeps updating the Gaussian representations of all features for every concept being learned (in practice, only the representations of a number of potentially interesting features could be maintained). These updates can be performed without loss of information. When, at some point, the algorithm determines that some other feature is to be associated with the particular concept, it starts building a KDE representation of that feature.
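The loss-free property of these updates follows from maintaining sufficient statistics. A minimal sketch, assuming the simplified representation is a single Gaussian per feature kept as the triple (n, sum, sum of squares); this exact bookkeeping is our assumption, not spelled out in the paper:

import numpy as np

class GaussianSummary:
    # Single-Gaussian summary of a feature, maintained via sufficient
    # statistics. Since (n, s1, s2) fully determines the Gaussian fit,
    # incremental updates lose nothing relative to batch estimation.

    def __init__(self):
        self.n, self.s1, self.s2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.s1 += x
        self.s2 += x * x

    def mean_var(self):
        mu = self.s1 / self.n
        var = self.s2 / self.n - mu * mu   # clip for numerical safety
        return mu, max(var, 0.0)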