which the margin is maximized:

$$\max_{w,b} \left\{ \min_{\vec{x}_i} \left\{ \|\vec{x} - \vec{x}_i\| \;:\; \vec{x} \in \mathbb{R}^N,\; w \cdot \vec{x} + b = 0 \right\} \right\} \qquad (3)$$
As the training data is not always separable, a soft
margin classifier uses a misclassification cost C that
is assigned to each misclassified example. Equation 3
is optimized by introducing Lagrange multipliers $\alpha_i$ and recasting the problem in terms of its Wolfe dual:
$$\text{maximize: } L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, \vec{x}_i \cdot \vec{x}_j$$
$$\text{subject to: } 0 \le \alpha_i \le C, \quad \text{and} \quad \sum_i \alpha_i y_i = 0 \qquad (4)$$
All $\vec{x}_i$ for which the corresponding $\alpha_i$ is non-zero are the support vectors.
The support vectors limit the position of the optimal hyperplane. The objects $\vec{x}_i$ for which $\alpha_i = C$ are the bound examples, which are incorrectly classified or lie within the margin of the hyperplane.
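As an illustration, the support vectors and bound examples can be read directly off a fitted soft-margin SVM. The following is a minimal sketch using scikit-learn's SVC (the library choice and the toy data are our own assumptions, not part of the paper); for a binary problem, `dual_coef_` stores $y_i \alpha_i$ for the support vectors:

```python
# Sketch: identifying support vectors and bound examples from a fitted
# soft-margin SVM (scikit-learn and the toy data are assumptions here).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=0)
C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# Support vectors: training objects x_i with alpha_i > 0.
support_idx = clf.support_                  # indices into X
alphas = np.abs(clf.dual_coef_).ravel()     # dual_coef_ holds y_i * alpha_i

# Bound examples: support vectors with alpha_i == C
# (misclassified or inside the margin).
bound_mask = np.isclose(alphas, C)
print("support vectors:", len(support_idx))
print("bound examples:", int(bound_mask.sum()))
```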
3 OBJECT SELECTION
In this work, we assume that we have $j$ views of descending quality $V_u$, $u = 1, \dots, j$. The best level is denoted by $V_1$ and the worst by $V_j$. We describe the object $\vec{x}$ in view $j$ as $V_j(\vec{x})$. Note that all $\vec{x}_i$ in the training set are still in $\mathbb{R}^N$ (the number of attributes does not change). The training set at any point in time $t$ consists of the objects, each object at a different view level: $D = \{(V_1(\vec{x}_1), y_1), (V_2(\vec{x}_2), y_2), \dots, (V_m(\vec{x}_m), y_m)\}$. At $t = 0$, all objects are only given in the worst view $V_j$.
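For concreteness, such a training set could be held in memory as follows; this is only a sketch, and the field names are our own illustration rather than notation from the paper:

```python
# Sketch of a training set in which every object is stored together with
# its current view level (field names are illustrative assumptions).
from dataclasses import dataclass
import numpy as np

@dataclass
class TrainingObject:
    features: np.ndarray   # V_u(x_i): the object as seen at its current view
    label: int             # y_i
    view_level: int        # u, with 1 = best view and j = worst view

# At t = 0, every object is only available at the worst view level j.
j = 3
dataset = [TrainingObject(features=np.random.randn(2), label=+1, view_level=j)
           for _ in range(10)]
```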
At any point in time $t$ we want to enhance some objects in order to provide more detailed information to the classification algorithm. We treat this as an iterative setting: we train a classifier, enhance some objects, and then retrain the classifier on the new training set.
Enhancing an object means we get a better esti-
mate of where this object lies in the current feature
space. Intuitively, if we have trained a classifica-
tion algorithm on a dataset, enhancing objects that are
classified with high confidence will not provide much
information for the classifier.
Instead, we propose to select objects that are close
to the decision boundary of the current classifier.
These are the objects that are classified with low confidence. We expect that more detailed information about the exact position of these objects in the feature space provides the most information for the classifier.
This idea is closely related to the concept of uncertainty sampling in Active Learning (e.g. Schohn and Cohn, 2000; Campbell et al., 2000; Tong and Koller, 2001). In future work, it might be interesting to look at other selection strategies from this domain, e.g. version space reduction.
The preceding works have used a Support Vector Machine (SVM) classifier and ranked objects $\vec{x}$ by their distance to the dividing hyperplane, which is given by the normal vector $w$ and offset $b$:

$$\min_{\vec{x}} \, |w \cdot \vec{x} + b| \qquad (5)$$
The goal is to maximally narrow the existing margin
with an object.
Our proposed algorithm works as follows: initially, all objects are given at the lowest view level. We train an SVM classifier and use Equation 5 to rank the objects. We then select the top $n$ objects for enhancement and add the improved examples to the current training set. This continues until we are satisfied with the results or no budget is left to enhance further examples.
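The following sketch illustrates one possible implementation of this loop with scikit-learn; the enhancement step is simulated by swapping in the higher-quality view of a selected object, and helper names such as `active_enhancement` and `n_per_round` are our own assumptions rather than part of the paper:

```python
# Sketch of the proposed selection loop (library choice and helper names
# are assumptions, not prescribed by the paper).
import numpy as np
from sklearn.svm import SVC

def active_enhancement(X_worst, X_best, y, n_rounds=3, n_per_round=10,
                       gamma=2.0, C=1.0):
    """Iteratively enhance the objects closest to the decision boundary.

    X_worst : objects in the worst view (available from the start)
    X_best  : objects in the best view (revealed only when enhanced)
    """
    X = X_worst.copy()
    enhanced = np.zeros(len(X), dtype=bool)

    for _ in range(n_rounds):
        clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

        # Equation 5: rank objects by |w . x + b|, i.e. by the absolute
        # value of the decision function; small values = low confidence.
        scores = np.abs(clf.decision_function(X))
        scores[enhanced] = np.inf          # never re-enhance an object

        # Enhance the n objects closest to the current hyperplane.
        chosen = np.argsort(scores)[:n_per_round]
        X[chosen] = X_best[chosen]         # simulated enhancement step
        enhanced[chosen] = True

    return SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y), enhanced
```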
4 EXPERIMENTS
4.1 Artificial Data
To demonstrate the principle of operation and the potential benefit of the proposed algorithm, we have chosen a dataset that is easy to visualize. We have generated a two-dimensional dataset with two classes following a banana-shaped distribution. The data is distributed uniformly along the bananas and superimposed with normal noise with standard deviation $s$ in all directions. The class priors are $P(1) = P(2) = 0.5$.
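One way to generate such data is to sample points uniformly along two opposing arcs and add Gaussian noise whose standard deviation controls the view quality. The sketch below follows only the general description in the text; all concrete parameter values and the arc geometry are our own assumptions:

```python
# Sketch of a two-class "banana" dataset; parameters and arc placement
# are assumed, only the general construction follows the text.
import numpy as np

def banana_data(n_samples=400, s=0.1, rng=None):
    rng = np.random.default_rng(rng)
    n = n_samples // 2                        # equal class priors P(1)=P(2)=0.5
    t1 = rng.uniform(0.0, np.pi, n)           # uniform positions along the arcs
    t2 = rng.uniform(0.0, np.pi, n)
    # Two opposed half-circle "bananas", shifted so they interlock.
    x1 = np.column_stack([np.cos(t1), np.sin(t1)])
    x2 = np.column_stack([1.0 - np.cos(t2), 0.5 - np.sin(t2)])
    X = np.vstack([x1, x2]) + rng.normal(scale=s, size=(2 * n, 2))
    y = np.hstack([np.ones(n), -np.ones(n)])
    return X, y

# A good view V1 (small noise) and a bad view V2 (large noise) of the same
# underlying objects could then be created with two noise levels.
X_v1, y = banana_data(s=0.1, rng=0)
X_v2, _ = banana_data(s=0.8, rng=0)
```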
Two views have been created: a good view $V_1$ (see Figure 1(a)) by using a small standard deviation, and a bad view $V_2$ (Figure 1(b)) with a large standard deviation. As can be seen, $V_2$ is very noisy, but the underlying concept of two opposed banana shapes is still the same. We have used an SVM with an RBF kernel; the $\gamma$ parameter has been set to 2.0. The SVM classifier is plotted as a solid black line. The classification in view $V_1$ reflects our ground truth and is therefore plotted as a dotted black line in the other views in Figure 1.
Due to the high standard deviation, the classification in view $V_2$ is far from optimal. In this experiment, we have improved 30 examples. We plot the
ment, we have improved 30 examples. We plot the
improved dataset and the corresponding classifier that
is learned in this new data space. Figure 1(c) shows
the new dataset and classifier with our Active Algo-
rithm and Figure 1(d) shows the new dataset and clas-
sifier with randomly improved examples.
We can observe that the strategy of choosing ex-
amples close to the decision boundary results in a bet-