2.4 Object Classification with SVM
Among the numerous existing supervised
nonparametric classification methods, the compact
kernel SVM classifier was chosen because of its
superiority in terms of classification accuracy in the
context of remote sensing images, and its ability to
handle the curse of dimensionality (Bishop, 2006),
(Fauvel, 2008), (Melgani, 2004), (Foody, 2004). The
SVM algorithm is a 2-class classifier. We consider
the general case of a training set of two overlapping
classes. First, a nonlinear kernel function is applied
to the input space in order to obtain a
higher-dimensional feature space with better class
separability. The Gaussian kernel often provides the
best results, and is used in this paper. Second, the
parameters of the linear hyperplane model are
estimated according to the maximal margin criterion
and by penalizing the classification errors. The SVM
algorithm with Gaussian kernel has two
regularization parameters: the misclassification
penalty term and the Gaussian width. In this paper,
these parameters are optimized by cross-validation,
minimizing the misclassification rate over a 2D grid
of ten thousand pairs of values for the two tuned
parameters. This exhaustive search is costly, but it
finds the best pair on the grid without the risk of
stopping in a local minimum. To refine the estimate,
this procedure is repeated three times in a
coarse-to-fine scheme. Finally, the optimal values
are used to train the classifier on the entire
training set. In our application we have three classes
(“road”, “building” and “other”), and the
“one-against-all” multiclass SVM strategy is used. It
consists of training three binary SVM classifiers
independently, one for each class. During the
training of one class, the elements of the training set
belonging to that class are opposed to the elements
of the two other classes. This technique can produce
unbalanced training sets. However, in our
application this effect is limited because we
have only three classes, and the training set is
composed of four hundred buildings, four hundred
roads and two hundred others. This training set was
built by manually assigning a class to some mean
shift areas located outside the part to be classified
(outside the image of figure 1). It can be noted that
such a training set is designed only to classify parts
of the considered aerial image, and not parts of other
images with different illuminations. In fact, in our
application for each aerial image a training set is
built on parts of it, and the other parts are classified.
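As an illustration, the coarse-to-fine search over the two parameters described above can be sketched as follows. This is only a minimal sketch and not the implementation used in this paper: the function names, the logarithmic spacing of the grid, and the rule used to shrink the search range between passes are assumptions. The callable cv_error is hypothetical and stands for the cross-validated misclassification rate of the Gaussian-kernel SVM for one pair of values.

```python
import math

def log_grid(lo, hi, n):
    """Return n values evenly spaced in log10 between lo and hi (inclusive)."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]

def coarse_to_fine_search(cv_error, c_range, width_range, n=100, rounds=3):
    """Minimize cv_error(penalty, width) on an n x n grid, refined `rounds` times.

    cv_error is a hypothetical callable returning the cross-validated
    misclassification rate for one (penalty, Gaussian width) pair.
    """
    best = None
    for _ in range(rounds):
        cs = log_grid(*c_range, n)
        ws = log_grid(*width_range, n)
        # Exhaustive search over the grid: n * n pairs (10,000 for n = 100).
        best = min((cv_error(c, w), c, w) for c in cs for w in ws)
        _, c_best, w_best = best
        # Assumption: the next, finer grid spans the two cells around the best pair.
        i, j = cs.index(c_best), ws.index(w_best)
        c_range = (cs[max(i - 1, 0)], cs[min(i + 1, n - 1)])
        width_range = (ws[max(j - 1, 0)], ws[min(j + 1, n - 1)])
    return best  # (error, penalty, width)
```

With n = 100 and rounds = 3, the sketch evaluates ten thousand pairs per pass, as in the procedure above. In practice cv_error would train and validate the SVM on folds of the training set, and the final classifier would then be trained once on the entire training set with the returned pair.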
With the “one-against-all” approach, the final
decision can be taken by applying the
“winner-take-all” rule to the binary classifier
probabilities (Melgani, 2004). Another possibility is
to consider the final binary classifier decisions
(binary word) (Bishop, 2006). In that case, for three
classes there are eight possible words, and the five
conflict situations (multiple assignments or no
assignment) are generally handled by choosing one of
the classes at random. In our application, it is
possible to handle conflicts by using a priori
knowledge and contextual information. In dense
urban areas, the classes “building” and “road” are
largely predominant and have priority in case of
conflict with the class “other”. Also, it has been
noticed by visualizing the “building” and “road”
conflict areas that contextual information can be
advantageously used. For example, if buildings (or
roads) mainly surround a conflict area, most of the
time it is a building (or a road). It would be
interesting in further work to compare this approach
with the one using the binary classifier probabilities
(Melgani, 2004). Combining contextual
information with probabilities would likely be
even more effective.
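The two decision rules discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact procedure used in this paper: the function names are hypothetical, the context is reduced to a majority vote over the labels of neighboring areas, the word with no assignment is treated as a conflict among all three classes, and the deterministic fallback used when no neighbor votes replaces the usual random choice.

```python
from collections import Counter

CLASSES = ("road", "building", "other")

def winner_take_all(probabilities):
    """Pick the class whose binary classifier outputs the highest probability."""
    return max(CLASSES, key=lambda name: probabilities[name])

def resolve_binary_word(decisions, neighbor_labels=()):
    """Turn the three binary decisions for one area into a single class.

    decisions maps each class name to the boolean output of its binary SVM;
    neighbor_labels holds the classes already assigned to surrounding areas.
    """
    accepted = [name for name in CLASSES if decisions[name]]
    if len(accepted) == 1:
        return accepted[0]  # unambiguous word: exactly one classifier accepts
    # A priori knowledge: "road" and "building" have priority over "other".
    urban = [name for name in accepted if name != "other"]
    if len(urban) == 1:
        return urban[0]
    # Remaining cases: road/building conflict, or no assignment at all.
    # Assumption: with no assignment, all three classes stay candidates.
    candidates = urban or list(CLASSES)
    votes = Counter(label for label in neighbor_labels if label in candidates)
    if votes:
        return votes.most_common(1)[0][0]  # contextual majority vote
    return candidates[0]  # assumption: arbitrary deterministic fallback
```

For example, an area accepted by both the “road” and “building” classifiers and mainly surrounded by buildings is labeled “building”, while an area accepted by “building” and “other” is labeled “building” by priority alone.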
2.5 Classification Accuracy
Figure 3 shows the SVM classification results for
the image of figure 1, with the pattern {area,
eccentricity, mean of the RGB vector}. Top:
superimposition of the binary SVM results. Detected
roads, buildings, and others are drawn in yellow,
green and black, respectively. The “building” and
“road” conflict areas are shown in red. It can be
noticed that there are few conflicts. Middle:
3-class SVM results after handling the previous
conflicts with contextual information. Conflicts were
generally well handled. Bottom: ground truth built
by visual interpretation. The red on the ground truth
corresponds to areas where it was visually difficult
to discriminate roads from buildings; detecting either
a road or a building in these areas is considered
correct. Classification accuracy is assessed by
computing the 3x3 confusion matrix (in terms
of pixels) between the classified image (an example
is in the middle of figure 3) and the ground truth.
Some descriptive measures computed from the
confusion matrix are given in table 1.
Table 1: Classification accuracy measures.
                                3-class SVM, pattern {a,e,R,G,B}
Overall accuracy                0.60
Producer’s accuracy road        0.66
Producer’s accuracy building    0.58
Producer’s accuracy other       0.59
User’s accuracy road            0.47
User’s accuracy building        0.86
User’s accuracy other           0.35
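For reference, the measures of table 1 can be derived from a confusion matrix as sketched below. This is a minimal sketch: the function name is hypothetical, and the matrix layout (rows for ground truth classes, columns for classified classes) is an assumed convention. Every class is assumed to appear at least once in both the ground truth and the classified image.

```python
def accuracy_measures(matrix, classes):
    """Compute overall, producer's and user's accuracies.

    matrix[i][j] is the number of pixels of ground truth class i
    that were classified as class j (assumed layout).
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    # Overall accuracy: correctly classified pixels over all pixels.
    overall = sum(matrix[i][i] for i in range(n)) / total
    producer, user = {}, {}
    for i, name in enumerate(classes):
        reference_total = sum(matrix[i])                        # row sum
        classified_total = sum(matrix[r][i] for r in range(n))  # column sum
        # Producer's accuracy: share of ground truth pixels that were found.
        producer[name] = matrix[i][i] / reference_total
        # User's accuracy: share of pixels given this label that are correct.
        user[name] = matrix[i][i] / classified_total
    return overall, producer, user
```

A low user's accuracy combined with a higher producer's accuracy for a class, as for “road” and “other” in table 1, indicates that many pixels of the other classes were wrongly assigned to that class.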
VISAPP 2010 - International Conference on Computer Vision Theory and Applications