has proven effective for discerning texture (Unser,
M., 1995). The texture feature vector consists of 14
coefficients: the mean and the variance of each of
the seven sub-bands obtained at two (2) resolution
levels (7 means and 7 variances). In total, the
feature vector used for indoor/outdoor classification
consists of 23 coefficients.
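The following minimal sketch illustrates how 14 such coefficients can be assembled from seven sub-bands. It is not the implementation used in this work: the decimated Haar filter, the use of raw (not absolute) coefficients, and the names `haar_step` and `texture_features` are assumptions made only for illustration. Two decomposition levels give three detail sub-bands each, plus the final approximation, hence seven sub-bands.

```python
from statistics import fmean, pvariance


def haar_step(img):
    """One level of a 2-D Haar decomposition (assumed filter).

    img is a list of rows with even height and width.
    Returns the four sub-bands (LL, LH, HL, HH), each half the size.
    """
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((a + b + d + e) / 4.0)  # local average
            row_lh.append((a + b - d - e) / 4.0)  # change between rows
            row_hl.append((a - b + d - e) / 4.0)  # change between columns
            row_hh.append((a - b - d + e) / 4.0)  # diagonal change
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def texture_features(img, levels=2):
    """Return [7 means] + [7 variances] for a two-level decomposition."""
    subbands = []
    approx = img
    for _ in range(levels):
        approx, lh, hl, hh = haar_step(approx)
        subbands.extend([lh, hl, hh])      # three detail sub-bands per level
    subbands.append(approx)                # final approximation sub-band
    flat = [[v for row in band for v in row] for band in subbands]
    means = [fmean(values) for values in flat]
    variances = [pvariance(values) for values in flat]
    return means + variances


# Example: an 8x8 synthetic image (dimensions must be divisible by 4).
image = [[(r * c) % 7 for c in range(8)] for r in range(8)]
features = texture_features(image)
```

Concatenating this vector with the 9 color coefficients would give the 23 coefficients mentioned above.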
2.2 City/Landscape visual feature extraction
In the case of city/landscape classification, robust
features are extracted using a combination of color
and structural information expressed by the line
segment orientation.
The color is considered in the same manner as in the
case of indoor/outdoor feature extraction. We obtain
a vector of 9 coefficients computed using
Equations 2-4. Together with color, we use
a line segment descriptor. The underlying idea is to
distinguish between the long horizontal and vertical
contours that dominate city images and the short
contours, oriented neither horizontally nor
vertically, that are typical of landscape
images. A similar contour descriptor, leading to the
extraction of a 12-bin histogram, has been
proposed in (Stauder et al., 2004), while in (Vailaya et
al., 2001) the edge direction distribution has been
proposed for the discrimination between city and
non-city images.
To construct the line segment descriptor, which is a
histogram of line segment directions, we proceed as
follows. First, we detect edges using
the Canny edge detector (Canny, J., 1986). The
resulting edges are thinned, and the edge
representation is then transformed into a line segment
representation. For this, we apply a non-parametric
segmentation of curves into straight lines, as
explained in (Rosin, P.,L., and West, G.A.W., 1995).
The direction of each straight line is calculated and
categorized as horizontal, vertical or
diagonal. Furthermore, each line segment is labelled
as either short or long according to its length. A
segment is considered long if its length exceeds 10%
of the smaller image dimension (width or height).
Finally, a histogram with six (6) bins is computed
(three directions by two length classes). A
schematic representation of the required
steps is shown in Figure 5. In total, the feature vector
used for city/landscape classification consists of 15
coefficients (9 for color and 6 for line segments).
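As an illustration of the last step, the sketch below bins already-extracted line segments into the six-bin histogram. It assumes that edge detection, thinning and line fitting have been done beforehand. The angular tolerance `angle_tol_deg`, the bin ordering, and the use of raw counts (no normalisation) are assumptions, since the text does not specify them. Only the 10% length rule comes from the description above.

```python
import math


def line_segment_histogram(segments, width, height, angle_tol_deg=10.0):
    """Six-bin histogram over (direction, length class).

    segments: iterable of (x1, y1, x2, y2) end points in pixels.
    Bin order (assumed): [H short, H long, V short, V long, D short, D long].
    """
    # A segment is long if it exceeds 10% of the smaller image dimension.
    long_threshold = 0.10 * min(width, height)
    hist = [0] * 6
    for x1, y1, x2, y2 in segments:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # degenerate segment, no direction
        # Orientation in [0, 180): direction of travel does not matter.
        angle = math.degrees(math.atan2(dy, dx)) % 180.0
        if angle <= angle_tol_deg or angle >= 180.0 - angle_tol_deg:
            direction = 0  # horizontal
        elif abs(angle - 90.0) <= angle_tol_deg:
            direction = 1  # vertical
        else:
            direction = 2  # diagonal
        is_long = length > long_threshold
        hist[2 * direction + (1 if is_long else 0)] += 1
    return hist


# Example on a 200x100 image, where the long threshold is 10 pixels.
example_segments = [
    (0, 0, 50, 0),     # horizontal, long
    (0, 0, 0, 5),      # vertical, short
    (0, 0, 3, 3),      # diagonal, short
    (10, 10, 10, 60),  # vertical, long
]
example_hist = line_segment_histogram(example_segments, width=200, height=100)
```

A city image would be expected to put most of its mass in the long horizontal and long vertical bins, and a landscape image in the short diagonal bin.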
3 CLASSIFICATION - FEATURE FUSION
For both binary classification problems
(indoor vs. outdoor and city vs. landscape), the
classification step was performed using two
well-known classification algorithms, K-NN
(Theodoridis, S., and Koutroumbas, K., 1997) and
Support Vector Machines (SVM) (Cortes, C., and
Vapnik, V., 1995), (Vapnik, V., 1998), (Chang, C.C.,
and Lin, C.-J.).
Formally, the support vector machines (SVM)
require the solution of an optimisation problem,
given a training set of instance-label pairs
$(x_i, y_i)$, $i = 1, \dots, m$, where
$x_i \in \mathbb{R}^n$ and $y \in \{1, -1\}^m$. The
optimisation problem is defined as follows:

$$
\min_{\omega, b, \xi} \; \frac{1}{2}\,\omega^T \omega + C \sum_{i=1}^{m} \xi_i
\quad \text{subject to} \quad
y_i\left(\omega^T \phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0
\tag{5}
$$
According to this, the training vectors $x_i$ are mapped into
a higher dimensional space by the function $\phi$. Then,
SVM finds a linear separating hyperplane with the
maximal margin in this higher dimensional space.
For this search, a few parameters play
a critical role in the classification performance.
The first is the parameter C in Eq. 5, which applies a
penalty to the error term. The second is the so-called
kernel function, defined as:

$$
K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j).
$$
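To make the roles of C and of the kernel concrete, the following sketch evaluates the objective of Eq. 5 for a given hyperplane and checks the kernel identity on a small example. It is only illustrative: the identity mapping used in `primal_objective` and the degree-2 polynomial mapping `phi_poly2` are assumptions chosen for simplicity, not the kernel used in our experiments, and no optimisation is performed.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def primal_objective(w, b, X, y, C):
    """Value of the Eq. 5 objective for fixed (w, b), with phi = identity.

    For fixed (w, b), the smallest feasible slack of sample i is
    max(0, 1 - y_i * (w . x_i + b)).
    Returns (objective value, list of slacks).
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks


def phi_poly2(x):
    """Explicit feature map for 2-D inputs (assumed example mapping)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def kernel_poly2(xi, xj):
    """Kernel equal to phi_poly2(xi) . phi_poly2(xj), without the mapping."""
    return dot(xi, xj) ** 2


# Three training points; the third lies inside the margin.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, 1]
value, slacks = primal_objective(w=[1.0, 0.0], b=0.0, X=X, y=y, C=10.0)

# The kernel gives the same number as the explicit inner product.
explicit = dot(phi_poly2([1.0, 2.0]), phi_poly2([3.0, 4.0]))
implicit = kernel_poly2([1.0, 2.0], [3.0, 4.0])
```

A larger C makes each unit of slack more expensive, so margin violations are penalised more heavily relative to the margin term.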
One of the main aspects in classification is the
interaction between the features and the available
classifiers. Mainly, there are two trends in this
interaction. Either different features are combined
into a final feature vector as the input to the
classifier (Lim, H-H., and Jin, J.S., 2005), (Stauder
et al., 2004), or feature vectors associated with
different modalities are fed into independent pattern
classifiers whose classification outputs are then
combined (Serrano et al., 2004), (Szummer, M., and
Picard, R., 1998), (Payne, A., and Singh, S., 2005).
These basic trends have shown both advantages and
disadvantages. A disadvantage of the latter trend is
that training multiple classifiers on individual
features may not be viable at all: a single feature
may not provide sufficient discriminative power,
resulting in many poor classifiers for fusion.
In our approach, we follow the former trend, where
the classifier’s input feature vector consists of a
concatenation of each feature that is considered for
the corresponding classification problem (indoor vs.
outdoor, or city vs. landscape). A detailed discussion
of these features has already been given in
Section 2.
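The concatenation approach can be sketched as follows, here paired with a plain K-NN vote. This is a minimal illustration, not our implementation: the function names, the Euclidean distance, the value k = 3 and the toy data are all assumptions. Only the feature dimensions (9 color, 14 texture, 6 line segment coefficients) are taken from Section 2.

```python
import math
from collections import Counter


def concatenate(*feature_groups):
    """Fuse several per-feature vectors into one classifier input vector."""
    return [value for group in feature_groups for value in group]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Dimensions of the fused vectors for the two problems.
color = [0.0] * 9
texture = [0.0] * 14
line_hist = [0.0] * 6
indoor_outdoor_vector = concatenate(color, texture)   # 23 coefficients
city_landscape_vector = concatenate(color, line_hist)  # 15 coefficients

# Toy 2-D data, only to show the call pattern.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
train_y = ["landscape", "landscape", "landscape", "city", "city"]
label = knn_predict(train_X, train_y, query=[0.15, 0.1], k=3)
```

Because distances are computed on the fused vector, features with larger numeric ranges dominate the vote unless the coefficients are brought to a comparable scale first.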