define a new framework for classification problems where we have one partition of the data to learn (in this case the subject partition) and another partition that is independent from the first one (in this case the artifacts partition).
In this paper, we propose a feature selection method, suitable for this newly defined classification framework, that adds discriminant information to the PCA algorithm using the mutual information measure. In contrast to previous works using mutual information for feature selection, our proposal allows us to add independent selection criteria in order to neutralize, in practice, possible artifact effects that are equally present in the whole data space. Components that appear relevant but are not related to the classification task are discarded, reducing the effect of the artifacts in the data. The proposed process is detailed and discussed in section 3.
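As a rough illustration of this kind of criterion (a minimal sketch only; the function name, the discretization and the threshold are assumptions made for illustration, not the actual procedure of section 3), one can rank PCA components by their mutual information with the subject labels while discarding components that also carry information about the artifact labels:

# Hedged sketch: keep PCA components informative about the target partition C
# (e.g. subjects) and discard those informative about the independent
# partition K (e.g. artifacts). Binning and threshold are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mutual_info_score

def select_components(X, c_labels, k_labels, n_components=50,
                      n_bins=10, k_threshold=0.05):
    Z = PCA(n_components=n_components).fit_transform(X)
    selected = []
    for j in range(Z.shape[1]):
        # Discretize the j-th projection so MI can be estimated from counts.
        edges = np.histogram_bin_edges(Z[:, j], bins=n_bins)
        bins = np.digitize(Z[:, j], edges)
        mi_c = mutual_info_score(c_labels, bins)  # relevance to the task
        mi_k = mutual_info_score(k_labels, bins)  # relevance to the artifacts
        if mi_k < k_threshold:                    # keep only artifact-free axes
            selected.append((mi_c, j))
    # Return component indices ordered by decreasing task relevance.
    return [j for _, j in sorted(selected, reverse=True)]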
We validate our proposal on a face recognition problem using the AR Face data set. For the feature selection in the subject classification task, we also consider another data partition based on the lighting conditions and occlusions, obtaining a PCA representation that retains only the information useful for the posterior classification and filters out the misleading components present in the data space. The experiments performed, which are detailed in section 4, show significant improvements in comparison to the classic PCA approach using the first eigenvectors with largest eigenvalues, and to the mutual information methods found in the recent literature.
Finally, in section 5 we discuss the proposed approach and conclude the work. Moreover, we suggest some future research lines related to the proposed new classification framework.
2 PROBLEM STATEMENT
Let $X$ be a set. Suppose that we have two partitions of this set, $C$ and $K$, that is
$$X = C_1 \cup \ldots \cup C_a = K_1 \cup \ldots \cup K_b \qquad (1)$$
where $C_\alpha \cap C_\beta = \emptyset$ and $K_\alpha \cap K_\beta = \emptyset$ for all $\alpha \neq \beta$. Suppose also that they are equidistributed, in the sense that $p(C_\alpha) = p(C_\beta)$ and $p(K_\alpha) = p(K_\beta)$ for all $\alpha, \beta$.
We say that they are independent partitions if they satisfy the following property:

$$p(C_\alpha \mid K_\gamma) = p(C_\beta \mid K_\gamma) \qquad (2)$$
for all $\alpha, \beta, \gamma$. Notice that from the Bayes rule we have the symmetric property

$$p(K_\alpha \mid C_\gamma) = p(K_\beta \mid C_\gamma) \qquad (3)$$

given that both partitions are equidistributed (in particular $K$).
An intuitive idea of this definition is the following: when we know the class of an element of $X$ according to one of the partitions, we do not gain any information about its class according to the other partition. Figure 1 illustrates this independence concept for partitions in a 2-dimensional subspace.

Figure 1: (a) Two independent partitions of the set: one given by the grey level (3 classes) and the other by the texture (2 classes). Notice that both are equidistributed and that the property in the definition of independent partitions (also the symmetric one) is verified. (b) Two equidistributed partitions are also shown; however, in this case they are not independent. Notice, for instance, that $P(\text{rough texture} \mid \text{dark grey}) < P(\text{smooth texture} \mid \text{dark grey})$, which means that knowing the class according to one of the partitions implicitly gives information about the class according to the other one.
Independent partitions of data can be found in real problems. For example, a set of handwritten symbols can be partitioned according to which symbol appears in the image (partition $C$) or according to the person who drew it (partition $K$). On the other hand, considering a set of face images having some kind of artifact (scarves, sunglasses, highlights or none), we can divide the set according to the subject that appears in the image (partition $C$) or according to the appearing artifact (partition $K$). Then, assuming that the artifacts do not depend on the subject, we also have two independent partitions of the set.
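As a minimal illustration (assuming only two integer label arrays, c_labels and k_labels, for the same samples; the helper name is hypothetical), the independence of two such labelings can be checked empirically by estimating the conditional probabilities of (2) and the mutual information between the labelings:

import numpy as np
from sklearn.metrics import mutual_info_score

def independence_report(c_labels, k_labels):
    # Empirical p(C = alpha | K = gamma); under property (2) each row is
    # (approximately) uniform over the classes of C.
    c_labels = np.asarray(c_labels)
    k_labels = np.asarray(k_labels)
    classes_c = np.unique(c_labels)
    for gamma in np.unique(k_labels):
        mask = (k_labels == gamma)
        probs = [(c_labels[mask] == c).mean() for c in classes_c]
        print(f"K = {gamma}: p(C | K) =", np.round(probs, 3))
    # Mutual information between the two labelings; close to zero when the
    # partitions are independent.
    print("MI(C; K) =", mutual_info_score(c_labels, k_labels))

When every subject appears equally often under every artifact condition, the printed conditional distributions are uniform and the estimated mutual information is close to zero.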
Let us focus on this second example of the face set. In that case, subject classification is a usual task to explore in machine learning. The common procedure is to consider some labelled samples (according to $C$) and learn a classifier from this information.
However, suppose that the training data can also be labelled according to the partition of the artifacts ($K$). These labels are not used in the training