sample $\{x_i\}$ of patterns in $\aleph$ drawn independently and identically distributed (i.i.d.) from some unknown data distribution with density $P(x)$, the goal being to estimate either the density or a functional thereof. Supervised learning consists of estimating a functional relationship $x \to y$ between a covariate $x \in \aleph$ and a class variable $y \in \{1, 2, \ldots, M\}$, with the goal of minimizing a functional of the joint data distribution $P(x, y)$, such as the probability of classification error.
The terminology “unsupervised learning” is a bit unfortunate: the term density estimation would probably be more suitable. Traditionally, many techniques for density estimation posit a latent (unobserved) class variable $y$ and estimate $P(x)$ as the mixture distribution $\sum_{y=1}^{M} P(x \mid y) P(y)$. Note that $y$ has a fundamentally different role than in classification, in that its existence and range are a modeling choice rather than observable reality.
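To make this latent-variable view concrete, here is a minimal sketch that estimates $P(x)$ as a mixture $\sum_{y=1}^{M} P(x \mid y) P(y)$ using scikit-learn's GaussianMixture; the synthetic data and the choice $M = 2$ are assumptions of the example, illustrating precisely that the existence and range of $y$ are modeling choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Unlabeled sample drawn from an unknown density P(x)
# (here a synthetic two-component mixture, purely for illustration).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300),
                    rng.normal(3.0, 0.5, 200)]).reshape(-1, 1)

# The latent class variable y and its range M are a modeling choice:
# we posit M = 2 components and estimate P(x|y) and P(y) from the data.
M = 2
gmm = GaussianMixture(n_components=M, random_state=0).fit(x)

# score_samples returns log P(x) = log sum_y P(x|y) P(y).
density = np.exp(gmm.score_samples(np.array([[0.0]])))
print(f"estimated P(x=0) = {density[0]:.4f}")
```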
The semi-supervised learning problem belongs to the supervised category, since the goal is to minimize the classification error, and an estimate of $P(x)$ is not sought. The difference from a standard classification setting is that, along with a labeled sample $D_l = \{(x_i, y_i),\ i = 1, \ldots, n\}$ drawn i.i.d. from $P(x, y)$, we also have access to an additional unlabeled sample $D_u = \{x_{n+j},\ j = 1, \ldots, m\}$ from the marginal $P(x)$. We are especially interested in cases where $n \ll m$, which may arise in situations where obtaining an unlabeled sample is cheap and easy, while labeling the sample is expensive or difficult.
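As a concrete illustration of this setting, the sketch below builds a labeled sample $D_l$ and an unlabeled sample $D_u$ with $n \ll m$; the helper split_semi_supervised and the synthetic data are hypothetical, introduced only for this example.

```python
import numpy as np

def split_semi_supervised(X, y, n, rng=None):
    """Split a dataset into a small labeled sample D_l of size n and a
    large unlabeled sample D_u of size m = len(X) - n, with n << m."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))
    labeled, unlabeled = idx[:n], idx[n:]
    D_l = (X[labeled], y[labeled])      # drawn i.i.d. from P(x, y)
    D_u = X[unlabeled]                  # labels discarded: marginal P(x)
    return D_l, D_u

# Example: 20 labeled points against 980 unlabeled ones (n << m).
X = np.random.default_rng(1).normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)
(D_l_X, D_l_y), D_u = split_semi_supervised(X, y, n=20, rng=1)
print(D_l_X.shape, D_u.shape)   # (20, 5) (980, 5)
```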
Principal Component Analysis, also called the Karhunen-Loève transform, is a well-known statistical method for feature extraction, data compression, and multivariate data projection, and it has been broadly used in a wide range of signal and image processing, pattern recognition, and data analysis applications.
The advantages of using principal components derive from the fact that the bands are uncorrelated, so no information contained in one band can be predicted from knowledge of the other bands; therefore the information contained in each band is maximal for the whole set of bands (Diamantaras, 1996).
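A minimal sketch of this decorrelation property, assuming synthetic data and plain NumPy: projecting centered data onto the eigenvectors of its sample covariance matrix produces components (“bands”) whose covariance matrix is diagonal up to numerical error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data: 500 observations of 4 correlated features.
A = rng.normal(size=(4, 4))
X = rng.normal(size=(500, 4)) @ A.T

# PCA via the eigendecomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
order = np.argsort(eigvals)[::-1]               # sort descending
components = Xc @ eigvecs[:, order]             # projected "bands"

# The bands are uncorrelated: their covariance is (numerically) diagonal.
band_cov = np.cov(components, rowvar=False)
off_diag = band_cov - np.diag(np.diag(band_cov))
print(np.max(np.abs(off_diag)))                 # ~1e-15
```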
Recently, alternative methods such as discriminant common vectors, neighborhood components analysis, and Laplacianfaces have been proposed, allowing the learning of linear projection matrices for dimensionality reduction (Liu and Chen, 2006; Goldberger et al., 2004).
The aim of the research reported in this paper is to present experimentally derived conclusions on the performance of a PCA-based supervised technique in a semi-supervised environment.
The structure of a class is represented in terms of the estimates of its principal directions computed from data; the overall dissimilarity of a particular object with respect to a given class is measured by the “disturbance” of this structure when the object is identified as a member of the class. In the unsupervised framework, the clusters are computed using the estimates of the principal directions; that is, each cluster is represented in terms of a skeleton given by a set of orthogonal unit eigenvectors (principal directions) of the cluster's sample covariance matrix. The reason for adopting this representation relies on the property that a set of principal directions captures the maximum variability of each class.
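This section does not spell out the “disturbance” measure itself, so the sketch below instantiates one plausible reading purely as an assumption: each class skeleton is the set of leading orthonormal eigenvectors of its sample covariance matrix, and the dissimilarity of an object is how far the skeleton rotates (measured through principal angles) when the object is added to the class. The helpers skeleton and disturbance are hypothetical, not the paper's actual definition.

```python
import numpy as np

def skeleton(X, k):
    """Class 'skeleton': the k leading orthonormal eigenvectors
    (principal directions) of the sample covariance matrix."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def disturbance(X, x_new, k):
    """Hypothetical dissimilarity: how much the principal directions
    rotate when x_new is identified as a member of the class. This is
    one plausible reading of the 'disturbance', not the paper's own
    definition, which is not given in this section."""
    S_old = skeleton(X, k)
    S_new = skeleton(np.vstack([X, x_new]), k)
    # Compare the spanned subspaces via principal angles: the singular
    # values of S_old^T S_new are the cosines of those angles.
    cosines = np.linalg.svd(S_old.T @ S_new, compute_uv=False)
    return float(np.sum(1.0 - cosines**2))

# Assign an object to the class whose structure it disturbs least.
rng = np.random.default_rng(0)
class_a = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.2])
class_b = rng.normal(size=(100, 3)) * np.array([0.2, 1.0, 3.0])
x = np.array([2.5, 0.0, 0.1])
print(disturbance(class_a, x, k=2), disturbance(class_b, x, k=2))
```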
A series of conclusions experimentally
established by tests performed on samples of signals
coming from two classes are exposed in the final
section of the paper.
2 THE MATHEMATICS BEHIND THE PROPOSED APPROACH
The classes are represented in terms of multivariate density functions, and an object coming from a certain class is modeled as a random vector whose distribution has the density function corresponding to this class. In cases where there is no statistical information concerning the set of density functions corresponding to the classes involved in the recognition process, estimates based on the information extracted from the available data are usually used instead.
The principal directions of a class are given by a set of orthogonal unit eigenvectors of the covariance matrix. When the available data are represented by a set of objects $X_1, X_2, \ldots, X_N$ belonging to a certain class $C$, the covariance matrix is estimated by the sample covariance matrix,
$$\hat{\Sigma}_N = \frac{1}{N-1} \sum_{i=1}^{N} \left( X_i - \hat{\mu}_N \right) \left( X_i - \hat{\mu}_N \right)^T, \qquad (1)$$
where $\hat{\mu}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$.
Let us denote by $\lambda_1^N \geq \lambda_2^N \geq \ldots \geq \lambda_n^N$ the eigenvalues and by $\psi_1^N, \ldots, \psi_n^N$ a set of orthonormal eigenvectors of $\hat{\Sigma}_N$.
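A short sketch of these estimates, assuming the objects $X_1, \ldots, X_N$ are the rows of a NumPy array: it computes $\hat{\mu}_N$ and $\hat{\Sigma}_N$ as in (1), then the eigenvalues $\lambda_1^N \geq \ldots \geq \lambda_n^N$ and orthonormal eigenvectors $\psi_1^N, \ldots, \psi_n^N$ of $\hat{\Sigma}_N$.

```python
import numpy as np

def principal_directions(X):
    """Estimate the principal directions of a class from objects
    X_1, ..., X_N given as the rows of X (shape (N, n))."""
    N = X.shape[0]
    mu_hat = X.mean(axis=0)             # mu_hat_N = (1/N) sum_i X_i
    D = X - mu_hat
    sigma_hat = (D.T @ D) / (N - 1)     # sample covariance, equation (1)
    # eigh returns ascending eigenvalues with orthonormal eigenvectors;
    # reverse the order to get lambda_1 >= lambda_2 >= ... >= lambda_n.
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    order = np.argsort(eigvals)[::-1]
    return mu_hat, sigma_hat, eigvals[order], eigvecs[:, order]

# Example with N = 200 objects in dimension n = 4 (synthetic data).
X = np.random.default_rng(0).normal(size=(200, 4))
mu_hat, sigma_hat, lam, psi = principal_directions(X)
print(lam)                 # eigenvalues, descending
print(psi.T @ psi)         # ~ identity: the columns of psi are orthonormal
```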