be used for estimating $\hat{d}_{PF}$, we propose to consider those minimizing the MISE criterion of this estimator, which we define as:
$$\mathrm{MISE} = J(k_{N_1}, k_{N_2}) = E\left( \left| d_{PF} - \hat{d}_{PF} \right|^{2} \right) \qquad (12)$$
Note that the simple MISE expression available for the orthogonal density estimator does not seem to carry over to $\hat{d}_{PF}$; hence, a numerical evaluation of $k_{N_1}$ and $k_{N_2}$ is extremely complex. To solve the optimal choice problem for $\hat{d}_{PF}$, we propose to use, for each class's orthogonal density estimator, the optimal value determined by the method described in Section 2. Rather than using pre-specified values, this choice seems reasonable for minimizing the MISE of $\hat{d}_{PF}$. To verify this, we conducted the following simulation study.
We generate two data samples from two different Gaussian distributions with parameters $\mu_1 = 1$, $\mathrm{Var}_1 = 3$ and $\mu_2 = 3$, $\mathrm{Var}_2 = 1$. Each sample has a size of 1000. We vary $k_{N_1}$ from 1 to $\sqrt{N_1}$ and $k_{N_2}$ from 1 to $\sqrt{N_2}$, and for each pair $(k_{N_1}, k_{N_2})$ we compute $\hat{d}_{PF}$ using $k_{N_1}$ terms to estimate the PDF of the first sample and $k_{N_2}$ terms for the second. The theoretical $d_{PF}$ can be computed since the analytical expression of the Gaussian PDF of each sample is known; we approximate the integral in the expression of $d_{PF}$ with Simpson's method (Atkinson, 1989). To estimate the expectation in the expression of $J(k_{N_1}, k_{N_2})$, we regenerate the samples one hundred times and average the squared differences between $d_{PF}$ and its orthogonal estimate $\hat{d}_{PF}$. Figure 2 shows the values of $J(k_{N_1}, k_{N_2})$. The pair $(k_{N_1}, k_{N_2})$ that minimizes $J(k_{N_1}, k_{N_2})$ is selected as the optimal pair of values for $k_{N_1}$ and $k_{N_2}$.
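To make the procedure concrete, the following Python sketch reproduces the grid search under explicit assumptions: we take the trigonometric system to be a cosine basis on a rescaled support and $d_{PF}$ to be of $L_2$ form, since the exact definitions appear in earlier sections not reproduced here; the support bounds and all function names are ours.

```python
import numpy as np
from scipy.integrate import simpson   # Simpson's rule (Atkinson, 1989)
from scipy.stats import norm

rng = np.random.default_rng(0)
N1, N2 = 1000, 1000
mu1, var1, mu2, var2 = 1.0, 3.0, 3.0, 1.0
a, b = -8.0, 12.0                     # assumed support covering both densities
x = np.linspace(a, b, 2001)           # integration grid

def trig_density(sample, k):
    """k-term trigonometric orthogonal-series density estimate on [a, b];
    the cosine basis is one common choice of system (cf. Hall, 1982)."""
    u = (sample - a) / (b - a)        # rescale data to [0, 1]
    t = (x - a) / (b - a)
    f = np.ones_like(t)               # coefficient of the constant basis function is 1
    for j in range(1, k + 1):
        cj = np.mean(np.sqrt(2.0) * np.cos(np.pi * j * u))   # empirical coefficient
        f = f + cj * np.sqrt(2.0) * np.cos(np.pi * j * t)
    return np.clip(f, 0.0, None) / (b - a)   # clip negatives as a practical guard

def d_pf(f1, f2):
    # assumed L2 form of the probabilistic distance between the two PDFs
    return np.sqrt(simpson((f1 - f2) ** 2, x=x))

# theoretical d_PF from the known Gaussian PDFs
d_true = d_pf(norm.pdf(x, mu1, np.sqrt(var1)), norm.pdf(x, mu2, np.sqrt(var2)))

K1, K2 = int(np.sqrt(N1)), int(np.sqrt(N2))
J = np.zeros((K1, K2))
for _ in range(100):                  # Monte Carlo estimate of the expectation in (12)
    s1 = rng.normal(mu1, np.sqrt(var1), N1)
    s2 = rng.normal(mu2, np.sqrt(var2), N2)
    f1s = [trig_density(s1, k) for k in range(1, K1 + 1)]
    f2s = [trig_density(s2, k) for k in range(1, K2 + 1)]
    for i, f1 in enumerate(f1s):
        for j, f2 in enumerate(f2s):
            J[i, j] += (d_true - d_pf(f1, f2)) ** 2
J /= 100
i_opt, j_opt = np.unravel_index(np.argmin(J), J.shape)
print("optimal (k_N1, k_N2):", (i_opt + 1, j_opt + 1))
```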
Based on extensive simulations, the values of $k_{N_1}$ and $k_{N_2}$ that minimize the MISE of the orthogonal density estimates of the first and second classes, respectively, yield a sub-optimal solution for minimizing $J(k_{N_1}, k_{N_2})$. This choice is useful when no information about the PDFs of the data is available, which is generally the case for real-world data.
4 EXPERIMENTAL RESULTS
In this section, we compare the performance of the dimensionality reduction method based on $\hat{d}_{PF}$ described above with that of LDA, both on simulated data and on a real-world dataset. To do so, we evaluate the classification accuracy of a nonparametric Bayesian classifier applied to the data projected onto the reduced space. Classification accuracy is evaluated by counting the number of misclassified samples produced by the classifier over all classes of the projected data.
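As an illustration of this protocol, here is a minimal sketch of such an evaluation on one-dimensional projected data; the Gaussian kernel density estimator and the function name are our assumptions, since the paper does not specify which nonparametric density estimator its classifier uses.

```python
import numpy as np
from scipy.stats import gaussian_kde

def misclassification_rate(xp, y):
    """Error rate of a nonparametric Bayes classifier on 1-D projected data xp
    with labels y: posterior score = class prior x KDE class-conditional density."""
    classes = np.unique(y)
    priors = [np.mean(y == c) for c in classes]
    kdes = [gaussian_kde(xp[y == c]) for c in classes]
    scores = np.vstack([p * kde(xp) for p, kde in zip(priors, kdes)])
    y_hat = classes[np.argmax(scores, axis=0)]
    return np.mean(y_hat != y)        # fraction of misclassified samples
```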
Figure 2: Values of $J(k_{N_1}, k_{N_2})$ against the different pairs $(k_{N_1}, k_{N_2})$. The selected minimum of $J(k_{N_1}, k_{N_2})$ is shown in red.
4.1 Experiment with Simulated Data
This experiment concerns the two-class case. Data vectors of the first class are drawn from a multivariate Gaussian distribution with mean vector $\mu_1 = (3, \ldots, 3)^T$. For the second class, data vectors are generated from a mixture of two multidimensional Gaussian distributions, the first with mean vector $\mu_2 = (2, \ldots, 2)^T$ and the second with mean vector $\mu_3 = (4, \ldots, 4)^T$. All these distributions share the same covariance matrix $\Sigma = 2I$, where $I$ denotes the identity matrix. The sample size of each class is 1000, and the generated vectors have dimension 14. We search for the projection vector $W$ that maps the generated data onto the optimal one-dimensional subspace according to each of the two reduction methods under study. Note that the system of orthogonal functions used is the trigonometric one (Hall, 1982). After finding the projection vector $W$ for each method, the simulated data are projected onto the reduced space and a Bayesian classifier is applied to the projected data. Classification results are summarized in Table 1. We observe that the dimensionality reduction accuracy of the method based on $\hat{d}_{PF}$ is better than that of LDA.
Table 1: Classification results of the experiment with simulated data.

                          LDA method    Method based on $\hat{d}_{PF}$
  Misclassification rate  0.47          0.22
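For reference, a sketch of the data generation and of the LDA baseline is given below. Equal mixture weights for the second class are an assumption (the paper does not state them), and the $\hat{d}_{PF}$-based projection search itself is not reproduced here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
d, n = 14, 1000
cov = 2.0 * np.eye(d)                       # common covariance matrix 2I

# Class 1: a single Gaussian with mean vector (3, ..., 3)^T
X1 = rng.multivariate_normal(np.full(d, 3.0), cov, n)

# Class 2: mixture of two Gaussians with mean vectors (2, ..., 2)^T and
# (4, ..., 4)^T; equal component weights are assumed here
comp = rng.integers(0, 2, n)
noise = rng.multivariate_normal(np.zeros(d), cov, n)
X2 = noise + np.where(comp[:, None] == 0, 2.0, 4.0)

X = np.vstack([X1, X2])
y = np.repeat([0, 1], n)

# LDA projection onto one dimension; the mixture mean of class 2 equals the
# class-1 mean, so the between-class scatter is nearly zero and the learned
# direction carries almost no discriminative information.
xp = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y).ravel()
# Feeding xp to a Bayes classifier (e.g. the misclassification_rate sketch
# above) yields an error close to 0.5, consistent with Table 1.
```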
The LDA method fails to find an optimal subspace in which satisfactory class separation is obtained, since the original simulated data contain a multimodal distribution. The method based on $\hat{d}_{PF}$, however, succeeds in overcoming the unimodality restriction. The success of the latter method can be explained by the fact that the $\hat{d}_{PF}$-based method accounts for higher