The level of significance is set to 0.01 (Rice, 1995).
Tables 2 and 3 give the Mean Classification Error
(MCE) obtained when not performing any
dimension reduction.
We start with two general observations. First, the
quadratic classifier gives better results on most of
the data sets. This may indicate that, in most data
sets, there is indeed discriminatory information
present in the second-order moments of the class
distributions. Second, the average error rates after
reduction to d=1 or d=2 remain, in general, smaller
than those in the full space, thus confirming that a
gain in performance can be achieved by reducing the
dimensionality of the problem.
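To make the first observation concrete: when two classes share the same mean and differ only in covariance, all the discriminatory information sits in the second-order moments, so a quadratic classifier can separate them while a linear one cannot. The following minimal sketch illustrates this on synthetic data (NumPy and scikit-learn are our own assumptions here, not tools used in the paper):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)

# Two classes with identical means but different covariances: all the
# discriminatory information is in the second-order moments.
n = 1000
X1 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], n)
X2 = rng.multivariate_normal([0.0, 0.0], [[4.0, 0.0], [0.0, 0.25]], n)
X = np.vstack([X1, X2])
y = np.repeat([0, 1], n)

for clf in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    err = 1.0 - clf.fit(X, y).score(X, y)
    print(f"{type(clf).__name__}: training error = {err:.3f}")
# The linear classifier stays near chance level (~0.5); the quadratic
# classifier separates the classes through their covariances.
```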
Also, note that the average error rates of the PF
method compare favorably to those of the other
techniques for the 1D and 2D subspaces. This
advantage seems to correlate with the difficulty of
the classification problem. In particular, for the
linear and quadratic classifiers, PF is uniformly
superior to the other methods.
We first analyse the two-class problems (data sets
a, b, c and d). When the nearest mean classifier is
used, the Patrick-Fisher criterion as well as LDA
achieve better results than ACC. For the quadratic
and linear classifiers, the best results were provided
by PF and ACC, their best overall performance being
significantly different from that of the LDA
technique. Note that the performance of LDA is
seriously limited by the constraint d < K (K being
the number of classes).
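This limitation follows from a standard argument (see Fukunaga, 1990): the LDA directions are eigenvectors of $S_W^{-1} S_B$, and the between-class scatter matrix

$$ S_B = \sum_{i=1}^{K} P_i \,(\mu_i - \mu)(\mu_i - \mu)^{\mathsf{T}} $$

has rank at most $K-1$, so LDA can extract at most $K-1$ meaningful directions; in the two-class case it is effectively confined to a one-dimensional subspace. (The notation here is ours, for illustration.)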
We now turn to the analysis of the multi-class
case, where K-fold CV was used (data sets e, f, i
and j). A pattern similar to the two-class case is
observed: the advantage of PF persists, and it clearly
outperforms LDA. Note that the PF and ACC error
rates are of the order of $10^{-2}$, whereas those of
LDA are of the order of $10^{-1}$.
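As a point of reference, the K-fold CV protocol used to produce such MCE figures can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it assumes scikit-learn, uses LDA as a stand-in for the compared projection criteria, and applies the nearest mean classifier in the reduced space:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestCentroid

X, y = load_iris(return_X_y=True)
errors = []
for train, test in StratifiedKFold(n_splits=10, shuffle=True,
                                   random_state=0).split(X, y):
    # Fit the dimension-reduction mapping on the training fold only,
    # then project both folds onto the d-dimensional subspace (d = 2).
    proj = LinearDiscriminantAnalysis(n_components=2).fit(X[train], y[train])
    Z_tr, Z_te = proj.transform(X[train]), proj.transform(X[test])
    # Train a classifier in the reduced space and record its error
    # on the held-out fold.
    clf = NearestCentroid().fit(Z_tr, y[train])
    errors.append(1.0 - clf.score(Z_te, y[test]))

print(f"Mean Classification Error over 10 folds: {np.mean(errors):.4f}")
```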
For data sets g and h, where validation is based
on a separate test set, the best error rates are those
given by PF and LDA; these methods provide much
better class separability than the ACC criterion, for
all classifiers.
4 CONCLUSIONS
In this paper, a 2D dimensionality reduction method
is proposed. Its novelty lies in a new $L_2$
probabilistic dependence measure estimate obtained
by an orthogonal Fourier series expansion.
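For context, the quantities behind this construction can be sketched as follows (notation ours; the precise estimator is developed in Drira and Ghorbel, 2010; 2011). In the two-class case, the Patrick-Fisher distance is the $L_2$ distance between the weighted class-conditional densities (Patrick and Fisher, 1969),

$$ J_{PF} = \left( \int \big[\, p(x \mid \omega_1) P(\omega_1) - p(x \mid \omega_2) P(\omega_2) \,\big]^2 \, dx \right)^{1/2}, $$

and expanding each weighted density on a real orthonormal Fourier basis $\{\varphi_k\}$ turns it, via Parseval's identity, into a sum over expansion coefficients,

$$ J_{PF}^2 = \sum_{k} \big( c_k^{(1)} - c_k^{(2)} \big)^2, \qquad \hat{c}_k^{(i)} = \hat{P}(\omega_i) \, \frac{1}{N_i} \sum_{j=1}^{N_i} \varphi_k\big(x_j^{(i)}\big), $$

so that truncating the series gives a plug-in estimate computable directly from the samples.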
The experiments on real data sets show that the
suggested method consistently increases the
separability of the classes projected onto the reduced
space more than the well-known LDA method does.
Since the results given by the proposed method are
promising, it could be used as a step prior to a
classification process. We will concentrate our
future work on evaluating the effectiveness of this
method by studying the classification accuracy in
terms of the probability of error.
REFERENCES
Aladjem, M. E. (1996). Two-class pattern discrimination
via recursive optimization of Patrick-Fisher distance.
Proc. of the 13th International Conference on Pattern
Recognition (ICPR), vol. 2, pp. 60-64.
Devijver, P. A. and Kittler, J. (1982). Pattern Recognition:
A Statistical Approach. London: Prentice-Hall.
Drira, W. and Ghorbel, F. (2011). Une 2D-réduction de
dimension par un estimateur de la distance en
probabilité de Patrick Fisher. 43èmes Journées de
Statistique, Tunis.
Drira, W. and Ghorbel, F. (2010). Réduction de dimension
par un nouvel estimateur de la distance de Patrick
Fisher à l'aide des fonctions orthogonales. 42èmes
Journées de Statistique, Marseille.
Fisher, R. A. (1936). The Use of Multiple Measurements
in Taxonomic Problems. Annals of Eugenics, vol. 7,
pp. 179-188.
Fukunaga, K. (1990). Introduction to Statistical Pattern
Recognition. New York: Academic Press.
Hillion, A. (1988). Une méthode de classification de
textures par extraction linéaire non paramétrique de
caractéristiques. Traitement du Signal, vol. 5, no. 4.
Loog, M. et al. (2001). Multiclass Linear Dimension
Reduction by Weighted Pairwise Fisher Criteria.
IEEE Trans. on PAMI, vol. 23, no. 7.
Murphy, P. M. and Aha, D. W. (2004). UCI Repository of
Machine Learning Databases, http://archive.ics.uci.edu
/ml/citation_policy.html.
Nenadic, Z. (2007). Information Discriminant Analysis:
Feature Extraction with an Information-Theoretic
Objective. IEEE Trans. on PAMI, vol. 29, no. 8.
Patrick, E. A. and Fisher, P. F. (1969). Nonparametric
feature selection. IEEE Trans. on Information Theory,
vol. 15, pp. 577-584.
Rice, J. A. (1995). Mathematical Statistics and Data
Analysis, 2nd ed. Belmont: Duxbury Press.