distance will be described. Section 4 then introduces the criterion on which the comparison study between the neural and the statistical classifiers is based. Simulation results are presented and analyzed in section 5. In the last section, we apply this comparative study to a real pattern recognition problem: we test the stability and performance of the different classifiers on the handwritten digit recognition problem by classifying their corresponding Fourier Descriptors. Such features form a set of parameters that are invariant under similarity transformations and closed-curve parameterizations, and this set has good properties such as completeness and stability.
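As a concrete illustration, here is a minimal NumPy sketch of such descriptors (our sketch of the standard construction, not necessarily the exact variant used later): the closed boundary is read as a complex signal, and its normalized Fourier coefficient magnitudes are invariant to translation, scale, rotation, and the starting point of the parameterization.

import numpy as np

def fourier_descriptors(contour, n_coeffs=16):
    # contour: (N, 2) array of (x, y) points along a closed boundary
    z = contour[:, 0] + 1j * contour[:, 1]   # boundary as a complex signal
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                          # drop the DC term: translation invariance
    coeffs = coeffs / np.abs(coeffs[1])      # divide by the first harmonic: scale invariance
    return np.abs(coeffs)[1:n_coeffs + 1]    # magnitudes: rotation / start-point invariance

# toy check: a circle and a rotated, scaled, shifted copy give the same descriptors
t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
moved = 3.0 * circle @ np.array([[0.0, -1.0], [1.0, 0.0]]) + 5.0
print(np.allclose(fourier_descriptors(circle), fourier_descriptors(moved)))  # True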
2 NEURAL APPROACHES
The most widely used and studied category of networks is the mixed NNs, which combine feature-extractor NNs with classifier NNs: the first layers of the network carry out the primitive extraction, and the last layers classify the extracted features. An interesting example is the Multi-Layer Perceptron.
2.1 Multi-Layer Perceptron: MLP
Based on the results from (Steven, 1991), an MLP with one hidden layer is generally sufficient for most problems, including classification. Thus, all networks used in this study have a single hidden layer. The number of neurons in the hidden layer can only be determined experimentally; no rule specifies it. The numbers of nodes in the input and output layers, however, are set to match the numbers of input and target parameters of the given process, respectively. NNs therefore have a complex enough architecture that designing the optimal model for a given application is far from easy.
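For concreteness, here is a minimal NumPy sketch of such an architecture; the sizes are illustrative assumptions (e.g. 16 inputs for a feature vector, 10 outputs for ten digit classes, and an experimentally chosen hidden size of 20):

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 16, 20, 10   # n_hidden: assumed, tuned by experiment

W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_outputs)); b2 = np.zeros(n_outputs)

def forward(x):
    # single hidden layer; the sigmoid activations are an assumption here
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

print(forward(rng.normal(size=n_inputs)).shape)   # (10,): one score per class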
In order to reduce the difference between the ANN outputs and the known target values, the training algorithm estimates the weight matrices such that an overall error measure is minimized. For the MLP, the weights are iteratively improved with the back-propagation algorithm.
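As an illustration of this procedure (a minimal sketch, not the authors' exact training setup), the following performs batch gradient descent with back-propagation on a mean-squared error, for a one-hidden-layer MLP on toy data:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))                  # toy input vectors
Y = np.eye(10)[rng.integers(0, 10, size=200)]   # toy one-hot targets
W1 = rng.normal(scale=0.1, size=(16, 20)); b1 = np.zeros(20)
W2 = rng.normal(scale=0.1, size=(20, 10)); b2 = np.zeros(10)
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

lr = 0.5
for epoch in range(200):
    H = sig(X @ W1 + b1)                        # forward pass
    O = sig(H @ W2 + b2)
    dO = (O - Y) * O * (1 - O)                  # backward pass: chain rule
    dH = (dO @ W2.T) * H * (1 - H)              # through both sigmoid layers
    W2 -= lr * H.T @ dO / len(X); b2 -= lr * dO.mean(axis=0)
    W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)

print("final MSE:", np.mean((O - Y) ** 2))      # the overall error being minimized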
2.2 Criticisms of Neural Networks
Despite the effectiveness and significant progress of ANNs in several applications, especially the classification process, they present several limits. First, the MLP outputs are treated as approximations of posterior probabilities, yet to this day no proof of the quality of this approximation has been presented. Second, NNs have a complex enough architecture that designing the optimal model for a given application is far from easy. Unlike simple linear classifiers, which may underfit the data, the complexity of the NN architecture tends to overfit the data and causes model instability. Breiman proved, in (Breiman, 1996), the instability of ANN classification results: small changes in the training sets can introduce a large variance in the prediction results. Thus, a good model should find the equilibrium between under-fitting and over-fitting.
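A rough way to observe this instability (our toy experiment, not Breiman's) is to retrain the same single-hidden-layer network on bootstrap resamples of the training set and measure how often the resulting classifiers disagree on held-out points:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=16, n_classes=3,
                           n_informative=8, random_state=0)
X_tr, y_tr, X_te = X[:400], y[:400], X[400:]

rng = np.random.default_rng(0)
preds = []
for seed in range(5):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))   # small change: bootstrap resample
    clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=seed)
    clf.fit(X_tr[idx], y_tr[idx])
    preds.append(clf.predict(X_te))

preds = np.array(preds)
# fraction of test points on which at least two of the retrained networks disagree
print("disagreement rate:", np.mean(preds.min(axis=0) != preds.max(axis=0)))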
Beyond their instability, neural classifiers produce a black-box model that yields only crisp outputs and hence cannot be mathematically interpreted the way statistical approaches can. We therefore recall in the next section some statistical methods, such as the basic linear discriminant analysis and the proposed Patrick-Fischer distance estimator.
3 STATISTICAL APPROACHES
The traditional statistical classification methods are based on the Bayesian decision rule, which is the ideal classification technique in terms of minimum probability of error. However, in the non-parametric context, applying the Bayes classifier requires the estimation of the conditional probability density functions, and it is well known that this task needs a large sample size in high dimensions. Hence, a dimension reduction is required as a first step.
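For illustration, here is a minimal sketch of such a non-parametric Bayes classifier, using a Gaussian kernel density estimate per class with an assumed fixed bandwidth h:

import numpy as np

def kde_logpdf(x, samples, h=0.5):
    # Gaussian kernel density estimate of log p(x); h is an assumed bandwidth
    d = samples.shape[1]
    sq = np.sum((samples - x) ** 2, axis=1)
    return np.log(np.mean(np.exp(-sq / (2 * h ** 2)))) - 0.5 * d * np.log(2 * np.pi * h ** 2)

def bayes_predict(x, class_samples):
    # Bayes rule: argmax over classes of log prior + log class-conditional density
    n = sum(len(s) for s in class_samples)
    scores = [np.log(len(s) / n) + kde_logpdf(x, s) for s in class_samples]
    return int(np.argmax(scores))

# toy usage with two 2-D classes
rng = np.random.default_rng(0)
c0 = rng.normal(loc=[0.0, 0.0], size=(100, 2))
c1 = rng.normal(loc=[3.0, 3.0], size=(100, 2))
print(bayes_predict(np.array([2.5, 2.5]), [c0, c1]))   # -> 1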
3.1 Linear Discriminant Analysis: LDA
Linear discriminant analysis is the best-known supervised linear dimension reduction method; this popular method is based on scatter matrices. In the reduced space, the between-class scatter is maximized while the within-class scatter is minimized. To that end, LDA searches for an orthogonal linear projection matrix W that maximizes the following so-called Fisher optimization criterion (Fukunaga, 1990):
$$ J(W) = \frac{\mathrm{trace}(W^{T} S_{b} W)}{\mathrm{trace}(W^{T} S_{w} W)} \qquad (1) $$
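As a minimal numerical illustration (toy scatter matrices assumed; $S_w$ and $S_b$ are defined just below), criterion (1) can be evaluated directly, and the classical solution takes W as the leading eigenvectors of $S_w^{-1} S_b$:

import numpy as np

def fisher_criterion(W, S_b, S_w):
    # trace-ratio criterion J(W) of Eq. (1)
    return np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)

def lda_projection(S_b, S_w, k):
    # classical solution: top-k eigenvectors of S_w^{-1} S_b
    vals, vecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:k]].real

S_w = np.array([[2.0, 0.3], [0.3, 1.0]])   # toy within-class scatter
S_b = np.array([[1.0, 0.8], [0.8, 1.5]])   # toy between-class scatter
W = lda_projection(S_b, S_w, k=1)
print(fisher_criterion(W, S_b, S_w))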
Here, $S_w$ is the within-class scatter matrix and $S_b$ is the between-class scatter matrix. Their two well-known expressions are given by: