Tibshirani & al., 2001) each object is assumed to come from one of a known set of classes, the problem being to infer the true class of each data item. The tests performed on the data are based on a finite feature set determined either by mathematical techniques or empirically, using a training set containing data whose true classifications are known.
During the past decade, both classification and assignment procedures have found a large number of applications related to information extraction from large data sets, a field referred to as data mining and knowledge discovery in databases (Fayyad & al., 1996; Hastie, Tibshirani & al., 2001).
Since similarity plays a key role for both clustering and classification purposes, the problem of finding relevant indicators to measure the similarity between two patterns drawn from the same feature space has become of major importance.
The most popular ways to express the similarity/dissimilarity between two objects involve distance measures on the feature space (Jain, Murty, Flynn, 1999). In the case of high-dimensional data, the computational complexity can become prohibitive; consequently, simplified schemes based on principal components or principal coordinates provide good approximations (Chae, Warde, 2006). Recently, alternative methods such as discriminant common vectors, neighborhood components analysis and Laplacianfaces have been proposed, allowing the learning of linear projection matrices for dimensionality reduction (Liu, Chen, 2006; Goldberger, Roweis, Hinton, Salakhutdinov, 2004).
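As a rough illustration of such simplified schemes (not the authors' implementation; the data and variable names below are hypothetical), the following numpy sketch projects high-dimensional objects onto a few principal coordinates and evaluates Euclidean dissimilarity in the reduced space:

```python
import numpy as np

# Hypothetical data: 500 objects described by 1000 features each.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 1000))

# Project onto the first k principal components (via SVD of the
# centered data matrix) to reduce the cost of distance computations.
k = 10
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T          # principal coordinates, shape (500, k)

# Euclidean dissimilarity between two objects, approximated in the
# reduced k-dimensional space instead of the full feature space.
d_full = np.linalg.norm(X[0] - X[1])
d_pca = np.linalg.norm(Z[0] - Z[1])
```

Distances computed on the leading principal coordinates approximate the full-space distances at a fraction of the computational cost.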
2 DISCRIMINANT ANALYSIS
There are several different ways in which linear decision boundaries among classes can be stated. A direct approach is to explicitly model the boundaries between the classes as linear. For a two-class problem in an n-dimensional input space, this amounts to modeling the decision boundary as a hyperplane, that is, a normal vector and a cut point. One of the methods that explicitly looks for separating hyperplanes is the well-known perceptron model of Rosenblatt (1958), which led to an algorithm that finds a separating hyperplane in the training data if one exists.
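A minimal sketch of the perceptron learning rule described above, assuming class labels in {-1, +1}; the function name and parameters are illustrative only:

```python
import numpy as np

def perceptron(X, y, epochs=100, lr=1.0):
    """Rosenblatt-style perceptron: returns (w, b) defining a hyperplane
    w.x + b = 0 that separates the training data if the two classes
    (labels y in {-1, +1}) are linearly separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified point
                w += lr * yi * xi        # move the hyperplane toward it
                b += lr * yi
                errors += 1
        if errors == 0:                  # all training points separated
            break
    return w, b
```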
Another method, due to Vapnik (1996), finds an optimal separating hyperplane if one exists; otherwise, it finds a hyperplane that minimizes some measure of overlap in the training data.
In the particular case of two linearly separable classes, the optimal separating hyperplane separates the classes and maximizes the distance to the closest point from either class. Not only does this provide a unique solution to the separating hyperplane problem, but, by maximizing the margin between the two classes on the training data, it also leads to better classification performance on test data and improved generalization capacity.
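In standard notation (not reproduced from the paper), the optimal separating hyperplane w'x + b = 0 for linearly separable classes with labels y_i in {-1, +1} is the solution of

```latex
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^{2}
\quad \text{subject to} \quad
y_i\left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, N,
```

which maximizes the margin 2/||w|| between the two classes.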
When the data are not separable, there is no feasible solution to this problem, and an alternative formulation is needed. The disadvantage of enlarging the space using basis transformations is that an artificial separation through over-fitting usually results. A more attractive alternative is the support vector machine (SVM) approach, which allows for overlap but minimizes a measure of the extent of this overlap.
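A common way to state this (the standard soft-margin formulation, not necessarily the paper's notation) introduces slack variables ξ_i that measure the overlap:

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0,
```

where the constant C controls the trade-off between the margin width and the amount of overlap tolerated on the training data.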
The basis expansion method is the most popular technique for moving beyond linearity. It is based on the idea of augmenting or replacing the vector of inputs with additional variables that are transformations of it, and then using linear models in the new, augmented space of derived input features. The use of basis expansions allows more flexible representations of the data. Polynomials (although limited by their global nature), piecewise polynomials and splines, which allow for local polynomial representations, and wavelet bases, especially useful for modeling signals and images, are just a few examples of sets of basis functions. All of them produce a dictionary consisting of a typically very large number of basis functions, far more than one can afford to fit to the data. Along with the dictionary, a method is required for controlling the complexity of the model built from basis functions in the dictionary. Some of the most popular approaches are restriction methods, where we decide beforehand to limit the class of functions; selection methods, which adaptively scan the dictionary and include only those basis functions that contribute significantly to the fit of the model; and regularization methods (for instance, ridge regression), where the entire dictionary is used but the coefficients are restricted.
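The following sketch (hypothetical data, plain numpy; not the authors' code) combines a polynomial basis expansion with ridge regularization, so the entire dictionary is fitted but the coefficients are shrunk:

```python
import numpy as np

# Hypothetical one-dimensional regression data.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(50)

# Basis expansion: replace the scalar input with polynomial features
# (a small "dictionary" of basis functions 1, x, x^2, ...).
degree = 8
H = np.vander(x, degree + 1, increasing=True)

# Regularization: ridge regression restricts the coefficients instead
# of pruning basis functions from the dictionary.
lam = 1e-2
coef = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
y_hat = H @ coef
```

Increasing lam shrinks the coefficients further, trading the flexibility of the expanded basis for stability of the fit.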
Support Vector Machines (SVM) are a learning algorithm introduced by Vapnik and coworkers, theoretically motivated by VC theory (Cortes, Vapnik, 1995; Friess, Cristianini & al., 1998). The SVM algorithm works by mapping the training data for classification tasks into a higher-dimensional feature space. In this