method selects optimal feature subsets according to
the Lorentzian metric.
3.2 Pre-processing and Optimal Parameters
In classification problems, a preprocessing step is
occasionally necessary. Because it represents the data
set better and makes it more usable, this operation
can improve the classification success rate.
In this study, the preprocessing step consists only
of a matrix multiplication (compression) (Marcus
and Minc, 1992). The transformation matrix is used
to make the data meaningful in
Lorentzian space. Thus, after compression, the
n-dimensional training set $X = (x_1, x_2, \ldots, x_n)$ in
Euclidean space is transformed into
$X' = (x'_1, x'_2, \ldots, x'_n)$ and becomes suitable for training and
classification in Lorentzian space. This preprocessing
step can be defined by the following expression:
$$X' = XA \tag{12}$$
where $A$ is the diagonal matrix, which can be
expressed by $a_{ij} = 0$ if $i \neq j$, $\forall i, j \in \{1, 2, \ldots, n\}$.
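Hence, the transformation matrix that forms the
preprocessing step for two-dimensional data takes
one of the following forms:
$$A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} \quad \text{or} \quad A = \begin{pmatrix} b & 0 \\ 0 & a \end{pmatrix} \tag{13}$$
where $a, b \in \mathbb{R}$.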
In this study, the first form of the transformation
matrix was used. The relation between the
parameters $a$, $b$ of this matrix
is $a = 20 \cdot b$.
Hence, the primary case is assumed to be:
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 0.1 \end{pmatrix}$$
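As an illustration of the compression step, the following minimal Python sketch applies a diagonal scaling to two-dimensional samples. It assumes that each sample is stored as a row, that the primary case reads as diag(2, 0.1), and that the product is taken on the right. The function name `compress` and the parameter names `a` and `b` are illustrative and are not taken from the original implementation.

```python
import numpy as np


def compress(X, a=2.0, b=0.1):
    """Scale the two feature columns of X by the diagonal matrix diag(a, b)."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError("expected an array of shape (m, 2)")
    A = np.diag([a, b])  # all off-diagonal entries are zero
    return X @ A         # assumed row convention: one sample per row


X = np.array([[1.0, 10.0],
              [3.0, -4.0]])
X_prime = compress(X)
print(X_prime)
```

Calling `compress(X, a=0.1, b=2.0)` would correspond to a matrix with the two diagonal entries exchanged.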
However, our research shows that the values of
these parameters are significant in terms of
classification success.
For this reason, the optimal
parameter values, which produce the best
classification output, were also investigated in the
experiments.
4 LORENTZIAN CLASSIFIER
Generally, a classification process consists of
training and test steps. In this study, the data are
prepared for training in two steps. First, the
optimal feature pair subsets are selected by the newly
proposed FSLS method. Subsequently, the
pre-processing operation described in Section 3 is
applied to these feature subsets. For training
on the selected and transformed feature subsets, the
Classification via Lorentzian Metric (CLM)
(Kerimbekov et al., 2016) method was improved. The
CLM classification algorithm is valid in
two-dimensional Lorentzian space and is based on the
Lorentzian distance. The CLM classifier assigns the
class label of a new sample according to the Lorentzian
distances given by formula (1). That is,
the k nearest pairs are selected by the Lorentzian metric.
These pairs define the relation between a test sample
and k training set samples, and finally the
classification is done by the majority rule.
The CLM method was described as a classifier in
two-dimensional Lorentzian space. However, in our
research we use multidimensional data sets.
Therefore, the CLM method was improved by adding
a supplementary decision rule and is hereinafter
referred to as the Lorentzian Distance Classifier for
Multiple Features (LDCMF).
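The nearest-neighbour and majority-rule logic described above can be sketched as follows. Formula (1) is not reproduced in this section, so the sketch assumes a two-dimensional Lorentzian form in which the first coordinate carries the negative sign, and it ranks neighbours by the absolute value of the resulting squared interval. Both choices, along with the names `lorentz_sq_interval` and `clm_like_predict`, are assumptions made for illustration and may differ from the CLM definition.

```python
from collections import Counter

import numpy as np


def lorentz_sq_interval(u, v):
    """Assumed 2-D Lorentzian form: -(u0 - v0)^2 + (u1 - v1)^2 (may be negative)."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return -d[0] ** 2 + d[1] ** 2


def clm_like_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples ranked nearest to x."""
    # Assumption: "nearest" means the smallest absolute squared interval.
    scores = np.array([abs(lorentz_sq_interval(t, x)) for t in train_X])
    nearest = np.argsort(scores, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["A", "A", "B", "B"]
label = clm_like_predict(train_X, train_y, [0.0, 0.5], k=3)
print(label)
```

Unlike a Euclidean distance, such an interval can be zero or negative for distinct points, which is why the ranking rule has to be fixed explicitly in any implementation.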
The proposed novel LDC method is the aggregate
of the following stages. It takes the training and
test sets in $\mathbb{R}^n$ as inputs. However, as
mentioned before, the training data set is separated
into feature pair subsets by (4). Namely, in the first step
all possible $\binom{n}{2}$ feature pair subsets are formed
from the training set as
$\big((x_1, x_2), (x_1, x_3), \ldots, (x_{n-1}, x_n)\big)$. Subsequently, the
produced $\binom{n}{2}$ feature pair subsets are weighted by
the selection criterion. Thereafter, $fc$ optimal feature pair
subsets, $fc \in \{1, \ldots, \binom{n}{2}\}$, are selected by the FSLS
method, which is based on the Lorentzian metric. Here, $fc$
denotes the total number of feature combination (fc)
pairs. The selected feature pair subsets are
compressed by formula (12) and become ready for
training. The new LDC classifier runs for $fc$
iterations. This value is also used as the stopping
threshold in the proposed algorithm. Depending on
whether $fc$ is set smaller or larger, the
computational time of the proposed algorithm changes.
Furthermore, it was found that the selected feature pair
subsets, by including the efficient features,
represent the original data set in the best way. Thus, the
selected feature pair subsets are used as the training
data set in the proposed LDC classifier.
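The enumeration of feature pair subsets can be illustrated with a short sketch. It assumes a training matrix with samples as rows and features as columns, and it only builds all pairs; the weighting and the FSLS selection are not reproduced here. The function name `feature_pair_subsets` is hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def feature_pair_subsets(X):
    """Map each feature index pair (i, j), i < j, to the two-column sub-matrix of X."""
    X = np.asarray(X)
    n = X.shape[1]
    return {(i, j): X[:, [i, j]] for i, j in combinations(range(n), 2)}


X = np.arange(12).reshape(3, 4)  # 3 samples, n = 4 features
subsets = feature_pair_subsets(X)
print(len(subsets), comb(4, 2))
```

Because the number of pairs grows quadratically with the number of features, restricting training to a smaller number of selected pairs directly reduces the number of iterations.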
For a new sample coming from the test set, the feature
selection and preprocessing steps explained above
are applied in the same way as for the training samples.
Subsequently, a class label is assigned to the test
sample for each selected feature pair, $i = 1, \ldots, fc$.
The label determined in the $i$-th iteration is the
class label of the $i$-th feature pair of the test sample
with respect to the corresponding training feature pair
subset. That is, in the testing stage the proposed LDC
classifier is iterated $fc$ times for each new sample.
In every iteration the new proposed classifier
ICPRAM 2017 - 6th International Conference on Pattern Recognition Applications and Methods