2.4. Get ℵ^m as the union of the resulting m-dimensional versions.
2.5. Apply the classification scheme to ℵ^m.
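To illustrate steps 2.4 and 2.5, a minimal sketch is given below. It assumes, purely for illustration, that the m-dimensional version of each class sample is obtained by a PCA-style projection; the actual compression and the classification scheme are the ones defined in the earlier steps of the algorithm, and the function names compress_class and classify used here are hypothetical.

```python
import numpy as np

def compress_class(X, m):
    """Project one class sample onto its first m principal directions.
    Stand-in for the compression defined in the earlier steps of the
    algorithm (an assumption, not the paper's exact transform)."""
    Xc = X - X.mean(axis=0)
    # right singular vectors = principal directions, largest variance first
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T                      # shape (N_i, m)

def step_2_4_union(class_samples, m):
    """Step 2.4: the union of the m-dimensional versions of all classes."""
    return np.vstack([compress_class(X, m) for X in class_samples])

# Step 2.5: apply the classification scheme to the reduced data set.
# classify() stands for the scheme described earlier in the paper and is
# hypothetical here.
# reduced = step_2_4_union(class_samples, m=3)
# labels = classify(reduced)
```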
4 EXPERIMENTAL PERFORMANCE EVALUATION OF THE PROPOSED ALGORITHM
A series of tests was performed in order to draw conclusions about the performance of our method and to compare it against the k-means algorithm. The stopping condition Cond = True holds if at most NoRe reallocations occurred during the current iteration; in our tests NoRe was set to NoRe = 10. The tests were performed for M = 4, the data being randomly generated by sampling from normal distributions. Some of the distributions were selected to correspond to “well separated” classes, while others were generated to correspond to “poorly separated” subsets of classes, working assumption 2 being not necessarily fulfilled.
In order to draw conclusions about the sensitivity of the algorithm to data dimensionality, several tests were performed for n = 2, n = 4, n = 6, n = 8 and n = 10.
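A minimal sketch of this experimental setup is given below: labelled samples drawn from class-conditional normal distributions, together with the stopping test based on NoRe. Only M = 4, the varying dimension n and NoRe = 10 come from the text; the particular means, covariances and sample sizes are placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_sample(means, covariances, sizes):
    """Draw a labelled sample with one multivariate normal distribution
    per class (M classes in total)."""
    X = np.vstack([rng.multivariate_normal(mu, cov, size=s)
                   for mu, cov, s in zip(means, covariances, sizes)])
    y = np.concatenate([np.full(s, k) for k, s in enumerate(sizes)])
    return X, y

def stopping_condition(num_reallocations, NoRe=10):
    """Cond = True when at most NoRe points were reallocated during the
    current iteration."""
    return num_reallocations <= NoRe

# Example for M = 4 classes in dimension n = 4 (placeholder parameters,
# not the ones used in the reported experiments):
# n = 4
# means = [np.zeros(n), 3 * np.ones(n), -3 * np.ones(n), 6 * np.ones(n)]
# covs = [np.eye(n)] * 4
# X, y = generate_sample(means, covs, sizes=[100, 100, 100, 100])
```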
The tests of our algorithm and of k-means support the following conclusions.
1. When there is a natural grouping tendency in the data, the initial system of skeletons is quite close to the true one. In these cases, our algorithm stabilizes in a small number of iterations.
2. For data of relatively small size, the number of examples misclassified by our algorithm is significantly smaller than the number misclassified by k-means (the way misclassifications are counted is sketched after this list).
3. For data of relatively small size, the k-means algorithm identifies the cluster structure significantly less accurately than our method.
4. The k-means algorithm is significantly more
sensitive to data dimensionality, its performance
decreasing dramatically as the dimension n
increases.
5. For large sample sizes, the performance of our method is comparable to that of k-means.
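As an illustration of how the misclassification counts reported below can be obtained for a clustering result, the sketch below matches the predicted cluster indices to the true class indices and counts the disagreements. The use of scikit-learn's KMeans as the baseline and of the Hungarian algorithm for the matching are assumptions made here for illustration, not a description of the original experimental code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def misclassified_count(true_labels, pred_labels, M):
    """Count misclassified examples after optimally matching the predicted
    cluster indices to the true class indices."""
    # contingency[i, j] = number of points of true class i placed in cluster j
    contingency = np.zeros((M, M), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        contingency[t, p] += 1
    # Hungarian algorithm: matching that maximizes correctly assigned points
    rows, cols = linear_sum_assignment(-contingency)
    correct = contingency[rows, cols].sum()
    return len(true_labels) - correct

# Example: k-means baseline on a labelled sample (X, y) with M = 4 classes.
# pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
# errors = misclassified_count(y, pred, M=4)
```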
Several tests were performed for “well separated”, “relatively separated” and “poorly separated” classes, respectively. In all tests the performance of k-means proved moderate, while our method managed to identify the class structure and to classify most of the data correctly. The degree of closeness between classes is measured in terms of the Mahalanobis distance.
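For reference, a standard form of the Mahalanobis distance between two classes i and j is

$$ d_M(i, j) = \sqrt{(\mu_i - \mu_j)^{\mathsf{T}} \, \Sigma^{-1} \, (\mu_i - \mu_j)}, $$

where μ_i and μ_j are the class means and Σ is a covariance matrix; the text does not specify which covariance (pooled or per class) was used, so this form is only indicative.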
Some of the results are reported below.
A. M = 4, n = 4 and data of relatively small size. The classes are weakly separated; the values of the Mahalanobis distances are
$$
\begin{pmatrix}
0 & 369.9349 & 846.1386 & 351.6289 \\
369.9349 & 0 & 265.3931 & 214.5993 \\
846.1386 & 265.3931 & 0 & 428.1542 \\
351.6289 & 214.5993 & 428.1542 & 0
\end{pmatrix}.
$$
In this case, the classification scheme managed to discover the true structure of the data in the initial space, but when the compression to m = 3 and m = 2 was used its performance degraded dramatically. The k-means algorithm did not manage to identify the existing structure in the initial space. Some of the results are summarized in the following table.
Note that for the samples S_1, S_2 and S_4 the k-means algorithm failed to identify the cluster structures.
Table 1: The comparison of our method against k-means.

The sample                                       S_1   S_2   S_3   S_4
Number of misclassified examples by our method     0     2     0     0
Number of misclassified examples by k-means      276   253    19   311
Number of iterations                               3     2     2     2
B. M = 4, n = 4 and data of relatively small size. In this case, the true classes are better separated. The values of the Mahalanobis distances are
$$
\begin{pmatrix}
0 & 0.6171 & 1.19 & 0.9733 \\
0.6171 & 0 & 0.2827 & 0.4139 \\
1.19 & 0.2827 & 0 & 0.4183 \\
0.9733 & 0.4139 & 0.4183 & 0
\end{pmatrix} \cdot 10^{3}.
$$
In this case, good results were obtained by applying the proposed classification scheme both in the initial space and for m = 3. All tests showed better performance of our method as compared to k-means. Some of the results are summarized in the following table.