Consequently, decent improvement within the available headroom is possible.
To provide further insight into the effect of the neural feature transformation, we give a graphical representation of the patterns in the original and in the transformed feature space. Since the data are high-dimensional, we employ principal component analysis (PCA) to obtain a two-dimensional projection of the patterns that can conveniently be plotted. Each plot shows all patterns projected onto the plane spanned by the two orthogonal directions of maximal variance in the data set. These directions are not necessarily the best ones for separating the categories, since they are determined by the variance of the entire set of patterns rather than by class membership. Figure 4 shows the two-dimensional principal component plots of the original and the mapped points.
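Such a projection is straightforward to reproduce. The following is a minimal sketch using scikit-learn, where X is assumed to be an array with one pattern per row; the variable names are illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_2d(X):
    """Project patterns onto the plane spanned by the two
    orthogonal directions of maximal variance (first two PCs)."""
    return PCA(n_components=2).fit_transform(X)

# e.g. scatter-plot project_2d(X_original) next to project_2d(X_mapped)
# to produce the kind of side-by-side comparison shown in Figure 4.
```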
These plots help explain the classification accuracies for varying K depicted in Figure 3. Although they are only two-dimensional projections and therefore omit one or more dimensions, they illustrate the ability of the evolved neural transformation functions to break up linear dependencies within the first two principal axes. The transformations clearly succeed in optimizing the local neighborhood of most data points with respect to class membership, but they fail to create distinct clusters for the individual categories.
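This visual impression can also be checked quantitatively. A minimal sketch, assuming the mapped patterns Z and the class labels y are NumPy arrays (names are illustrative): nearest-neighbor class agreement measures local neighborhood quality, while a global clustering index such as the silhouette score measures cluster separation.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def neighborhood_purity(Z, y):
    """Fraction of patterns whose nearest neighbor (excluding the
    pattern itself) carries the same class label."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # never pick the query point itself
    return float((y[d.argmin(axis=1)] == y).mean())

# High neighborhood_purity(Z, y) combined with a low
# silhouette_score(Z, y) matches the picture in the plots:
# clean local neighborhoods, but no globally separated clusters.
```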
5 SUMMARY AND CONCLUSIONS
In this paper, we have introduced the use of multilayer perceptrons as nonlinear functions for feature construction in classification tasks. Our key contribution is that we evolve a transformation function instead of a classifier. An evolutionary algorithm evolves the weights and biases of the neural networks, which are directly encoded in a bit string. The classification accuracy of a K-nearest-neighbor classifier with K = 1 serves as the fitness of the neural networks that transform the original feature vectors to a lower-dimensional space. Plots of the development of the fitness values over time indicate that this approach is able to find excellent solutions and that a stable optimization takes place. We evaluated the approach on four commonly used data sets, using jackknifing (leave-one-out) to estimate classification accuracy. To the extent possible, we compared the performance of our approach with related work, and in addition we measured a performance baseline on the raw (untransformed) data. The neural feature construction presented in this paper delivers improvements of 4, 5, 12, and 13 percentage points over these baseline figures, outperforming the related work in three out of four cases. We believe that we have thus delivered a proof of concept for evolutionary neural transformation functions on real data.
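For concreteness, the overall evaluation pipeline can be summarized in a short sketch. This is a minimal illustration only: the one-hidden-layer architecture, the tanh activations, and the linear 16-bit-per-weight encoding are assumptions chosen for the example, not details restated from the paper.

```python
import numpy as np

def decode_weights(bits, n_weights, lo=-5.0, hi=5.0, bpw=16):
    """Decode a bit string into n_weights real values in [lo, hi].
    The linear 16-bit-per-weight encoding is an illustrative
    assumption; the actual genome encoding may differ."""
    out = np.empty(n_weights)
    for i in range(n_weights):
        u = int("".join(str(b) for b in bits[i * bpw:(i + 1) * bpw]), 2)
        out[i] = lo + (hi - lo) * u / (2 ** bpw - 1)
    return out

def mlp_transform(X, W1, b1, W2, b2):
    """One-hidden-layer perceptron mapping feature vectors to a
    lower-dimensional space (tanh activations assumed)."""
    return np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2)

# Fitness of a genome: leave-one-out 1-NN accuracy in the mapped space,
# i.e. neighborhood_purity(mlp_transform(X, W1, b1, W2, b2), y)
# from the earlier sketch, maximized by the evolutionary algorithm.
```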