on large datasets, we believe that this is not a shortcoming, but merely shows that the need to select features is less pressing if enough data are available.
From the point of view of computational cost, making AMIFS more applicable to large amounts of data requires either an approximate implementation that reduces the computational complexity or, for example, a hybrid scheme, i.e. starting with AMIFS and then, after some iterations, switching to ATM, which does not require estimating multivariate densities and is therefore computationally cheaper.
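As a sketch of how such a hybrid scheme could be organized, the following Python fragment performs greedy forward selection and switches criteria after a fixed number of iterations. The scoring callables `amifs_score` and `atm_score` are hypothetical stand-ins for the actual AMIFS and ATM criteria and are not part of the paper; the structure is only one possible reading of the proposal.

```python
from typing import Callable, List


def hybrid_forward_selection(
    X,                      # samples x features array (e.g. a NumPy array)
    y,                      # class labels
    n_features: int,        # number of features to select
    n_amifs_steps: int,     # iterations before switching to the ATM-style criterion
    amifs_score: Callable,  # (candidate, selected, X, y) -> float; AMIFS-style score (placeholder)
    atm_score: Callable,    # (candidate, selected, X, y) -> float; ATM-style score (placeholder)
) -> List[int]:
    """Greedy forward selection: expensive density-based criterion early,
    cheaper criterion for the remaining iterations."""
    selected: List[int] = []
    remaining = list(range(X.shape[1]))
    for step in range(n_features):
        # Use the multivariate-density-based AMIFS-style score only for the
        # first few iterations, then fall back to the cheaper ATM-style score.
        score = amifs_score if step < n_amifs_steps else atm_score
        best = max(remaining, key=lambda f: score(f, selected, X, y))
        selected.append(best)
        remaining.remove(best)
    return selected
```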
In the future, we want to develop a neural implementation of our feature selection scheme. The brain certainly faces a similar problem when it has to decide which features are really relevant to classify a new observation. A neural model could thus provide insights into how this ability can be achieved. Furthermore, we would like to investigate to what extent information theory provides guiding principles for information processing in the brain. In addition, adaptive feature selection could be accomplished via recurrent processing interleaving bottom-up and top-down processes.
REFERENCES
Abe, S. (2005). Modified backward feature selection by
cross validation. In Proc. of the Thirteenth European
Symposium on Artificial Neural Networks, pages 163–
168, Bruges, Belgium.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Networks, 5(4):537–550.
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu,
R., Desjardins, G., Turian, J., Warde-Farley, D., and
Bengio, Y. (n.d.). Deep learning tutorials. Retrieved
from: http://deeplearning.net/tutorial/lenet.html.
Bonnlander, B. V. (1996). Nonparametric selection of in-
put variables for connectionist learning. PhD thesis,
University of Colorado at Boulder.
Bonnlander, B. V. and Weigend, A. S. (1994). Selecting in-
put variables using mutual information and nonpara-
metric density estimation. In International Sympo-
sium on Artificial Neural Networks, pages 42–50, Tai-
wan.
Cover, T. M. and Thomas, J. A. (1991). Elements of in-
formation theory. Wiley-Interscience, New York, NY,
USA. pp. 12-49.
Ding, C. H. Q. and Peng, H. (2005). Minimum redun-
dancy feature selection from microarray gene expres-
sion data. J. Bioinformatics and Computational Biol-
ogy, 3(2):185–206.
Duch, W., Wieczorek, T., Biesiada, J., and Blachnik, M.
(2004). Comparison of feature ranking methods based
on information entropy. In Proc. of the IEEE Inter-
national Joint Conference on Neural Networks, pages
1415–1419, Budapest, Hungary.
Geman, D. and Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(1):1–14.
Hubel, D. and Wiesel, T. (2005). Brain and visual percep-
tion: the story of a 25-year collaboration. Oxford
University Press US. p. 106.
Jiang, H. (2008). Adaptive feature selection in pattern
recognition and ultra-wideband radar signal analysis.
PhD thesis, California Institute of Technology.
Kwak, N. and Choi, C. (2002). Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Analysis and Machine Intelligence, 24:1667–1671.
LeCun, Y. and Cortes, C. (n.d.). The MNIST database of handwritten digits. Retrieved from: http://yann.lecun.com/exdb/mnist/.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998).
Gradient-based learning applied to document recogni-
tion. Proc. of the IEEE, 86(11):2278–2324.
Najemnik, J. and Geisler, W. S. (2005). Optimal eye move-
ment strategies in visual search. Nature, 434:387–391.
Narendra, P. and Fukunaga, K. (1977). A branch and bound
algorithm for feature subset selection. IEEE Trans.
Comput., 28(2):917–922.
Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 35:1065–1076.
Raudys, S. J. and Jain, A. K. (1991). Small sample size ef-
fects in statistical pattern recognition: Recommenda-
tions for practitioners. IEEE Trans. on Pattern Analy-
sis and Machine Intelligence, 13:252–264.
Renninger, L. W., Verghese, P., and Coughlan, J. (2007).
Where to look next? Eye movements reduce local un-
certainty. Journal of Vision, 7(3):117.
Rosenblatt, M. (1956). Remarks on some nonparametric es-
timates of a density function. Annals of Mathematical
Statistics, 27:832–837.
Scott, D. W. (1992). Multivariate Density Estimation: The-
ory, Practice, and Visualization. John Wiley. pp. 125-
206.
Silverman, B. W. (1986). Density estimation for statistics
and data analysis. Chapman and Hall.
Turlach, B. A. (1993). Bandwidth selection in kernel den-
sity estimation: a review. In CORE and Institut de
Statistique, pages 23–493.
Webb, A. (1999). Statistical Pattern Recognition. Arnold, London. pp. 213-226.
Zhang, X., King, M. L., and Hyndman, R. J. (2004). Band-
width selection for multivariate kernel density estima-
tion using MCMC. Technical report, Monash Univer-
sity.