Adaptive Sequential Feature Selection for Pattern Classification

Liliya Avdiyenko, Nils Bertschinger, Juergen Jost


Feature selection helps to focus resources on the relevant dimensions of input data. Reducing the input dimensionality to the most informative features usually also simplifies subsequent tasks, such as classification. This is important, for instance, for systems that operate online under time constraints. However, when the training data is of limited size, it becomes difficult to define a single small subset of features that suffices to classify all data samples. In such situations, features should be selected adaptively, i.e. a different feature subset should be used for each test sample. Here, we propose a sequential adaptive algorithm that, for a given test sample, selects the features that maximize the expected information about its class. We provide experimental evidence that, especially for small data sets, our algorithm outperforms the two most similar information-based static and adaptive feature selectors.
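
The idea of per-sample sequential selection can be illustrated with a minimal sketch. It is not the estimator used in the paper, which the abstract does not specify. The sketch assumes discrete features, estimates information from plain empirical counts, and conditions on the observed values by keeping only the training rows that agree with the test sample. The names `entropy`, `expected_gain` and `adaptive_select`, and the toy data, are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def expected_gain(rows, labels, j):
    """Empirical I(C; F_j) on the subset: H(C) - sum_v p(v) * H(C | F_j = v)."""
    n = len(rows)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[j], []).append(lab)
    cond = sum(len(labs) / n * entropy(labs) for labs in by_value.values())
    return entropy(labels) - cond

def adaptive_select(train_X, train_y, x, max_features):
    """Choose features one at a time for the test sample x (assumed scheme).

    After each choice the training set is restricted to rows that agree
    with x on the chosen feature, so the next choice is conditioned on
    the values observed so far.
    """
    rows, labels = list(train_X), list(train_y)
    remaining = set(range(len(x)))
    chosen = []
    while remaining and len(chosen) < max_features and entropy(labels) > 0:
        # Feature with the largest expected information about the class.
        best = max(sorted(remaining), key=lambda j: expected_gain(rows, labels, j))
        chosen.append(best)
        remaining.remove(best)
        kept = [(r, l) for r, l in zip(rows, labels) if r[best] == x[best]]
        if not kept:  # no training row matches the observed value: stop
            break
        rows = [r for r, _ in kept]
        labels = [l for _, l in kept]
    prediction = Counter(labels).most_common(1)[0][0]
    return chosen, prediction

# Toy data: feature 0 splits the classes into two groups; within one group
# feature 1 decides the class, within the other group feature 2 does.
train_X = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
           (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
train_y = ["a", "a", "b", "b", "c", "c", "d", "d"]

print(adaptive_select(train_X, train_y, (0, 1, 0), 3))
print(adaptive_select(train_X, train_y, (1, 0, 1), 3))
```

On this toy data the two test samples end up with different feature subsets, which is the behaviour a single static subset cannot provide. With continuous features or very small training sets, exact-match conditioning would leave too few rows, so a smoothed density estimate would be needed in place of the counts.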



Paper Citation

in Harvard Style

Avdiyenko L., Bertschinger N. and Jost J. (2012). Adaptive Sequential Feature Selection for Pattern Classification. In Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: NCTA, (IJCCI 2012) ISBN 978-989-8565-33-4, pages 474-482. DOI: 10.5220/0004146804740482

in Bibtex Style

author={Liliya Avdiyenko and Nils Bertschinger and Juergen Jost},
title={Adaptive Sequential Feature Selection for Pattern Classification},
booktitle={Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: NCTA, (IJCCI 2012)},

in EndNote Style

JO - Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: NCTA, (IJCCI 2012)
TI - Adaptive Sequential Feature Selection for Pattern Classification
SN - 978-989-8565-33-4
AU - Avdiyenko L.
AU - Bertschinger N.
AU - Jost J.
PY - 2012
SP - 474
EP - 482
DO - 10.5220/0004146804740482