A Pitfall in Determining the Optimal Feature Subset Size

Juha Reunanen

Abstract

Feature selection researchers often encounter a peaking phenomenon: a feature subset can be found that is smaller than the full set of candidate features yet enables building a more accurate classifier. However, the present study shows that this peak may often be merely an artifact of a still too common mistake in pattern recognition: not using an independent test set.
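The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the experimental setup of the paper. It is a toy example built on assumptions made here for illustration: pure Gaussian noise features, balanced labels unrelated to the features, a nearest-centroid classifier, greedy forward selection, and a single holdout "selection" set in place of cross-validation. All function names are hypothetical. Because no feature carries any information, any peak in the accuracy measured on the selection set can only come from reusing that set to guide the search, and an independent test set should stay close to chance level.

```python
import random


def make_noise_data(n, d, rng):
    """n samples with d Gaussian noise features; labels are balanced
    and, by construction, unrelated to the features."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


def centroid_accuracy(train, evaluation, feats):
    """Fit a nearest-centroid classifier on `train`, using only the
    feature indices in `feats`, and return accuracy on `evaluation`."""
    X_tr, y_tr = train
    X_ev, y_ev = evaluation
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    correct = 0
    for x, t in zip(X_ev, y_ev):
        dist = {
            label: sum((x[f] - c) ** 2 for f, c in zip(feats, centroids[label]))
            for label in (0, 1)
        }
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == t
    return correct / len(y_ev)


def forward_selection(train, selection, test, d):
    """Greedy forward selection guided by accuracy on `selection`.
    Returns, for each subset size, the accuracy on the selection set
    (reused by the search) and on the independent test set."""
    chosen, remaining = [], list(range(d))
    sel_curve, test_curve = [], []
    while remaining:
        best = max(
            remaining,
            key=lambda f: centroid_accuracy(train, selection, chosen + [f]),
        )
        chosen.append(best)
        remaining.remove(best)
        sel_curve.append(centroid_accuracy(train, selection, chosen))
        test_curve.append(centroid_accuracy(train, test, chosen))
    return sel_curve, test_curve


rng = random.Random(0)  # fixed seed so the run is repeatable
d = 20                  # number of candidate features (assumption)
train = make_noise_data(30, d, rng)
selection = make_noise_data(30, d, rng)
test = make_noise_data(1000, d, rng)

sel_curve, test_curve = forward_selection(train, selection, test, d)
peak = max(range(d), key=lambda k: sel_curve[k])

print("apparent best subset size:", peak + 1)
print("selection-set accuracy at peak:", sel_curve[peak])
print("selection-set accuracy, all features:", sel_curve[-1])
print("independent test accuracy at peak:", test_curve[peak])
```

In a run of this sketch, the selection-set curve would be expected to suggest an "optimal" subset size with accuracy well above 0.5, while the accuracy on the independent test set stays near chance. This is the kind of gap the abstract warns about, shown here in an extreme case where no real peak can exist.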


Paper Citation


in Harvard Style

Reunanen J. (2004). A Pitfall in Determining the Optimal Feature Subset Size. In Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004) ISBN 972-8865-01-5, pages 176-185. DOI: 10.5220/0002650001760185


in Bibtex Style

@conference{pris04,
author={Juha Reunanen},
title={A Pitfall in Determining the Optimal Feature Subset Size},
booktitle={Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004)},
year={2004},
pages={176-185},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002650001760185},
isbn={972-8865-01-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004)
TI - A Pitfall in Determining the Optimal Feature Subset Size
SN - 972-8865-01-5
AU - Reunanen J.
PY - 2004
SP - 176
EP - 185
DO - 10.5220/0002650001760185