The Critical Feature Dimension and Critical Sampling Problems

Bernardete M. Ribeiro, Andrew Sung, Divya Suryakumar, Ram Basnet

Abstract

Effective data mining methods are critical for knowledge discovery across many applications in the era of big data. Two issues of immediate concern in big data analytics are how to select a critical subset of features and how to select a critical subset of data points for sampling. This position paper presents ongoing research by the authors that suggests: 1. the critical feature dimension problem is theoretically intractable, but simple heuristic methods may well be sufficient for practical purposes; 2. there are big data analytic problems where the success of data mining depends more on the critical feature dimension than on the specific features selected, so a random selection of features based on the dataset’s critical feature dimension will prove sufficient; and 3. the problem of critical sampling has the same intractable complexity as the critical feature dimension problem, but again simple heuristic methods may well be practicable in most applications.


Paper Citation


in Harvard Style

Ribeiro B., Sung A., Suryakumar D. and Basnet R. (2015). The Critical Feature Dimension and Critical Sampling Problems. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 360-366. DOI: 10.5220/0005282403600366


in Bibtex Style

@conference{icpram15,
author={Bernardete M. Ribeiro and Andrew Sung and Divya Suryakumar and Ram Basnet},
title={The Critical Feature Dimension and Critical Sampling Problems},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={360-366},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005282403600366},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - The Critical Feature Dimension and Critical Sampling Problems
SN - 978-989-758-076-5
AU - Ribeiro B.
AU - Sung A.
AU - Suryakumar D.
AU - Basnet R.
PY - 2015
SP - 360
EP - 366
DO - 10.5220/0005282403600366