K-NN: ESTIMATING AN ADEQUATE VALUE FOR PARAMETER K

Bruno Borsato, Alexandre Plastino, Luiz Merschmann

2008

Abstract

The k-NN (k Nearest Neighbours) classification technique is characterized by its simplicity and efficient performance on many databases. However, the good performance of this method relies on the choice of an appropriate value for the input parameter k. In this work, we propose methods to estimate an adequate value for parameter k for any given database. Experimental results have shown that, in terms of predictive accuracy, k-NN using the estimated value for k usually outperforms k-NN with the values commonly used for k, as well as well-known methods such as decision trees and naive Bayes classification.
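The abstract summarizes the idea without spelling out the estimation procedure. As a purely illustrative sketch, and not the method proposed by the authors, the snippet below picks k by cross-validated accuracy over a range of candidate values with scikit-learn; the dataset, candidate range, and the estimate_k helper are assumptions made only for this example.

# Illustrative sketch only: a generic way to choose k for k-NN via
# cross-validated accuracy. This is NOT the paper's estimation method;
# dataset, candidate range, and helper name are assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def estimate_k(X, y, candidates=range(1, 31), folds=10):
    """Return the candidate k with the highest mean cross-validated accuracy."""
    best_k, best_acc = None, -np.inf
    for k in candidates:
        clf = KNeighborsClassifier(n_neighbors=k)
        acc = cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    k, acc = estimate_k(X, y)
    print(f"estimated k = {k} (mean CV accuracy = {acc:.3f})")

In practice, this exhaustive search is quadratic-to-cubic in dataset size, which is why dedicated estimation strategies, such as the one reported in the paper, are of interest.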



Paper Citation


in Harvard Style

Borsato B., Plastino A. and Merschmann L. (2008). K-NN: ESTIMATING AN ADEQUATE VALUE FOR PARAMETER K. In Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8111-37-1, pages 459-466. DOI: 10.5220/0001686104590466


in Bibtex Style

@conference{iceis08,
author={Bruno Borsato and Alexandre Plastino and Luiz Merschmann},
title={K-NN: ESTIMATING AN ADEQUATE VALUE FOR PARAMETER K},
booktitle={Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2008},
pages={459-466},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001686104590466},
isbn={978-989-8111-37-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - K-NN: ESTIMATING AN ADEQUATE VALUE FOR PARAMETER K
SN - 978-989-8111-37-1
AU - Borsato B.
AU - Plastino A.
AU - Merschmann L.
PY - 2008
SP - 459
EP - 466
DO - 10.5220/0001686104590466