
Weighted k Nearest Neighbor (IPW-k-NN). Three
distance functions have been used in our experiments:
Euclidean, Canberra and Chebyshev.
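For reference, the three distance functions can be written down as in the following minimal sketch. It uses the standard textbook definitions; the function names and the convention of skipping 0/0 terms in the Canberra distance are assumptions made for illustration, not details taken from the experiments.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of |a - b| / (|a| + |b|) over all coordinates.

    Assumption: a coordinate where both values are zero contributes 0,
    which is one common convention for the undefined 0/0 term.
    """
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    p = [1.0, 2.0, 3.0]
    q = [4.0, 6.0, 3.0]
    for dist in (euclidean, canberra, chebyshev):
        print(dist.__name__, dist(p, q))
```

Any of these functions could be passed to a k-NN routine as its distance argument, so the three variants differ only in how neighbours are ranked.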
We have shown that improvements over classic k-NN
are achieved for some values of K_p, K_c, α and β, and
that a characterization of such values is possible
using C4.5 and Euclidean distance.
Further research involves a more straightforward
characterization of the values of parameters for which
improvement holds.
An extension of the presented approach is to select
the feature subset that yields the best classification
performance. A Feature Subset
Selection (Inza et al., 2000; Sierra et al., 2001)
technique could be applied in order to select which
of the predictor variables should be used. This could
benefit both the execution time of the classifier and
its accuracy. A combination with other
paradigms to improve the accuracy of each of them
(Dietterich, 1997; Lazkano and Sierra, 2003) will
also be tested experimentally.
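To make the idea of Feature Subset Selection concrete, the sketch below uses greedy forward selection scored by leave-one-out k-NN accuracy. This is a deliberately simple stand-in, not the Bayesian network-based or Estimation of Distribution Algorithm searches of the cited works; the names loo_accuracy and forward_selection, and the toy data, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                 features: Sequence[int], k: int = 1) -> float:
    """Leave-one-out accuracy of k-NN using only the given feature indices."""
    correct = 0
    for i in range(len(X)):
        dists = []
        for j in range(len(X)):
            if j == i:
                continue  # the held-out case never votes for itself
            d = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        votes = {}
        for _, label in dists[:k]:
            votes[label] = votes.get(label, 0) + 1
        if max(votes, key=votes.get) == y[i]:
            correct += 1
    return correct / len(X)


def forward_selection(X: Sequence[Sequence[float]], y: Sequence[int],
                      k: int = 1) -> Tuple[List[int], float]:
    """Add one feature at a time while the leave-one-out accuracy strictly improves."""
    selected: List[int] = []
    best_score = 0.0
    while True:
        best_candidate = None
        for f in range(len(X[0])):
            if f in selected:
                continue
            score = loo_accuracy(X, y, selected + [f], k)
            if score > best_score:
                best_score, best_candidate = score, f
        if best_candidate is None:
            return selected, best_score
        selected.append(best_candidate)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
         [1.0, 5.1], [1.1, 1.2], [1.2, 9.1]]
    y = [0, 0, 0, 1, 1, 1]
    print(forward_selection(X, y))
```

Because fewer predictor variables enter each distance computation, a smaller selected subset reduces classification cost as well as, potentially, the error rate.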
Experiments with different values of α, corresponding
to different levels of neighbourhood β,
are another possible line of research.
ACKNOWLEDGMENTS
This work is supported by the University of the
Basque Country.
REFERENCES
Blake, B. and Mertz, C. (1998). UCI repository of machine
learning databases.
Clark, P. and Niblett, T. (1989). The CN2 induction
algorithm. Machine Learning, 3(4):261–283.
Cover, T. and Hart, P. (1967). Nearest neighbour pattern
classification. IEEE Transactions on Information
Theory, 13(1):21–27.
Cowell, R. G., Dawid, A. P., Lauritzen, S., and
Spiegelhalter, D. J. (1999). Probabilistic Networks and Expert
Systems. Springer.
Dasarathy, B. (1991). Nearest Neighbor (NN) Norms: NN
Pattern Recognition Classification Techniques. IEEE
Computer Society Press.
Dietterich, T. G. (1997). Machine learning research: four
current directions. AI Magazine, 18(4):97–136.
Inza, I., Larrañaga, P., Etxeberria, R., and Sierra, B. (2000).
Feature Subset Selection by Bayesian network-based
optimization. Artificial Intelligence, 123(1-2):157–184.
Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes
classifiers: a decision-tree hybrid. In Proceedings of
the Second International Conference on Knowledge
Discovery and Data Mining, pages 147–149.
Lazkano, E. and Sierra, B. (2003). Bayes-nearest: a new
hybrid classifier combining Bayesian network and
distance based algorithms. In Proceedings of the EPIA
2003 Conference. Lecture Notes in Computer
Science. Springer-Verlag.
Michalski, R., Stepp, R., and Diday, E. (1981). A
Recent Advance in Data Analysis: Clustering Objects
into Classes Characterized by Conjunctive Concepts,
pages 33–56. North-Holland.
Mitchell, T. (1997). Machine Learning. McGraw-Hill.
Quinlan, J. (1986). Induction of decision trees. Machine
Learning, 1(1):81–106.
Quinlan, J. (1993). C4.5: Programs for Machine Learning.
Morgan Kaufmann Publishers, Inc.
Sierra, B. and Larrañaga, P. (1998). Predicting survival
in malignant skin melanoma using Bayesian networks
automatically induced by genetic algorithms. An
empirical comparison between different approaches.
Artificial Intelligence in Medicine, 14:215–230.
Sierra, B. and Lazkano, E. (2002). Probabilistic-weighted k
nearest neighbor algorithm: a new approach for gene
expression based classification. In KES’2002, Sixth
International Conference on Knowledge-Based
Intelligent Information Engineering Systems, pages 932–939.
IOS Press.
Sierra, B., Lazkano, E., Inza, I., Merino, M., Larrañaga,
P., and Quiroga, J. (2001). Prototype Selection and
Feature Subset Selection by Estimation of Distribution
Algorithms. A case Study in the survival of
cirrhotic patients treated with TIPS. In Proceedings of
the Eighth Artificial Intelligence in Medicine in
Europe. Lecture Notes in Artificial Intelligence, pages
20–29. Springer-Verlag.
Stone, M. (1974). Cross-validation choice and assessment of
statistical predictions. Journal of the Royal Statistical
Society, 36:111–147.
Wilcoxon, F. (1945). Individual comparisons by ranking
methods. Biometrics, 1:80–83.