Improving the Accuracy of Features Weighted k-Nearest Neighbor using Distance Weight
K. U. Syaliman¹, Ause Labellapansa², Ana Yulianti²
¹Department of Informatics, Politeknik Caltex Riau, Pekanbaru, Indonesia
²Department of Informatics, Universitas Islam Riau, Pekanbaru, Indonesia
Keywords:
Accuracy, Distance Weight, FWk-NN, K-NN, Vote Majority
Abstract:
FWk-NN is an improvement of k-NN in which each data feature is given a weight, reducing the influence of features that are less relevant to the target. Feature weighting has been proven to improve the accuracy of k-NN. However, FWk-NN still uses the majority vote system to determine the class of new data, and the majority vote system is considered to have several weaknesses: it ignores the similarity between data and allows a double majority class (a tie) to occur. To overcome the majority vote issue in FWk-NN, this research replaces majority voting with distance weight. This study uses datasets obtained from the UCI repository and a water quality dataset. The data used from the UCI repository are Iris, Ionosphere, Hayes-Roth, and Glass. Based on the tests carried out using the UCI repository datasets, FWk-NN using distance weight achieves an average accuracy increase of about 2%, with the highest increase of 4.23% on the Glass dataset. On the water quality data, FWk-NN using distance weight achieves an accuracy of 92.58%, an increase of 2% over FWk-NN. Across all of the data tested, distance weight is shown to increase the accuracy of FWk-NN by an average of about 1.9%.
1 INTRODUCTION
k-Nearest Neighbor, commonly known as kNN, is one of the popular classification methods for dealing with problems in the field of data mining, including text categorization, pattern recognition, classification, etc. (Bhatia and Vandana, 2010; Jabbar et al., 2013; Rui-Jia and Xing, 2014; Sánchez et al., 2016; Zheng et al., 2017). This is because kNN has several advantages: the method is simple, quite appealing, easy to implement, intuitive, can be exploited in various domains, and is quite efficient (Wang et al., 2007; García-Pedrajas and Ortiz-Boyer, 2009; Ougiaroglou and Evangelidis, 2012; Feng et al., 2016; Pan et al., 2017; Sánchez et al., 2016; Song et al., 2017).
Nevertheless, kNN still has weaknesses that keep its accuracy relatively low, especially when compared with other classification algorithms (Danades et al., 2016; Tamatjita and Mahasta, 2016). The low accuracy of kNN is caused by several factors. One of them is that every feature has the same effect on determining the similarity between data. The solution is to give a weight to each data feature, an approach commonly called Feature Weighted k-NN (FWk-NN) (Kuhkan, 2016; Duneja and Puyalnithi, 2017; Nababan et al., 2018).
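As an illustration only (not the authors' implementation), the sketch below shows how per-feature weights can be folded into the distance computation of a plain kNN classifier. The feature weights here are arbitrary placeholders; the referenced works derive them from measures such as the Gain Ratio.

```python
import numpy as np

def weighted_distance(x, y, w):
    """Feature-weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return np.sqrt(np.sum(w * (x - y) ** 2))

def fwknn_neighbors(X_train, y_train, x_new, w, k=3):
    """Return labels and distances of the k nearest training samples
    under the feature-weighted distance."""
    dists = np.array([weighted_distance(x, x_new, w) for x in X_train])
    idx = np.argsort(dists)[:k]
    return y_train[idx], dists[idx]

# Toy example: two features, the second treated as more relevant.
X_train = np.array([[1.0, 0.2], [0.9, 0.8], [0.2, 0.9]])
y_train = np.array(["A", "B", "B"])
w = np.array([0.3, 0.7])  # placeholder weights, e.g. obtained from Gain Ratio
labels, dists = fwknn_neighbors(X_train, y_train, np.array([0.8, 0.7]), w, k=2)
print(labels, dists)
```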
FWk-NN has been proven to improve the accuracy of the kNN method, as can be seen in the research conducted by Duneja and Puyalnithi (2017) and Nababan et al. (2018), which weights each data feature using the Gain Ratio. However, when determining the class of new data, FWk-NN still adopts the majority vote system, which ignores the similarity between data; another problem is the possible emergence of a double majority class, i.e. a tie between classes (Gou and Xiong, 2011; Yan et al., 2015; Syaliman et al., 2017).
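For illustration, the hypothetical toy case below (not taken from the paper) exhibits both weaknesses at once: with k = 4 the majority vote ends in a double majority (a 2-2 tie), and the two distant neighbors count exactly as much as the two very close ones.

```python
from collections import Counter

# Hypothetical neighbor set for one query point: (distance, class label),
# already sorted by distance. The two closest neighbors belong to class "A".
neighbors = [(0.1, "A"), (0.2, "A"), (3.5, "B"), (4.0, "B")]

votes = Counter(label for _, label in neighbors)
print(votes.most_common())       # [('A', 2), ('B', 2)] -> double majority

top = votes.most_common()
tie = len(top) > 1 and top[0][1] == top[1][1]
print("tie:", tie)               # True: majority voting cannot decide,
                                 # even though the nearest neighbors suggest "A"
```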
A solution to the majority vote problem was proposed by Mitani et al. (2006). That research changed the way the class of new data is determined: the majority vote was replaced by the local mean, so the class of new data is no longer based on the majority class but is determined by the similarity of the new data to the local mean vector of each class. The results of this research proved that the local mean was able to reduce misclassification caused by the majority vote system.
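A minimal sketch of that idea (our reading of the local-mean rule, not Mitani et al.'s code): for each class, average its k training samples nearest to the query, then assign the class whose local mean vector lies closest to the query.

```python
import numpy as np

def local_mean_classify(X_train, y_train, x_new, k=3):
    """Assign x_new to the class whose local mean (mean of its k nearest
    training samples to x_new) is closest to x_new."""
    best_class, best_dist = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x_new, axis=1)      # distances within class c
        kc = min(k, len(Xc))
        local_mean = Xc[np.argsort(d)[:kc]].mean(axis=0)
        dist = np.linalg.norm(local_mean - x_new)   # distance to the local mean
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class

# Toy usage with made-up data
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9], [0.8, 1.0]])
y = np.array(["A", "A", "B", "B", "B"])
print(local_mean_classify(X, y, np.array([0.15, 0.15]), k=2))  # -> "A"
```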
Another solution to overcome the weaknesses of the majority vote system is to use the method proposed by Batista and Silva (2009). In this research it is recommended to use a distance weight, while to