4.2 Case Study 2
In this study, we use the dataset from Lichman's repository (Lichman, 2013), which contains personal information: age, work class, final weight, education, marital status, occupation, relationship, race, gender, capital gain, capital loss, working hours per week, native country, and whether the salary is over 50K. The training dataset contains 32,561 data elements, which are used to classify the testing dataset of 16,281 data elements. The B-kNN algorithm is used to predict whether the person in a testing datum makes over 50K in salary. There are no missing data in either the training or the testing dataset.
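The classification setup can be sketched as follows. This is a minimal illustration only: it uses synthetic stand-in data (random numeric features and a hypothetical salary label) rather than the actual repository download, with k = 1 as in the case studies:

```python
import math
import random

# Hypothetical stand-in for the Adult data: each record is a small numeric
# feature vector (think age, hours per week, ...) with label 1 when the
# salary is over 50K and 0 otherwise.  The real study draws 32,561 training
# and 16,281 testing elements from Lichman's repository.
random.seed(0)

def make_split(n):
    xs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    ys = [1 if x[0] + x[1] > 0 else 0 for x in xs]
    return xs, ys

X_train, y_train = make_split(200)
X_test, y_test = make_split(50)

def knn_predict(x, k=1):
    """Plain kNN with Euclidean distance; the case studies use k = 1."""
    dists = sorted(
        (math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)  # majority vote

accuracy = sum(knn_predict(x) == y for x, y in zip(X_test, y_test)) / len(y_test)
```

The brute-force scan over all training points in `knn_predict` is exactly the cost that B-kNN's preprocessing is designed to avoid.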
Table 5 shows the confusion matrix of applying the traditional kNN algorithm to the original training dataset. The table shows that 10,625 instances are correctly predicted as having a salary over 50K, while 594 instances are falsely predicted as such. Conversely, 1,165 instances are falsely predicted as having a salary under 50K, while 3,897 instances are correctly predicted.
Table 5: Confusion Matrix for kNN.
Salary        > 50K (Pred.)  ≤ 50K (Pred.)
> 50K (Act.)  10,625         1,165
≤ 50K (Act.)  594            3,897
Table 6 shows the accuracy, precision, recall, F1, and response time (in seconds) of the kNN algorithm based on the confusion matrix in Table 5.
Table 6: Results of kNN.
Alg. Acc. Prec. Rec. F1 Time
kNN 0.892 0.947 0.901 0.924 26.797
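The figures in Table 6 follow directly from the confusion matrix in Table 5, taking a salary over 50K as the positive class:

```python
# Recomputing Table 6 from the Table 5 confusion matrix:
# TP = 10,625 (over 50K, predicted over), FN = 1,165,
# FP = 594, TN = 3,897.
tp, fn, fp, tn = 10625, 1165, 594, 3897

accuracy = (tp + tn) / (tp + fn + fp + tn)          # 0.892
precision = tp / (tp + fp)                          # 0.947
recall = tp / (tp + fn)                             # 0.901
f1 = 2 * precision * recall / (precision + recall)  # 0.924
```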
Table 7 shows the confusion matrix of applying
the B-kNN algorithm to the MMP and BS of the train-
ing dataset.
Table 7: Confusion Matrix for B-kNN.
Salary        > 50K (Pred.)  ≤ 50K (Pred.)
> 50K (Act.)  10,563         1,083
≤ 50K (Act.)  656            3,979
Table 8 shows the results of the B-kNN algorithm
on the confusion matrix in Table 7. Similar to the
observation made in Table 4, we can observe that the
accuracy, precision, recall, and F1 are very close to
those of the kNN algorithm in Table 6. However, the
response time is significantly improved by 99.3%.
Table 8: Results of B-kNN.
Alg. Acc. Prec. Rec. F1 Time
B-kNN 0.893 0.942 0.907 0.924 0.194
Improv. +0.1% -0.5% +0.6% 0.0% +99.3%
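The B-kNN figures can be verified in the same way from Table 7, together with the relative response-time improvement computed from the times in Tables 6 and 8:

```python
# Recomputing Table 8 from the Table 7 confusion matrix:
# TP = 10,563, FN = 1,083, FP = 656, TN = 3,979.
tp, fn, fp, tn = 10563, 1083, 656, 3979

accuracy = (tp + tn) / (tp + fn + fp + tn)          # 0.893
precision = tp / (tp + fp)                          # 0.942
recall = tp / (tp + fn)                             # 0.907
f1 = 2 * precision * recall / (precision + recall)  # 0.924

# Relative response-time improvement of B-kNN over kNN
# (26.797 s vs. 0.194 s).
speedup = (26.797 - 0.194) / 26.797                 # ~0.993, i.e. 99.3%
```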
4.3 Discussion
As seen in the two case studies, the B-kNN algorithm gives similar performance on accuracy, precision, recall, and F1, while dramatically improving response time by 97% on average. The improvement becomes more significant for larger datasets. We used the value 1 for k in the kNN algorithm in the case studies, and also experimented with higher values of k up to 7. In Case Study 1, the accuracy and response time of kNN using the original training dataset improve slightly (by less than 1%), while those of B-kNN remain more or less the same. In Case Study 2, the accuracy of kNN decreases by about 11% at k=2 and stabilizes thereafter. Similarly, the response time of kNN increases by about 35% on average from k=2 onward. However, the accuracy and response time of B-kNN remain stable, as in Case Study 1. The different results in the two studies are attributed to the training dataset of Case Study 2 being much larger than that of Case Study 1. The experiments show that B-kNN gives stable results for higher k values compared to kNN using the original training dataset, a benefit that becomes significant for larger datasets, as shown in the two case studies.
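The k-sensitivity experiment described above can be sketched as follows. Again this is a hypothetical re-run on synthetic stand-in data, not the case-study datasets; it shows only the shape of the sweep over k = 1 to 7:

```python
import math
import random

# Hypothetical k-sensitivity sweep: evaluate kNN test accuracy for
# k = 1..7, with synthetic data standing in for the case-study datasets.
random.seed(1)

def make_split(n):
    xs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    ys = [1 if x[0] > 0 else 0 for x in xs]
    return xs, ys

X_train, y_train = make_split(300)
X_test, y_test = make_split(100)

def accuracy_at(k):
    """Test accuracy of plain kNN with Euclidean distance and majority vote."""
    correct = 0
    for x, y in zip(X_test, y_test):
        dists = sorted(
            (math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train)
        )
        votes = [label for _, label in dists[:k]]
        correct += int(max(set(votes), key=votes.count) == y)
    return correct / len(y_test)

accuracies = {k: accuracy_at(k) for k in range(1, 8)}
```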
5 CONCLUSION
We have presented the B-kNN algorithm to improve the efficiency of the traditional kNN algorithm while maintaining similar performance on accuracy, precision, recall, and F1. The improvement is attributed to the two-fold preprocessing scheme using the MMP and BS of training datasets. B-kNN also addresses the defect of uneven distributions of training samples, which may cause the multi-peak effect, by updating the BS as new training samples are added. The two case studies presented in this paper validate the B-kNN algorithm by demonstrating its improvement in efficiency over the kNN algorithm. The results show a significant enhancement in efficiency with little sacrifice of accuracy compared to the traditional kNN algorithm. In future work, we plan to apply the B-kNN algorithm to self-adaptive systems in the robotic domain to enhance response time, which is crucial for self-adaptive operations.