according to the attributes and enables parallel processing. The number of attributes in each partition is determined by dividing the total number of attributes by the number of partitions. The algorithm is executed in parallel on the attributes of each partition, and the best local attribute for the split operation is selected. The results of these parallel computations are then combined in order to select the best global attribute to split on, and to grow the tree (Kourtellis et al., 2016).
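To make the scheme concrete, the following is a minimal sketch of this vertical partitioning, assuming a hypothetical score(data, labels, attribute) function (e.g., an information gain measure) as the split criterion; it illustrates the idea rather than the actual VHT implementation.

from concurrent.futures import ProcessPoolExecutor
from functools import partial

def best_in_partition(partition, data, labels, score):
    # Evaluate every attribute in this partition; keep the local best.
    return max((score(data, labels, a), a) for a in partition)

def best_global_attribute(data, labels, n_partitions, score):
    n_attributes = len(data[0])
    size = max(1, n_attributes // n_partitions)  # attributes per partition
    indices = list(range(n_attributes))
    partitions = [indices[i:i + size] for i in range(0, n_attributes, size)]
    worker = partial(best_in_partition, data=data, labels=labels, score=score)
    # Each partition is processed in parallel; local winners are then combined.
    with ProcessPoolExecutor() as pool:
        local_bests = list(pool.map(worker, partitions))
    return max(local_bests)[1]  # global best attribute to split on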
3 RELATED WORK
In the stream mining literature, there is a variety of studies on stream classification, including nearest neighbor methods, decision tree based methods, and ensemble classifiers. One such study is VHT, in which vertical parallelism is applied to the features of streaming data (Kourtellis et al., 2016). Another is MC-NN, a data stream classifier based on a statistical summary of the data (Tennant et al., 2017). Both of these methods, as described in Section 2, are used for comparison. In (Tennant et al., 2014), kNN is applied within sliding windows. We use this method for accuracy comparison as well.
Another windowing approach for stream learning is PAW (Probabilistic Adaptive Window), which provides a mechanism for retaining older examples alongside the most recent ones. It is therefore possible to maintain information on past concept drifts while still adapting quickly to new ones (Bifet et al., 2013).
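As an illustration only, one possible probabilistic retention scheme in this spirit is sketched below; the exact mechanism of PAW differs (see Bifet et al., 2013).

import random

def update_probabilistic_window(window, new_instance, keep_prob=0.99):
    # The newest instance is always admitted; every stored instance survives
    # this step with probability keep_prob, so older examples remain with
    # exponentially decaying probability. Illustrative only; not PAW's
    # exact retention rule (Bifet et al., 2013).
    survivors = [x for x in window if random.random() < keep_prob]
    survivors.append(new_instance)
    return survivors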
Law and Zaniolo propose ANNCAD (Adaptive Nearest Neighbor Classification Algorithm for Data Streams), an incremental classification algorithm that uses a multi-resolution data representation to find adaptive nearest neighbors of a given data instance. The basic difference from the traditional kNN method is that, instead of using a fixed number of neighbors, the neighborhood area is adaptively expanded until the classification reaches a satisfactory level (Law and Zaniolo, 2005).
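The following simplified sketch conveys the adaptive expansion idea by growing k until one class is sufficiently dominant; ANNCAD itself operates on a multi-resolution grid, so the threshold and the expansion rule here are illustrative assumptions.

import math
from collections import Counter

def adaptive_nn_classify(query, instances, labels, confidence=0.7, k_max=50):
    # Rank training instances by Euclidean distance to the query.
    order = sorted(range(len(instances)),
                   key=lambda i: math.dist(query, instances[i]))
    label = None
    # Expand the neighborhood until one class is dominant enough.
    for k in range(1, min(k_max, len(order)) + 1):
        votes = Counter(labels[i] for i in order[:k])
        label, count = votes.most_common(1)[0]
        if count / k >= confidence:
            break
    return label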
ADWIN (ADaptive WINdowing) is a method proposed to maintain a window of variable size. ADWIN2 further improves the method in terms of memory usage and time efficiency. The authors also combine ADWIN2 with the Naive Bayes classifier and analyse their method on synthetic and real world data sets (Bifet and Gavalda, 2007).
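A minimal sketch of the window-shrinking test follows, in the naive quadratic form from which ADWIN2 is derived: whenever some split of the window into an older and a more recent part shows sufficiently distinct means, the older part is dropped. The Hoeffding-style threshold below follows the authors' description; treat it as a sketch rather than the optimized ADWIN2 implementation.

import math

def adwin_shrink(window, delta=0.01):
    # window: list of real values (e.g., a classifier's 0/1 error indicators).
    changed = True
    while changed and len(window) > 1:
        changed = False
        n = len(window)
        for split in range(1, n):
            w0, w1 = window[:split], window[split:]
            m = 1.0 / (1.0 / len(w0) + 1.0 / len(w1))  # harmonic mean of sizes
            eps = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
            if abs(sum(w0) / len(w0) - sum(w1) / len(w1)) > eps:
                window = window[split:]  # discard the stale prefix
                changed = True
                break
    return window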
Brzezinski and Stefanowski propose a data stream classifier called AUE2 (Accuracy Updated Ensemble). The aim of this classifier is to react equally well to different types of drift. AUE2 combines accuracy-based weighting mechanisms with Hoeffding Trees (Brzezinski and Stefanowski, 2013).
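As a generic illustration of accuracy-based weighting (AUE2's actual weighting function differs; see Brzezinski and Stefanowski, 2013), each ensemble member's vote can be scaled by its recent accuracy.

def weighted_vote(ensemble, instance):
    # ensemble: objects with predict() and a recent_accuracy attribute
    # (hypothetical bookkeeping maintained on recent labeled instances).
    scores = {}
    for member in ensemble:
        label = member.predict(instance)
        scores[label] = scores.get(label, 0.0) + member.recent_accuracy
    return max(scores, key=scores.get)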
Another ensemble classification algorithm is proposed in (Chen et al., 2018) to deal with noise and concept drift in streams. The algorithm is based on attribute reduction and makes use of a sliding window, aiming at high performance on noisy data streams with low computational complexity.
Fong et al. propose an improved version of VFDT (Very Fast Decision Tree) that makes use of misclassified results for post-learning. Their approach, called MR (Misclassified Recall), is a post-processing step for relearning a new concept. They apply their method to a HAR (Human Activity Recognition) dataset in which most misclassified instances belong to ambiguous movements (Fong et al., 2017).
4 PROPOSED METHOD:
ENHANCEMENTS FOR
SLIDING WINDOW BASED
DATA STREAM CLASSIFIERS
In this work, we propose two enhancements for the use of kNN in stream classification under a sliding window. The first is called m-kNN (Mean Extended kNN), which follows traditional kNN with the addition that one of the neighbors is chosen from outside the current window to reflect past behavior. The second is called CSWB (Combined Sliding Window Based), a combination of m-kNN, kNN, and Naive Bayes.
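Since this overview does not fix the combination rule, the sketch below assumes a simple majority vote over the three base classifiers, with ties broken in favor of m-kNN; the actual CSWB combination may differ.

def cswb_predict(instance, mknn, knn, nb):
    # Majority vote over the three base predictions (assumed rule).
    votes = [mknn.predict(instance), knn.predict(instance), nb.predict(instance)]
    if len(set(votes)) == 3:
        return votes[0]  # three-way tie: fall back to m-kNN
    return max(set(votes), key=votes.count)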
4.1 m-kNN Classifier
In m-kNN, we apply kNN within sliding windows, with the difference from traditional kNN that k-1 instances are selected within the window, whereas the last instance is an average over the history. At the beginning of the method, we fill the current window with the most recent past instances. After that, within the current window, the k-1 nearest neighbors of the incoming instance are found using Euclidean distance. Additionally, we calculate the centroids of the classes using the past instances, thereby obtaining class representatives from the history. Among these class representatives, we determine the most similar one, and this instance is used as the k-th nearest neighbor. As in conventional kNN, the class label is determined by majority voting among these k instances.
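A minimal sketch of this procedure follows, assuming numeric feature vectors; class centroids are maintained as running sums over the labeled history, and the class name MkNN and its methods are illustrative.

import math
from collections import Counter, deque

class MkNN:
    def __init__(self, k, window_size):
        self.k = k
        self.window = deque(maxlen=window_size)  # recent (x, y) pairs
        self.sums = {}    # per-class running feature sums over the history
        self.counts = {}  # per-class instance counts

    def _centroid(self, label):
        n = self.counts[label]
        return [v / n for v in self.sums[label]]

    def predict(self, x):
        # k-1 nearest neighbors inside the current window (Euclidean distance).
        nearest = sorted(self.window, key=lambda p: math.dist(x, p[0]))[:self.k - 1]
        votes = [y for _, y in nearest]
        # The k-th neighbor is the most similar class centroid from history.
        if self.counts:
            votes.append(min(self.counts,
                             key=lambda c: math.dist(x, self._centroid(c))))
        if not votes:
            return None  # nothing seen yet
        return Counter(votes).most_common(1)[0][0]

    def learn(self, x, y):
        # Called once the true label is known: update window and history.
        self.window.append((x, y))
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[y] = self.counts.get(y, 0) + 1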
Assuming that we learn the actual class of the instance at the next time step, the representative of the corresponding class is updated with this instance.