ST2 with σ = 0.186. Both used Δμ = 0.005.
The models were evaluated every 3,000 examples.
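The evaluation scheme just described can be sketched as a test-then-train loop over the stream, recording accuracy at every 3,000-example checkpoint. This is a minimal illustration under the assumption that a standard prequential (test-then-train) protocol was used; the function and model names are hypothetical, not the authors' actual harness.

```python
# Sketch of periodic prequential evaluation (an assumption about the
# setup, not the authors' code): each example is used first for
# testing, then for training, and accuracy is logged every 3,000
# examples.

def evaluate_stream(model, stream, window=3000):
    """Test-then-train loop; returns a list of accuracy checkpoints."""
    correct = 0
    seen = 0
    checkpoints = []
    for x, y in stream:
        if model.predict(x) == y:   # test on the example first...
            correct += 1
        model.learn(x, y)           # ...then train on the same example
        seen += 1
        if seen % window == 0:
            checkpoints.append(correct / seen)
    return checkpoints


class MajorityClass:
    """Toy stand-in learner: predicts the majority class seen so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```

Any incremental learner exposing `predict` and `learn` (e.g. VFDT or ST) could be dropped into the same loop in place of the toy model.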
4.3 Experiment Results
The obtained results are summarized in Table 1, which reports the mean accuracy and the execution time of each experiment. Figure 4 shows the percentage of correct classifications and the tree size as a function of the number of processed examples.
Table 1: Experiment results.

Datasets       Mean Accuracy (%)        Execution Time (s)
               VFDT    ST1     ST2      VFDT    ST1     ST2
Hyperplane     91.29   91.35   91.04    56.01   39.86   40.41
RBF            75.45   76.24   77.88    44.94   39.83   46.66
Skin Seg.      98.80   99.68   99.24    0.57    0.57    0.55
Electricity    77.44   78.91   78.53    0.25    0.40    0.39
As Table 1 shows, on the Hyperplane dataset ST1 obtained the best mean accuracy and execution time, 91.35% in 39.86 seconds, followed by VFDT with 91.29% in 56.01 seconds and ST2 with 91.04% in 40.41 seconds. Figure 4 (a) shows that the algorithms had similar accuracy variations. ST2 finished the classification process with the best final accuracy, 92.6%, and the smallest tree, with 503 nodes (Figure 4 (b)). VFDT reached 92.1% accuracy with 6,637 nodes, and ST1 achieved 91.6% with 861 nodes. ST handled noise better than VFDT, achieving the highest mean and final accuracies, building the smallest tree, and obtaining the best execution time.
According to Table 1, in the experiment using the Random RBF dataset, both configurations of ST obtained a better mean accuracy than VFDT. ST2 achieved the best mean accuracy, 77.88% in 46.66 seconds. The second-highest accuracy was achieved by ST1, with 76.24% in 39.83 seconds (the best execution time), while VFDT obtained 75.45% mean accuracy in 44.94 seconds (the second-best execution time). Figure 4 (c) shows that ST2 had the best accuracy almost all the time, finishing the classification with 77.8%, but with the biggest tree, 9,735 nodes, as can be seen in Figure 4 (d). The final accuracies of ST1 and VFDT were, respectively, 77.7% with 2,391 nodes and 76.9% with 1,593 nodes. In general, on the Random RBF dataset ST achieved the best accuracy with a competitive execution time compared with VFDT, although ST2, which obtained the best accuracy variation, created the biggest tree.
On the Skin Segmentation dataset, both configurations of ST again obtained a better mean accuracy than VFDT, as Table 1 shows. ST1 achieved the best mean accuracy, 99.68% in 0.57 seconds, followed by ST2 with 99.24% in 0.55 seconds (the best execution time) and VFDT with 98.80% in 0.57 seconds.
Figures 4 (e) and (f) show that ST1 had the best accuracy throughout the classification process but generated the largest tree, finishing with 99.9% accuracy and 189 nodes. Although VFDT and ST2 achieved lower accuracies, they generated smaller trees, with, respectively, 99.2% accuracy and 95 nodes, and 98.9% accuracy and 127 nodes.
As Table 1 shows, on the Electricity dataset ST1 and ST2 achieved the best mean accuracies, 78.91% in 0.40 seconds and 78.53% in 0.39 seconds, respectively. VFDT obtained 77.44% mean accuracy in 0.25 seconds, the shortest execution time.
According to Figures 4 (g) and (h), all algorithms showed a similar accuracy variation, while VFDT produced the smallest tree. VFDT, ST1 and ST2 achieved final accuracies of 77.8% with 47 nodes, 81.7% with 313 nodes, and 80.2% with 309 nodes, respectively.
As Figure 4 shows, during the first examples processed ST described the data earlier than VFDT, which needed more examples to improve its accuracy.
Although ST constructed bigger trees in three experiments, its execution time was close to (when not lower than) that of VFDT.
According to the obtained results and the configurations used, it is possible to conclude that by combining different values of the ST parameters Δμ and σ, the algorithm can achieve better accuracy, although sometimes at the cost of a bigger tree. Thus, the user can adjust the parameter values to achieve higher accuracy according to the amount of data and memory available.
A Statistical Decision Tree Algorithm for Data Stream Classification