the two classes in the pair (Mitzev and Younan,
2015). The training starts with (N-3) random
sequences of varying lengths, where N is the length
of the shortest time series in the dataset. Each of
these sequences is considered a candidate shapelet.
On every iteration of the PSO algorithm, the values
of the candidate sequences are adjusted to improve
the information gain, which measures the separation
between the two classes. The initial proposal in
(Mitzev and Younan, 2015) used N-3 random
sequences, but our tests showed that decreasing the
number of competing sequences does not
significantly influence the accuracy. The number of
competing candidate shapelets was therefore reduced
to 20, which decreases the overall training time.
The pseudo code in Algorithm 1 gives a detailed
picture of the process. The changes to each
candidate’s values are governed by the cognitive
constants C1 and C2 and the inertia weight constant
W, while the randomness of the process is
maintained by the random values R1 and R2 (lines
11-15). The function CheckCandidate (line 21)
evaluates the fitness of the current candidate
shapelet and keeps track of the candidate’s best
information gain. The iterations stop when the best
gain of the current iteration is not significantly
better than the previously found best information
gain (line 29). The class label pairs, together
with their corresponding shapelets, form the nodes
of the decision tree for a given combination.
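A minimal sketch of the PSO update step described
above is given below (Python; the parameter values
and array layout are assumptions made for
illustration, and, unlike the method itself, the
sketch assumes equal-length candidates):

    import numpy as np

    def pso_step(positions, velocities, personal_best, global_best,
                 w=0.7, c1=1.5, c2=1.5):
        # positions holds the candidate shapelets, one row per
        # particle (equal lengths assumed here for simplicity).
        # R1, R2 keep the search stochastic (lines 11-15).
        r1 = np.random.rand(*positions.shape)
        r2 = np.random.rand(*positions.shape)
        velocities = (w * velocities
                      + c1 * r1 * (personal_best - positions)  # cognitive
                      + c2 * r2 * (global_best - positions))   # social
        return positions + velocities, velocities

After each step the updated candidates are re-scored
by information gain, and the personal and global
bests are updated accordingly.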
The final step of the training process is building
a decision pattern for every time series in the
train dataset. Each time series from the train
dataset is classified by all of the trained
decision trees. During this classification, each
decision tree produces a decision path: the
character “R” is appended to the path if the
process takes the right branch of the tree, and the
character “L” if it takes the left branch (Fig. 1).
The decision paths from all trees are concatenated
to produce the decision pattern (Fig. 2). It turns
out that time series from the same class have
similar decision patterns, which differ
significantly from the decision patterns of the
other classes. The decision patterns of all time
series in the train dataset are stored and used to
classify the incoming time series from the test
dataset.
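The sketch below illustrates how a decision pattern
is assembled (Python; the node attributes shapelet,
split_dist, left, and right are assumptions about
the tree layout, with leaves assumed to carry
shapelet = None):

    def subsequence_dist(series, shapelet):
        # Minimum Euclidean distance between the shapelet and
        # any equal-length subsequence of the series.
        m = len(shapelet)
        return min(
            sum((series[i + j] - shapelet[j]) ** 2
                for j in range(m)) ** 0.5
            for i in range(len(series) - m + 1)
        )

    def tree_path(node, series):
        # Walk one decision tree, recording 'L'/'R' at every
        # split node (Fig. 1).
        path = ""
        while node is not None and node.shapelet is not None:
            if subsequence_dist(series, node.shapelet) < node.split_dist:
                path += "L"
                node = node.left
            else:
                path += "R"
                node = node.right
        return path

    def decision_pattern(trees, series):
        # Concatenate the paths from all trees (Fig. 2).
        return "".join(tree_path(t, series) for t in trees)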
3.1.2 Classification
The incoming time series from the test dataset that
is about to be classified also produces a decision
pattern. This pattern is compared with the stored
decision patterns from the training process. The
two decision pattern strings are compared character
by character, by value and position (Fig. 3). The
comparison is quantified by a comparison
coefficient, equal to the number of characters that
coincide in both position and value divided by the
total number of characters in the decision pattern.
The incoming time series is assigned to the class
with the most similar decision pattern, i.e. the
one with the highest comparison coefficient.
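A minimal sketch of this matching step follows
(Python; storing the training patterns as a list of
(pattern, class_label) pairs is an assumption about
the data layout):

    def comparison_coefficient(a, b):
        # Fraction of positions where the two pattern strings
        # agree in both place and value (Fig. 3).
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def classify(pattern, stored):
        # stored: list of (train_pattern, class_label) pairs
        # built during training.
        best_pattern, label = max(
            stored,
            key=lambda p: comparison_coefficient(pattern, p[0]))
        return label

For example, the patterns "LRLR" and "LRRR" coincide
in three of four positions, giving a comparison
coefficient of 0.75.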
3.2 CDP Method Extension for
Datasets with Fewer Class Labels
The original algorithm, as specified in (Mitzev and
Younan, 2016), limits the number of class
combinations in the subset. With only two classes
there is only one such combination; in the case of
“Gun_point” this combination is {1, 2}. Testing
that decision tree with test time series from the
“Gun_point” dataset produces 67.33% accuracy. Our
research confirmed that on every run the PSO
algorithm produces a different shapelet and optimal
split distance for the pair {1, 2}, because the
initial candidates are randomly generated and
therefore differ on every trial. Thus, even
decision trees with the same indexes have different
decision conditions. Each set of decision
conditions offers a different viewpoint, which
contributes a new decision path to the decision
pattern. Table 1 illustrates the concept of using
same-index decision trees with different decision
conditions for the “Gun_point” dataset. It shows
three scenarios: with one, two, and three decision
trees. As shown, every presented decision tree node
has a different shapelet and split distance.
Increasing the pattern length from 1 to 3 in this
particular case increases the overall accuracy by
almost 10%. Experiments with other datasets
confirmed that accuracy increases when the CDP
method re-trains and combines paths from same-index
decision trees, as sketched below. Longer patterns
lead to higher accuracy, but the accuracy plateaus
beyond a certain pattern length. The reuse of
same-index trees may also be applied to datasets
with more than five class labels, but the goal of
this work is to overcome the initial limits of the
CDP method and show that it is applicable to every
dataset.
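A minimal sketch of this extension, under the
assumption that train_tree stands in for the
PSO-based training routine of Section 3.1 and that
tree_path is the helper from the earlier sketch:

    def extended_pattern(series, class_pair, k, train_tree):
        # Re-train the same class pair k times; each run starts
        # from new random candidates, so every resulting tree
        # has different decision conditions (Table 1).
        trees = [train_tree(class_pair) for _ in range(k)]
        return "".join(tree_path(t, series) for t in trees)

With k = 3 this corresponds to the three-tree
scenario of Table 1 for the “Gun_point” pair {1, 2}.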