was replaced with:
$S'_i \leftarrow$ set of subsequences in $P'_i$ of length $\omega$
In the third case, both lines 3 and 4 in Algorithm 1
were replaced with:
$S'_i \leftarrow$ set of subsequences in $P_i$ of length $\omega$
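The replaced step, collecting the subsequences of length ω from a time series, can be sketched as below. This is only an illustrative sketch: the function name `subsequences`, the `step` parameter and the use of overlapping windows with a stride of one are assumptions, not details taken from Algorithm 1. A list is returned rather than a set so that the order of the windows is preserved.

```python
from typing import List, Sequence, Tuple


def subsequences(series: Sequence[float], omega: int, step: int = 1) -> List[Tuple[float, ...]]:
    """Return the contiguous subsequences of length omega found in series.

    Assumption: windows start every `step` points (overlapping when step < omega).
    """
    if omega <= 0 or step <= 0:
        raise ValueError("omega and step must be positive")
    last_start = len(series) - omega  # negative when the series is shorter than omega
    return [tuple(series[j:j + omega]) for j in range(0, last_start + 1, step)]


# Hypothetical point series standing in for P_i (or P'_i after preprocessing).
p_i = [0.0, 0.1, 0.4, 0.2, 0.0]
s_i = subsequences(p_i, omega=3)
```

The same function applies whether the input is the original series or a preprocessed one; only the argument changes between the cases described above.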
The results obtained using the best performing parameters are given in Table 3.
The accuracy values are averages obtained using TCV. Considering silent gap
removal (SGR) first (the first two rows of the table), it can be seen that SGR
has a slight adverse effect on accuracy. It did improve runtime, although this
is not obvious from the table because different ω and max values produced the
best results (recall that low ω and max values result in efficiency gains
because they entail less calculation). Candidate frequent motif generation
(CFMG), on the other hand, had a positive effect on accuracy and resulted in a
significant speed-up (although again it should be noted that the results
reported in Table 3 were obtained using different ω values). When the two are
run together, as in the case of the earlier experiments, accuracy was slightly
reduced, because of the negative effect of SGR, but runtime was enhanced
considerably.
Table 3: The best classification accuracy for finding frequent motifs with different preprocessing techniques.

Preproc. Tech.   Attributes                      Results
SGR    CFMG      Method   ω     max   k (NNC)    Acc.    Runtime (sec.)
No     No        HV       300   20    3          0.721    8.23
Yes    No        HV       100   60    1          0.714   16.44
No     Yes       SD       300   20    3          0.736    3.69
Yes    Yes       SD       100   20    3          0.730    0.46
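The comparisons discussed above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the mapping of table rows to configurations (no preprocessing, SGR only, CFMG only, both) is inferred from the discussion, and the names `rows` and `compare` are hypothetical. The runtime ratios are not like-for-like, because the best results were obtained with different ω and max values.

```python
# Accuracy and runtime (seconds) from Table 3.
# Assumption: the row labels are inferred from the accompanying discussion.
rows = {
    "none": {"acc": 0.721, "runtime": 8.23},
    "SGR": {"acc": 0.714, "runtime": 16.44},
    "CFMG": {"acc": 0.736, "runtime": 3.69},
    "SGR+CFMG": {"acc": 0.730, "runtime": 0.46},
}


def compare(name: str, reference: str = "none") -> tuple:
    """Return (accuracy change, runtime ratio) of `name` relative to `reference`."""
    acc_delta = round(rows[name]["acc"] - rows[reference]["acc"], 3)
    runtime_ratio = round(rows[reference]["runtime"] / rows[name]["runtime"], 2)
    return acc_delta, runtime_ratio


for name in ("SGR", "CFMG", "SGR+CFMG"):
    delta, ratio = compare(name)
    print(f"{name}: accuracy change {delta:+.3f}, runtime ratio {ratio:.2f}x")
```

A ratio above 1 means the configuration ran faster than the reference row; passing `reference="CFMG"` shows the small accuracy drop when SGR is added to CFMG.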
6 CONCLUSIONS
An approach to Frequent Motif Discovery, applicable to PCG time series, has
been proposed. The proposed method addresses the challenge of finding
discriminative motifs in long time series by means of two pruning mechanisms:
(i) silent gap removal and (ii) candidate frequent motif generation. The
motivation for the first was that little useful information could be extracted
from “silent gaps”. The second mechanism featured a novel way of clustering
subsequences, without comparing all subsequences with all other subsequences,
to identify the most frequently occurring subsequences. The performance of the
proposed approaches was assessed in terms of runtime and the quality of the
motifs identified, the latter analysed in terms of a classification scenario.
The results indicated a classification accuracy comparable with that of other
motif-based approaches, together with significant runtime advantages.
Effective Frequent Motif Discovery for Long Time Series Classification: A Study using Phonocardiogram