computed given the models, thus avoiding any form of data pre-processing. We conducted an empirical evaluation of our approach on a dataset representing sub-sequences of the Bolus chase procedure, a technique in peripheral arteriography performed on Philips Healthcare's Image Guided Therapy interventional systems. The results suggest that Growing N-Grams can return cluster compositions with lower classification entropy scores, which indicates the algorithm's ability to retrieve distinct usage behaviours.
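For concreteness, the sketch below shows one standard way to compute the entropy of a cluster composition against ground-truth labels (the weighted average of per-cluster label entropies, where lower scores indicate purer clusters). This is a minimal illustration of the general measure; the exact formulation used in our evaluation may differ, and the function and label names are illustrative assumptions.

import math
from collections import Counter

def cluster_entropy(clusters):
    """Weighted average entropy of a clustering.

    `clusters` is a list of clusters, each given as the list of
    ground-truth labels of its members.  Assumed standard definition:
    lower scores indicate purer clusters.
    """
    total = sum(len(members) for members in clusters)
    score = 0.0
    for members in clusters:
        counts = Counter(members)
        n = len(members)
        # Entropy of this cluster's label distribution.
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Weight each cluster by its relative size.
        score += (n / total) * h
    return score

# Two pure clusters score 0.0; fully mixed clusters score 1.0.
print(cluster_entropy([["A", "A"], ["B", "B"]]))  # 0.0
print(cluster_entropy([["A", "B"], ["A", "B"]]))  # 1.0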
Growing N-Grams also appears more amenable to self-adaptation than the other algorithms considered in our study, which suggests it could be a better fit for our ultimate goal. Finally, we outlined several directions for future work that would ultimately lead to the definition of a fully adaptive algorithm.
ACKNOWLEDGEMENTS
This research was carried out as part of the ITEA3 14035 REFLEXION project under the responsibility of Embedded Systems Innovation by TNO (ESI), with Royal Philips as the carrying industrial partner. The REFLEXION research is supported by the Netherlands Organisation for Applied Scientific Research TNO and the Netherlands Ministry of Economic Affairs.