The Growing N-Gram Algorithm: A Novel Approach to String Clustering

Corrado Grappiolo, Eline Verwielen, Nils Noorman

Abstract

Connected high-tech systems allow operational data to be gathered at unprecedented volumes. A direct benefit is the possibility of extracting usage models, that is, generic representations of how such systems are used in their field of application. Usage models are extremely important, as they help in understanding the discrepancies between how a system was designed to be used and how it is used in practice. We interpret usage modelling as an unsupervised learning task and present a novel algorithm, hereafter called Growing N-Grams (GNG), which relies on n-grams (arguably the most popular modelling technique for natural language processing) to cluster and model, in a two-step procedure, a dataset of strings. We empirically compare its performance against other common techniques for string processing and clustering. The gathered results suggest that the GNG algorithm is a viable approach to usage modelling.
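
For readers unfamiliar with n-grams over strings, the following is a minimal sketch of the underlying building block only; it is not the GNG algorithm, whose details the abstract does not give. It assumes each string is a usage log in which every character stands for one user action, and it estimates bigram transition probabilities by relative frequency. The names `ngrams`, `bigram_model` and the sample `logs` are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def ngrams(sequence, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def bigram_model(strings):
    """Estimate P(next symbol | current symbol) by relative frequency."""
    counts = defaultdict(Counter)
    for s in strings:
        for current, following in ngrams(s, 2):
            counts[current][following] += 1
    # Normalise each row of counts into a probability distribution.
    return {
        current: {sym: c / sum(nxt.values()) for sym, c in nxt.items()}
        for current, nxt in counts.items()
    }


# Hypothetical usage logs: each character is one user action (assumption).
logs = ["abab", "abac"]
model = bigram_model(logs)
print(model["a"])  # distribution over the actions that follow "a"
```

A model of this kind assigns a probability to any new string, which is one plausible way to judge how well a string fits a given group of strings.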

Paper Citation


in Harvard Style

Grappiolo C., Verwielen E. and Noorman N. (2019). The Growing N-Gram Algorithm: A Novel Approach to String Clustering. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-351-3, pages 52-63. DOI: 10.5220/0007259200520063


in Bibtex Style

@conference{icpram19,
author={Corrado Grappiolo and Eline Verwielen and Nils Noorman},
title={The Growing N-Gram Algorithm: A Novel Approach to String Clustering},
booktitle={Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2019},
pages={52-63},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007259200520063},
isbn={978-989-758-351-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - The Growing N-Gram Algorithm: A Novel Approach to String Clustering
SN - 978-989-758-351-3
AU - Grappiolo C.
AU - Verwielen E.
AU - Noorman N.
PY - 2019
SP - 52
EP - 63
DO - 10.5220/0007259200520063