Information Granules Filtering for Inexact Sequential Pattern Mining by Evolutionary Computation
Enrico Maiorino, Francesca Possemato, Valerio Modugno, Antonello Rizzi
2014
Abstract
The widespread development of techniques to communicate and store information of all kinds has raised the need for new methods to analyze and interpret large quantities of data. One of the most important problems in sequential data analysis is frequent pattern mining, which consists in finding frequent subsequences (patterns) in a sequence database in order to highlight and extract interesting knowledge from the data at hand. Real-world data is usually affected by several noise sources, which makes the analysis more challenging and calls for approximate pattern matching methods. A common procedure for identifying recurrent patterns in noisy data is based on clustering algorithms that rely on some edit distance between subsequences. In inexact mining problems, this plain approach can produce many spurious patterns, because several patterns may match the same sequence excerpt. In this paper we present a method to overcome this drawback by applying an optimization-based filter that identifies the most descriptive patterns among those found by the clustering process, returning clusters that are more compact and more easily interpretable. We evaluate the performance of the mining system on synthetic data with variable amounts of noise, showing that the algorithm performs well in synthesizing the retrieved patterns with acceptable information loss.
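To make the multiple-matching drawback concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Levenshtein distance as the edit distance (the abstract does not say which one is used) and a fixed-length sliding window; the function names `levenshtein` and `approximate_matches`, the toy sequence, and the threshold `max_dist` are all hypothetical choices made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_matches(sequence: str, pattern: str, max_dist: int):
    """Return (start, window, distance) for every window within max_dist.

    Assumption: windows have the same length as the pattern.
    """
    n = len(pattern)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        dist = levenshtein(window, pattern)
        if dist <= max_dist:
            hits.append((start, window, dist))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: one exact and one noisy copy of "ABCDE".
    noisy = "XXABCDEXXABCXEXX"
    for hit in approximate_matches(noisy, "ABCDE", max_dist=2):
        print(hit)
```

With a tolerance of 2, the windows immediately before and after the exact occurrence of `ABCDE` are accepted as well, so a single excerpt yields several overlapping matches. This is the kind of redundancy that the filtering stage described in the abstract is meant to reduce.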
Paper Citation
in Harvard Style
Maiorino E., Possemato F., Modugno V. and Rizzi A. (2014). Information Granules Filtering for Inexact Sequential Pattern Mining by Evolutionary Computation. In Proceedings of the International Conference on Evolutionary Computation Theory and Applications - Volume 1: ECTA, (IJCCI 2014) ISBN 978-989-758-052-9, pages 104-111. DOI: 10.5220/0005124901040111
in Bibtex Style
@conference{ecta14,
author={Enrico Maiorino and Francesca Possemato and Valerio Modugno and Antonello Rizzi},
title={Information Granules Filtering for Inexact Sequential Pattern Mining by Evolutionary Computation},
booktitle={Proceedings of the International Conference on Evolutionary Computation Theory and Applications - Volume 1: ECTA, (IJCCI 2014)},
year={2014},
pages={104-111},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005124901040111},
isbn={978-989-758-052-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Evolutionary Computation Theory and Applications - Volume 1: ECTA, (IJCCI 2014)
TI - Information Granules Filtering for Inexact Sequential Pattern Mining by Evolutionary Computation
SN - 978-989-758-052-9
AU - Maiorino E.
AU - Possemato F.
AU - Modugno V.
AU - Rizzi A.
PY - 2014
SP - 104
EP - 111
DO - 10.5220/0005124901040111