MINING SEQUENTIAL PATTERNS WITH REGULAR EXPRESSION CONSTRAINTS USING SEQUENTIAL PATTERN TREE

Meer Hamza, Khaled Mahar, Mohamed Younis

Abstract

The significant growth of sequence database sizes in recent years has increased the importance of developing new techniques for data organization and query processing. Discovering sequential patterns is an important problem in data mining with a host of application domains. For both effectiveness and efficiency, constraints are essential in many sequential pattern mining applications. In this paper, we give a brief review of different sequential pattern mining algorithms, and then introduce a new algorithm (termed NewSPIRIT) for mining frequent sequential patterns that satisfy user-specified regular expression constraints. The general idea of our algorithm is to use a finite state automaton to represent the regular expression constraints and to build a sequential pattern tree that represents all data sequences satisfying these constraints, scanning the sequence database only once. Experimental results show that NewSPIRIT is much more efficient than existing algorithms.
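
The general idea stated in the abstract can be illustrated with a small sketch. This is not the authors' NewSPIRIT implementation, whose details are not given here; it only shows, under simplifying assumptions, how an automaton can filter candidate patterns while a tree with support counts is filled in one pass over the database. The example automaton (for the expression a(b|c)*d), the `Node` class, and the function names are all hypothetical, each sequence is assumed to be a list of single items, and the minimum support is assumed to be at least 1.

```python
# Hypothetical DFA for the regular expression a(b|c)*d over single items.
# States and transitions are illustrative assumptions, not from the paper.
DFA = {
    "start": 0,
    "accept": {2},
    "delta": {(0, "a"): 1, (1, "b"): 1, (1, "c"): 1, (1, "d"): 2},
}


class Node:
    """One node of the pattern tree (a trie keyed by item)."""

    def __init__(self):
        self.children = {}
        self.support = 0  # number of data sequences containing this pattern


def accepted_subsequences(seq, dfa):
    """Return the distinct subsequences of seq that the DFA accepts."""
    found = set()

    def walk(pos, state, prefix):
        if prefix and state in dfa["accept"]:
            found.add(prefix)
        for i in range(pos, len(seq)):
            nxt = dfa["delta"].get((state, seq[i]))
            if nxt is not None:  # skip items with no legal transition
                walk(i + 1, nxt, prefix + (seq[i],))

    walk(0, dfa["start"], ())
    return found


def build_tree(database, dfa):
    """Build the pattern tree with a single pass over the database."""
    root = Node()
    for seq in database:
        # Each distinct accepted pattern counts once per data sequence.
        for pattern in accepted_subsequences(seq, dfa):
            node = root
            for item in pattern:
                node = node.children.setdefault(item, Node())
            node.support += 1
    return root


def frequent_patterns(root, min_support):
    """Collect patterns whose support reaches min_support (assumed >= 1)."""
    result = {}
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        if prefix and node.support >= min_support:
            result[prefix] = node.support
        for item, child in node.children.items():
            stack.append((child, prefix + (item,)))
    return result
```

A call such as `frequent_patterns(build_tree(database, DFA), 2)` returns a dictionary mapping each frequent, constraint-satisfying pattern to its support. Enumerating subsequences this way can be exponential in the sequence length, so the sketch is meant only to make the data flow concrete, not to reflect the efficiency claimed for NewSPIRIT.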


Paper Citation


in Harvard Style

Hamza M., Mahar K. and Younis M. (2004). MINING SEQUENTIAL PATTERNS WITH REGULAR EXPRESSION CONSTRAINTS USING SEQUENTIAL PATTERN TREE. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-00-7, pages 116-121. DOI: 10.5220/0002621601160121


in Bibtex Style

@conference{iceis04,
author={Meer Hamza and Khaled Mahar and Mohamed Younis},
title={MINING SEQUENTIAL PATTERNS WITH REGULAR EXPRESSION CONSTRAINTS USING SEQUENTIAL PATTERN TREE},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2004},
pages={116-121},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002621601160121},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - MINING SEQUENTIAL PATTERNS WITH REGULAR EXPRESSION CONSTRAINTS USING SEQUENTIAL PATTERN TREE
SN - 972-8865-00-7
AU - Hamza M.
AU - Mahar K.
AU - Younis M.
PY - 2004
SP - 116
EP - 121
DO - 10.5220/0002621601160121