Extracting Multi-item Sequential Patterns by Wap-tree Based Approach

Kezban Dilek Onal, Pinar Karagoz

2014

Abstract

Sequential pattern mining underlies the solution of many problems in web mining, especially in web usage mining, and research on sequence mining continues to seek faster algorithms. WAP-Tree based algorithms, which emerged from the web usage mining literature, have shown remarkable performance on single-item sequence databases. In this study, we investigate the application of WAP-Tree based mining to multi-item sequential pattern mining and present MULTI-WAP-Tree, which extends WAP-Tree to multi-item sequence databases. In addition, we propose a new algorithm, MULTI-FOF-SP (MULTI-FOF-Sibling Principle), that extracts patterns from the MULTI-WAP-Tree. MULTI-FOF-SP is based on the earlier WAP-Tree based algorithm FOF (First Occurrence Forest) and on an early pruning strategy from the literature called the "Sibling Principle". Experimental results show that MULTI-FOF-SP finds patterns faster than PrefixSpan on dense multi-item sequence databases with small alphabets.
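
To make the distinction between single-item and multi-item sequences concrete, the following minimal sketch counts the support of a multi-item pattern by brute force. It is not MULTI-WAP-Tree or MULTI-FOF-SP; it only illustrates the standard containment definition that such algorithms are designed to evaluate faster. The names `contains`, `support`, and `seq`, and the toy database, are illustrative assumptions and do not come from the paper.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
ItemSequence = Sequence[Itemset]


def contains(sequence: ItemSequence, pattern: ItemSequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern itemset must be a subset of a distinct sequence element,
    and the matched elements must appear in the same order.
    Greedy left-to-right matching is sufficient for this check.
    """
    pos = 0
    for itemset in pattern:
        # Advance to the earliest remaining element that covers this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later element
    return True


def support(database: Sequence[ItemSequence], pattern: ItemSequence) -> int:
    """Count the database sequences that contain `pattern`."""
    return sum(1 for sequence in database if contains(sequence, pattern))


def seq(*itemsets: str) -> List[Itemset]:
    """Hypothetical helper: seq("ab", "d") builds <(a b)(d)>."""
    return [frozenset(items) for items in itemsets]


# Toy multi-item sequence database (made up for illustration only).
database = [
    seq("ab", "c", "ad"),
    seq("a", "bc", "d"),
    seq("ab", "d"),
    seq("c", "ab", "cd"),
]

multi_item_support = support(database, seq("ab", "d"))   # pattern <(a b)(d)>
single_item_support = support(database, seq("a", "d"))   # pattern <(a)(d)>
```

In a single-item database every element holds exactly one item, so the subset test reduces to equality; the multi-item case is what requires extending the tree structure.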

References

  1. Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207-216. ACM.
  2. Agrawal, R. and Srikant, R. (1995). Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE'95), pages 3-14. IEEE.
  3. Ezeife, C. and Lu, Y. (2005). Mining web log sequential patterns with position coded pre-order linked wap-tree. Data Mining and Knowledge Discovery, 10(1):5-38.
  4. Han, J., Pei, J., and Yan, X. (2005). Sequential pattern mining by pattern-growth: Principles and extensions. In Chu, W. and Lin, T., editors, Foundations and Advances in Data Mining, volume 180 of Studies in Fuzziness and Soft Computing, pages 183-220.
  5. Liu, L. and Liu, J. (2010). Mining web log sequential patterns with layer coded breadth-first linked wap-tree. In International Conference of Information Science and Management Engineering (ISME'2010), volume 1, pages 28-31. IEEE.
  6. Mabroukeh, N. and Ezeife, C. (2010). A taxonomy of sequential pattern mining algorithms. ACM Computing Surveys (CSUR), 43(1):3.
  7. Masseglia, F., Poncelet, P., and Cicchetti, R. (2000). An efficient algorithm for web usage mining. Networking and Information Systems Journal, 2(5/6):571-604.
  8. Mooney, C. H. and Roddick, J. F. (2013). Sequential pattern mining - approaches and algorithms. ACM Comput. Surv., 45(2):19:1-19:39.
  9. Pei, J., Han, J., Mortazavi-Asl, B., and Zhu, H. (2000). Mining access patterns efficiently from web logs. Knowledge Discovery and Data Mining. Current Issues and New Applications, pages 396-407.
  10. Peterson, E. and Tang, P. (2008). Mining frequent sequential patterns with first-occurrence forests. In Proceedings of the 46th Annual Southeast Regional Conference (ACMSE), pages 34-39. ACM.
  11. Song, S., Hu, H., and Jin, S. (2005). Hvsm: A new sequential pattern mining algorithm using bitmap representation. In Li, X., Wang, S., and Dong, Z., editors, Advanced Data Mining and Applications, volume 3584 of Lecture Notes in Computer Science, pages 455-463. Springer Berlin Heidelberg.
  12. Tang, P., Turkia, M., and Gallivan, K. (2006). Mining web access patterns with first-occurrence linked waptrees. In Proceedings of the 16th International Conference on Software Engineering and Data Engineering (SEDE'07), pages 247-252. Citeseer.


Paper Citation


in Harvard Style

Onal K. and Karagoz P. (2014). Extracting Multi-item Sequential Patterns by Wap-tree Based Approach. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-758-024-6, pages 215-222. DOI: 10.5220/0004788102150222


in Bibtex Style

@conference{webist14,
author={Kezban Dilek Onal and Pinar Karagoz},
title={Extracting Multi-item Sequential Patterns by Wap-tree Based Approach},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2014},
pages={215-222},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004788102150222},
isbn={978-989-758-024-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - Extracting Multi-item Sequential Patterns by Wap-tree Based Approach
SN - 978-989-758-024-6
AU - Onal K.
AU - Karagoz P.
PY - 2014
SP - 215
EP - 222
DO - 10.5220/0004788102150222