Note that, as the minimum support decreases, the number
of frequent patterns grows more sharply than the execution
time. Thus, the extra execution time incurred by lowering
the minimum support is worth the insights gained from
the additional frequent patterns.
Table 6: Execution time for minSupp equal to 0.0100%,
0.0125%, 0.0150%, 0.0175%, 0.0200% of |D|.
Each cell: no. of freq patterns / execution time (seconds)

database   0.0100%     0.0125%    0.0150%    0.0175%    0.0200%
2D100K     1633/125    1295/114   973/102    714/95     579/88
2D200K     2920/394    2235/371   1509/352   1287/327   1133/304
2D300K     1843/557    1340/511   1062/467   859/426    692/391
4D100K     1902/237    1688/211   1326/200   1163/193   747/180
4D200K     7460/794    3429/586   2117/502   1648/453   1381/425
4D300K     7185/1206   3280/826   1760/687   937/452    589/415
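
The claim that pattern counts grow faster than execution time can be checked directly against the Table 6 figures. The following is a minimal sketch that compares, for each database, the growth from the largest minSupp (0.0200%) to the smallest (0.0100%). The names TABLE6 and growth_ratios are illustrative and not part of PHSPAM.

```python
# Table 6 values: database -> list of (frequent patterns, seconds),
# ordered by minSupp = 0.0100%, 0.0125%, 0.0150%, 0.0175%, 0.0200% of |D|.
TABLE6 = {
    "2D100K": [(1633, 125), (1295, 114), (973, 102), (714, 95), (579, 88)],
    "2D200K": [(2920, 394), (2235, 371), (1509, 352), (1287, 327), (1133, 304)],
    "2D300K": [(1843, 557), (1340, 511), (1062, 467), (859, 426), (692, 391)],
    "4D100K": [(1902, 237), (1688, 211), (1326, 200), (1163, 193), (747, 180)],
    "4D200K": [(7460, 794), (3429, 586), (2117, 502), (1648, 453), (1381, 425)],
    "4D300K": [(7185, 1206), (3280, 826), (1760, 687), (937, 452), (589, 415)],
}


def growth_ratios(rows):
    """Return (pattern growth, time growth) from minSupp 0.0200% down to 0.0100%."""
    (patterns_low, time_low), (patterns_high, time_high) = rows[0], rows[-1]
    return patterns_low / patterns_high, time_low / time_high


if __name__ == "__main__":
    for name, rows in TABLE6.items():
        pattern_growth, time_growth = growth_ratios(rows)
        print(f"{name}: patterns x{pattern_growth:.2f}, time x{time_growth:.2f}")
```

For every database in the table, the pattern ratio exceeds the time ratio, which is the relationship the text describes.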
5 CONCLUSIONS
We designed the PHSPAM algorithm to remedy two
problems of previous research: the set of frequent
hybrid sequential patterns obtained is incomplete,
and the execution time does not scale with growing
database sizes. PHSPAM obtains the complete set by
first collecting items that might appear in the
frequent patterns. It then uses pattern growth
techniques to calculate the support of patterns.
PHSPAM was implemented and compared with GFP2 and
CHSPAM. The experiments demonstrated that PHSPAM
indeed obtains more frequent patterns than GFP2. In
addition, while achieving the same completeness,
PHSPAM has a shorter execution time than GFP2 and a
much shorter one than CHSPAM, owing to the
accumulated counts preserved in the projected data
structures.
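
The two steps summarized above (collect candidate items, then grow patterns over projected data) can be illustrated with a generic prefix-projection sketch. This is not the PHSPAM implementation: it assumes plain sequences of single items, mines only ordinary (non-contiguous) sequential patterns, and does not reproduce the accumulated counts kept in PHSPAM's projected data structures. The names frequent_items and pattern_growth are hypothetical.

```python
from collections import Counter


def frequent_items(db, min_count):
    """Step 1: items that occur in at least min_count sequences."""
    counts = Counter(item for seq in db for item in set(seq))
    return {item for item, c in counts.items() if c >= min_count}


def pattern_growth(db, min_count):
    """Step 2: grow patterns by prefix projection (PrefixSpan-style sketch)."""
    results = {}

    def grow(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = Counter(item for suffix in projected for item in set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project: keep only what follows the first occurrence of item.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, next_projected)

    keep = frequent_items(db, min_count)
    # Drop items that cannot appear in any frequent pattern before growing.
    pruned = [[x for x in seq if x in keep] for seq in db]
    grow((), pruned)
    return results


if __name__ == "__main__":
    toy_db = [list("abc"), list("acb"), list("ab")]
    for pattern, support in sorted(pattern_growth(toy_db, 2).items()):
        print("".join(pattern), support)
```

Pruning infrequent items first shrinks every projected suffix, so each recursive call scans less data than a full database pass would.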
Our future research regarding hybrid sequential
pattern mining includes:
(1) Applying PHSPAM to real-world applications, such
as mining web page traversal paths from web
logs.
(2) Examining the effect of changing the support
definition so that a transaction contributes
at most one to the support count of a
pattern.
(3) Considering application-specific constraints on
the patterns, such as timing limitations.
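
Item (2) can be made concrete with a small comparison of two support definitions. The sketch below assumes, for illustration only, that a pattern is a contiguous run of items and that the uncapped definition counts every occurrence within a transaction. This section does not state the exact definition used by PHSPAM, and the function names are hypothetical.

```python
def occurrences(seq, pattern):
    """Number of positions at which pattern appears contiguously in seq."""
    n, m = len(seq), len(pattern)
    return sum(1 for i in range(n - m + 1) if seq[i:i + m] == pattern)


def support_all(db, pattern):
    """Assumed uncapped definition: every occurrence contributes one."""
    return sum(occurrences(seq, pattern) for seq in db)


def support_capped(db, pattern):
    """Alternative definition: each transaction contributes at most one."""
    return sum(1 for seq in db if occurrences(seq, pattern) > 0)


if __name__ == "__main__":
    toy_db = [list("abab"), list("ab"), list("ba")]
    print(support_all(toy_db, list("ab")))
    print(support_capped(toy_db, list("ab")))
```

Under the capped definition a pattern that repeats inside one long transaction no longer inflates its support, so fewer patterns may pass a given minSupp.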
ACKNOWLEDGEMENTS
This work was supported in part by Taiwan's National
Science Council under Grant NSC 97-2221-E-032-046.
A PROJECTION-BASED HYBRID SEQUENTIAL PATTERNS MINING ALGORITHM