counting is considerably more time consuming than
candidate generation in Apriori-based methods.
The second experiment examined
how well the algorithms scale with an increasing
number of concurrently executed queries. To keep
the degree of similarity between queries constant, the
overlap between each pair of consecutive queries
in the batch was fixed at 75%. As can be seen in
Figure 3, the candidate generation time of CCT grows linearly
with the number of queries in a batch,
while that of CCan remains largely insensitive to it. Total
execution times increase similarly for both methods,
with CCan performing slightly better, especially
for larger batches.
Figure 3: Generation and total execution times for
different numbers of similar queries.
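The batch layout used in this experiment can be illustrated with a short sketch. It assumes, purely for illustration, that each query selects a contiguous range of transaction identifiers and that "overlap" means the fraction of a query's range shared with the next query in the batch. The function names and the query size of 1000 are hypothetical and are not taken from the experimental setup.

```python
def make_overlapping_batch(n_queries, query_size, overlap):
    """Build a batch of half-open (start, end) ranges of equal length.

    Assumption: a query is modelled as a contiguous range of transaction
    ids, and each range is shifted by a fixed step so that consecutive
    ranges share the requested fraction of their length.
    """
    step = round(query_size * (1 - overlap))
    return [(i * step, i * step + query_size) for i in range(n_queries)]


def overlap_fraction(first, second):
    """Fraction of the first range that is also covered by the second."""
    shared = max(0, min(first[1], second[1]) - max(first[0], second[0]))
    return shared / (first[1] - first[0])


# Hypothetical batch: 4 queries of 1000 transactions each, 75% overlap.
batch = make_overlapping_batch(n_queries=4, query_size=1000, overlap=0.75)
pairwise = [overlap_fraction(a, b) for a, b in zip(batch, batch[1:])]
print(batch)
print(pairwise)
```

Under these assumptions every pair of consecutive queries shares the same fraction of data, so adding queries to the batch changes its size without changing how similar neighbouring queries are.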
6 CONCLUSIONS
In this paper we addressed the problem of efficient
processing of batches of frequent itemset queries in
the context of the Apriori algorithm. We proposed a
new algorithm, called Common Candidates, built
upon Common Candidate Tree, which further
integrates the computations performed for a batch of
queries by means of an integrated candidate generation
procedure.
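The general idea of generating candidates once for a whole batch can be sketched as follows. This is a simplified illustration rather than the Common Candidates procedure itself: it assumes that frequent itemsets are kept in a single dictionary and that each itemset is tagged with the set of query identifiers for which it is frequent. The function name, the data layout, and the example itemsets are all hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent):
    """Apriori-style join and prune performed once for a batch of queries.

    frequent: dict mapping a sorted tuple of k items to the set of query
    ids for which that itemset is frequent (an assumed representation).
    Returns a dict mapping each (k+1)-item candidate to the set of query
    ids for which it is a candidate.
    """
    candidates = {}
    itemsets = sorted(frequent)
    for i, a in enumerate(itemsets):
        for b in itemsets[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: later itemsets cannot share the prefix
            cand = a + (b[-1],)
            # Join step: only queries in which both parents are frequent.
            queries = frequent[a] & frequent[b]
            # Prune step: every k-subset must be frequent for the query.
            for sub in combinations(cand, len(cand) - 1):
                queries = queries & frequent.get(sub, set())
                if not queries:
                    break
            if queries:
                candidates[cand] = queries
    return candidates


# Hypothetical frequent 2-itemsets for a batch of two queries (ids 1 and 2).
frequent_2 = {
    ("a", "b"): {1, 2},
    ("a", "c"): {1, 2},
    ("a", "d"): {2},
    ("b", "c"): {1},
    ("b", "d"): {2},
}
candidates_3 = generate_candidates(frequent_2)
print(candidates_3)
```

In this sketch each pair of itemsets is joined once regardless of how many queries share it, which is the kind of saving an integrated generation step is meant to provide compared with running the join separately for every query.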
The experiments showed that the new
method significantly reduces the total
time spent on candidate generation. The impact of
the integrated candidate generation procedure on the
overall execution time is smaller but still
noticeable.
In the future we plan to investigate how
several optimizations used in practical
implementations of Apriori affect our batch
processing algorithms.
REFERENCES
Agrawal, R., Imielinski, T., Swami, A., 1993. Mining
Association Rules Between Sets of Items in Large
Databases, In Proc. of the 1993 ACM SIGMOD Conf.
Agrawal, R., Mehta, M., Shafer, J., Srikant, R., Arning,
A., Bollinger, T., 1996. The Quest Data Mining
System, In Proc. of the 2nd KDD Conference.
Agrawal, R., Srikant, R., 1994. Fast Algorithms for
Mining Association Rules, In Proc. of the 20th VLDB
Conference.
Baralis, E., Psaila, G., 1999. Incremental Refinement of
Mining Queries, In Proceedings of the 1st DaWaK
Conference.
Blockeel, H., Dehaspe, L., Demoen, B., Janssens, G.,
Ramon, J., Vandecasteele, H., 2002. Improving the
Efficiency of Inductive Logic Programming Through
the Use of Query Packs, Journal of Artificial
Intelligence Research, Vol. 16.
Grudzinski, P., Wojciechowski, M., 2007. Integration of
Candidate Hash Trees in Concurrent Processing of
Frequent Itemset Queries Using Apriori, In Proc. of
the 3rd ADMKD Workshop.
Imielinski, T., Mannila, H., 1996. A Database Perspective
on Knowledge Discovery, Communications of the
ACM, Vol. 39.
Jin, R., Sinha, K., Agrawal, G., 2005. Simultaneous
Optimization of Complex Mining Tasks with a
Knowledgeable Cache, In Proc. of the 11th KDD
Conference.
Meo, R., 2003. Optimization of a Language for Data
Mining, In Proc. of the ACM SAC Conference.
Pei, J., Han, J., 2000. Can We Push More Constraints into
Frequent Pattern Mining?, In Proc. of the 6th KDD
Conference.
Sellis, T., 1988. Multiple-query optimization, ACM
Transactions on Database Systems, Vol. 13.
Wojciechowski, M., Zakrzewicz, M., 2002. Methods for
Batch Processing of Data Mining Queries, In Proc. of
the 5th DB&IS Conference.
KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval