# Comparison of Sampling Size Estimation Techniques for Association Rule Mining

### Tuğba Halıcı, Utku Görkem Ketenci

#### Abstract

Fast and complete retrieval of individual customer needs and "to the point" product offers are crucial aspects of customer satisfaction in today's highly competitive banking sector. The growing number of transactions and customers has greatly increased the time and memory required for market basket analysis. In this paper, a sampling step is added to the analysis to improve the performance of a product offer system. The core idea of sampling is to find a smaller, representative subset of the universe from which accurate association rules can still be generated. A smaller sample reduces the elapsed time and the memory consumption of market basket analysis. In this context, sampling methods, sample size estimation techniques, and representativeness tests are examined. The technique that yields the complete set of association rules in a reduced amount of time is suggested for sampling retail banking data.

#### Paper Citation

#### in Harvard Style

Halıcı T. and Ketenci U. (2015). **Comparison of Sampling Size Estimation Techniques for Association Rule Mining**. In *Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)*, ISBN 978-989-758-158-8, pages 195-202. DOI: 10.5220/0005589801950202

#### in Bibtex Style

@conference{kdir15,

author={Tuğba Halıcı and Utku Görkem Ketenci},

title={Comparison of Sampling Size Estimation Techniques for Association Rule Mining},

booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},

year={2015},

pages={195-202},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005589801950202},

isbn={978-989-758-158-8},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)

TI - Comparison of Sampling Size Estimation Techniques for Association Rule Mining

SN - 978-989-758-158-8

AU - Halıcı T.

AU - Ketenci U.

PY - 2015

SP - 195

EP - 202

DO - 10.5220/0005589801950202