HASHMAX: A NEW METHOD FOR MINING MAXIMAL FREQUENT ITEMSETS

Natalia Vanetik, Ehud Gudes

Abstract

Mining maximal frequent itemsets is a fundamental problem in many data mining applications, especially for dense data, where the search space is exponential. We propose HashMax, a top-down algorithm that employs hashing techniques to generate maximal frequent itemsets efficiently. An empirical comparison with the state-of-the-art maximal frequent itemset generation algorithm Genmax shows the advantage of HashMax on dense datasets with a large number of maximal frequent itemsets.
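
To make the problem statement concrete, the following minimal sketch computes maximal frequent itemsets by brute force. It is not HashMax and is neither top-down nor hash-based; it only illustrates the definition: an itemset is frequent if at least `min_support` transactions contain it, and maximal if no proper superset is also frequent. The function name, the absolute support threshold, and the toy transactions are assumptions made for illustration. Enumerating every subset is exponential in the number of items, which is the cost that dedicated algorithms aim to avoid.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force reference (hypothetical helper, not the paper's algorithm).

    transactions: list of frozensets of items.
    min_support:  absolute count threshold (an assumption; the paper's
                  abstract does not state how support is specified).
    """
    items = sorted(set().union(*transactions))
    frequent = []
    # Enumerate every non-empty itemset and keep the frequent ones.
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.append(itemset)
    # A frequent itemset is maximal if no frequent proper superset exists.
    return [f for f in frequent if not any(f < g for g in frequent)]


# Toy data, invented for illustration only.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"c", "d"},
)]
result = maximal_frequent_itemsets(transactions, min_support=2)
```

On this toy input the frequent itemsets are {a}, {b}, {c}, {d}, {a,b}, {a,c}, {b,c} and {a,b,c}, so the maximal ones are {a,b,c} and {d}.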

References

  1. Agarwal, R., Aggarwal, C., and Prasad, V. (2000). Depth first generation of long patterns. In ACM SIGKDD Conf.
  2. Bayardo, R. J. (1998). Efficiently mining long patterns from databases. In ACM SIGMOD Conf. on Management of Data, pages 85-93.
  3. Burdick, D., Calimlim, M., and Gehrke, J. (2001). Mafia, a maximal frequent itemset algorithm for transactional databases. In IEEE Intl. Conf. on Data Engineering, pages 443-452.
  4. Genmax (2011). Genmax implementation. http://www.cs.rpi.edu/~zaki/wwwnew/pmwiki.php/Software.
  5. Gouda, K. and Zaki, M. J. (2005). Genmax: An efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery 11(3), pages 223-242.
  6. Han, J., Pei, J., and Yin, Y. (2000). Mining frequent patterns without candidate generation. In ACM SIGMOD Conf. on Management of Data, pages 1-12.
  7. Hu, T., Sung, S. Y., Xiong, H., and Fu, Q. (2008). Discovery of maximum length frequent itemsets. In Inf. Sci. 178(1), pages 69-87.
  8. Lin, D.-I. and Kedem, Z. M. (1998). Pincer search: A new algorithm for discovering the maximum frequent set. In EDBT, pages 105-119.
  9. UCI (2011). UCI machine learning data repository. http://archive.ics.uci.edu/ml/index.html.
  10. Yang, G. (2004). The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD, pages 344-353.
  11. Zaki, M. J., Parthasarathy, S., Ogihara, M., and Li, W. (1997). New algorithms for fast discovery of association rules. In Third Intl. Conf. on Knowledge Discovery in Databases and Data Mining, pages 283-286.


Paper Citation


in Harvard Style

Vanetik N. and Gudes E. (2011). HASHMAX: A NEW METHOD FOR MINING MAXIMAL FREQUENT ITEMSETS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 132-137. DOI: 10.5220/0003628101400145


in Bibtex Style

@conference{kdir11,
author={Natalia Vanetik and Ehud Gudes},
title={HASHMAX: A NEW METHOD FOR MINING MAXIMAL FREQUENT ITEMSETS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={132-137},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003628101400145},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - HASHMAX: A NEW METHOD FOR MINING MAXIMAL FREQUENT ITEMSETS
SN - 978-989-8425-79-9
AU - Vanetik N.
AU - Gudes E.
PY - 2011
SP - 132
EP - 137
DO - 10.5220/0003628101400145