Symmetry Breaking in Itemset Mining

Belaïd Benhamou, Saïd Jabbour, Lakhdar Sais, Yakoub Salhi

Abstract

The concept of symmetry has been extensively studied in constraint programming and propositional satisfiability. Several methods for detecting and removing symmetries have been developed, and their integration into well-known solvers in these domains has dramatically improved their effectiveness on a wide variety of problems considered difficult to solve. The concept of symmetry can be exported to other domains where such structures can be exploited effectively, particularly data mining, where some tasks can be expressed as constraints. In this paper, we are interested in the detection and elimination of symmetries in the problem of finding frequent itemsets of a transaction database and its variants. Recent works have provided effective encodings of these data mining tasks as Boolean constraints, and some recent works on symmetry detection and elimination in itemset mining problems have been proposed. In this work, we propose a generic framework that can be used to eliminate symmetries for data mining tasks expressed in a declarative constraint language. We show how symmetries between the items of the transactions are detected and eliminated by adding symmetry-breaking predicates (SBPs) to the Boolean encoding of the data mining task.
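
To make the idea of item symmetries and symmetry-breaking predicates concrete, the following is a minimal sketch and not the framework proposed in the paper. It assumes a deliberately simplified setting: only pairwise item swaps are tested as symmetries, frequent itemsets are enumerated by brute force rather than through a Boolean encoding, and the predicate for a symmetric pair (a, b) is the single implication "b in the itemset implies a in the itemset". All function names (`swap_items`, `is_swap_symmetry`, `symmetric_pairs`, `frequent_itemsets`) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def swap_items(itemset, a, b):
    """Exchange items a and b inside one itemset."""
    return frozenset(b if i == a else a if i == b else i for i in itemset)


def is_swap_symmetry(db, a, b):
    """True if swapping a and b maps the transaction database onto itself.

    The database is compared as a multiset of transactions.
    """
    original = Counter(frozenset(t) for t in db)
    swapped = Counter(swap_items(t, a, b) for t in db)
    return original == swapped


def symmetric_pairs(db):
    """All item pairs (a, b), a < b, whose swap is a symmetry of db."""
    items = sorted(set().union(*db))
    return [(a, b) for a, b in combinations(items, 2)
            if is_swap_symmetry(db, a, b)]


def support(db, itemset):
    """Number of transactions that contain the itemset."""
    return sum(1 for t in db if itemset <= set(t))


def frequent_itemsets(db, min_support, pairs=()):
    """Brute-force enumeration of frequent itemsets.

    For each symmetric pair (a, b), the simplified predicate "b implies a"
    discards itemsets that contain b but not a; each discarded itemset is
    the image of a kept one under the swap and has the same support.
    """
    items = sorted(set().union(*db))
    result = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if any(b in s and a not in s for a, b in pairs):
                continue  # pruned by the symmetry-breaking predicate
            if support(db, s) >= min_support:
                result.append(s)
    return result
```

Calling `frequent_itemsets(db, min_support)` without pairs returns the full answer set, while passing `symmetric_pairs(db)` returns one representative per swapped pair; the pruned itemsets can be recovered afterwards by applying `swap_items` to the kept ones. General permutation symmetries and predicates over a Boolean encoding, as described in the abstract, are outside the scope of this sketch.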



Paper Citation


in Harvard Style

Benhamou B., Jabbour S., Sais L. and Salhi Y. (2014). Symmetry Breaking in Itemset Mining. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014), ISBN 978-989-758-048-2, pages 86-96. DOI: 10.5220/0005078200860096


in Bibtex Style

@conference{kdir14,
author={Belaïd Benhamou and Saïd Jabbour and Lakhdar Sais and Yakoub Salhi},
title={Symmetry Breaking in Itemset Mining},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={86-96},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005078200860096},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - Symmetry Breaking in Itemset Mining
SN - 978-989-758-048-2
AU - Benhamou B.
AU - Jabbour S.
AU - Sais L.
AU - Salhi Y.
PY - 2014
SP - 86
EP - 96
DO - 10.5220/0005078200860096