Constraint-programming Approach for Multiset and Sequence Mining

Pablo Gay, Beatriz López, Joaquim Meléndez

Abstract

Constraint-based data mining is a field that has recently started to receive more attention. Describing a problem through a declarative model yields implementations that are very descriptive and easy to extend. Our work takes an existing itemset mining model and extends it with the capability to discover different and interesting patterns that have not been explored yet: multisets and sequences. The classic example domain is retail organizations, which try to mine the most common combinations of items bought together. Multisets allow mining not only these itemsets but also the quantity of each item, and sequences capture the order in which the items are retrieved. In this paper, we provide the background of the original work and describe the modifications made to the model to extend it and support these new patterns. We also test the new models on real-world data to demonstrate their feasibility.
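To make the difference between the three pattern types concrete, here is a small illustrative sketch. It is not the authors' constraint-programming model. It is a hypothetical brute-force generate-and-test in Python that checks, for every candidate pattern, the kind of coverage and minimum-frequency conditions used in constraint-based itemset mining. It assumes that a multiset is covered by a transaction when each item appears at least as many times as in the pattern, and that a sequence is covered when its items appear in the same order with gaps allowed. The toy data and the names `mine`, `covers_itemset`, `covers_multiset`, `covers_sequence` and `PATTERN_TYPES` are invented for illustration. A CP solver would prune this search through constraint propagation instead of enumerating every candidate.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement, product

# Hypothetical toy data: each transaction lists the items of one purchase
# in the order they were retrieved (a repeated item means a larger quantity).
transactions = [
    ["bread", "milk", "milk", "eggs"],
    ["milk", "bread", "milk"],
    ["bread", "eggs"],
    ["milk", "milk", "bread", "eggs"],
]


def covers_itemset(pattern, transaction):
    """Itemset: every pattern item occurs at least once; order and counts ignored."""
    return set(pattern) <= set(transaction)


def covers_multiset(pattern, transaction):
    """Multiset: each item occurs at least as many times as in the pattern."""
    have = Counter(transaction)
    return all(have[item] >= n for item, n in Counter(pattern).items())


def covers_sequence(pattern, transaction):
    """Sequence: pattern items appear in the same order, gaps allowed."""
    remaining = iter(transaction)
    # `x in iterator` consumes the iterator up to the first match,
    # so each later pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)


# Candidate generators per pattern type (assumed semantics):
# unordered without repeats, unordered with repeats, ordered with repeats.
PATTERN_TYPES = {
    "itemset": (lambda items, k: combinations(items, k), covers_itemset),
    "multiset": (lambda items, k: combinations_with_replacement(items, k), covers_multiset),
    "sequence": (lambda items, k: product(items, repeat=k), covers_sequence),
}


def mine(data, min_support, max_len):
    """Return {type: {pattern: support}} for every pattern of length <= max_len
    covered by at least min_support transactions (brute-force generate-and-test)."""
    items = sorted({item for t in data for item in t})
    results = {}
    for name, (candidates, covers) in PATTERN_TYPES.items():
        found = {}
        for k in range(1, max_len + 1):
            for pattern in candidates(items, k):
                # Coverage: count the transactions that contain the pattern.
                support = sum(1 for t in data if covers(pattern, t))
                # Frequency condition: keep patterns covered often enough.
                if support >= min_support:
                    found[pattern] = support
        results[name] = found
    return results


for name, patterns in mine(transactions, min_support=3, max_len=2).items():
    print(name, patterns)
```

With a minimum support of 3 on this toy data, `("milk", "milk")` is reported only as a multiset and as a sequence, since an itemset cannot express quantities. `("bread", "milk")` is frequent as an itemset and as a multiset but not as a sequence, because in the last transaction milk is only retrieved before bread.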



Paper Citation


in Harvard Style

Gay P., López B. and Meléndez J. (2012). Constraint-programming Approach for Multiset and Sequence Mining. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 212-220. DOI: 10.5220/0004135302120220


in Bibtex Style

@conference{kdir12,
author={Pablo Gay and Beatriz López and Joaquim Meléndez},
title={Constraint-programming Approach for Multiset and Sequence Mining},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={212-220},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004135302120220},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI  - Constraint-programming Approach for Multiset and Sequence Mining
SN  - 978-989-8565-29-7
AU  - Gay P.
AU  - López B.
AU  - Meléndez J.
PY  - 2012
SP  - 212
EP  - 220
DO  - 10.5220/0004135302120220
ER  - 