Classification of Datasets with Frequent Itemsets is Wild

Natalia Vanetik

Abstract

The problem of dataset classification with frequent itemsets is the problem of determining whether two different datasets have the same frequent itemsets without computing these itemsets explicitly. The motivation for this approach is the high computational cost of computing frequent itemsets. Finding well-defined and understandable normal forms for this classification task would be a breakthrough in the field of dataset classification. The paper proves that classifying datasets by their frequent itemsets is a hopeless task, because canonical forms do not exist for this problem.
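
To make the problem statement concrete, the sketch below shows the explicit baseline that the abstract wants to avoid: enumerate the frequent itemsets of each dataset and compare the two collections. It is an illustration only, not the paper's method. The function names, the representation of a dataset as a list of Python sets, and the use of an absolute support count as the frequency threshold are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets contained in at least
    `min_support` transactions (absolute count; an assumption here)."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.add(itemset)
                found_any = True
        if not found_any:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return frequent


def same_frequent_itemsets(d1, d2, min_support):
    """Hypothetical helper: compare two datasets by computing their
    frequent itemsets explicitly, which is the costly step."""
    return frequent_itemsets(d1, min_support) == frequent_itemsets(d2, min_support)


d1 = [{"a", "b"}, {"a", "b", "c"}, {"a"}]
d2 = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
d3 = [{"a"}, {"a"}, {"b"}]

print(same_frequent_itemsets(d1, d2, 2))  # True: both yield {a}, {b}, {a, b}
print(same_frequent_itemsets(d1, d3, 2))  # False: d3 yields only {a}
```

The enumeration is exponential in the number of items in the worst case, which is the cost that motivates looking for a classification that avoids computing the itemsets.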


Paper Citation


in Harvard Style

Vanetik N. (2012). Classification of Datasets with Frequent Itemsets is Wild. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 386-389. DOI: 10.5220/0004167903860389


in Bibtex Style

@conference{kdir12,
author={Natalia Vanetik},
title={Classification of Datasets with Frequent Itemsets is Wild},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={386-389},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004167903860389},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI - Classification of Datasets with Frequent Itemsets is Wild
SN - 978-989-8565-29-7
AU - Vanetik N.
PY - 2012
SP - 386
EP - 389
DO - 10.5220/0004167903860389