datasets with (maximal, closed, all) frequent itemsets.
We show that high computational cost is not the only
problem with this approach and that there is in fact a
deeper reason why it fails. We use category-theoretic
results to prove that well-described normal forms for
the dataset classification problem do not exist.