those of the author(s) and do not necessarily reflect
the views of the funding agencies.
APPENDIX
$C$ is the set of all given clusters. A cluster $C_k \in C$ is essentially a subset of transactions from $D$.
$$C_k = \{T_1, T_2, \ldots, T_k \mid T_i \in D\} \tag{21}$$
$$I_{C_k} = \{a_i \mid a_i \in T_j \wedge T_j \in C_k\} = \text{item types in } C_k \tag{22}$$
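As a minimal sketch of Equations (21) and (22), a cluster can be modelled as a list of transactions drawn from the database, and its item types as the union of the items in those transactions. The representation of a transaction as a dictionary from item name to quantity, the function name `item_types`, and the sample data are all assumptions made for illustration; they do not come from the paper.

```python
from typing import Dict, List, Set

# Hypothetical representation: a transaction maps an item name to its quantity.
Transaction = Dict[str, int]
Cluster = List[Transaction]


def item_types(cluster: Cluster) -> Set[str]:
    """Return every item that occurs in at least one transaction of the cluster."""
    items: Set[str] = set()
    for transaction in cluster:
        items.update(transaction.keys())
    return items


# Illustrative database D and a cluster C_k made of two of its transactions.
D: List[Transaction] = [
    {"bread": 2, "milk": 1},
    {"milk": 3, "eggs": 12},
    {"beer": 6},
]
C_k: Cluster = [D[0], D[1]]

print(sorted(item_types(C_k)))  # ['bread', 'eggs', 'milk']
```

Items that appear only in transactions outside the cluster ("beer" here) are not part of its item types.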
Cluster utility (CU), relative utility (ru) of a category
type in a cluster, and the affinity between clusters
have the following definitions:
$$CU(C_k) = \sum_{T_j \in C_k} TU(T_j) = \text{Cluster utility of } C_k \tag{23}$$
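The sum in Equation (23) can be sketched as follows. This appendix does not restate the definition of the transaction utility TU, so the sketch assumes the common convention in high utility itemset mining: the utility of a transaction is the sum, over its items, of quantity times a per-item unit utility. That convention, the names `transaction_utility`, `cluster_utility`, and `unit_utility`, and the sample values are illustrative assumptions rather than the paper's own definitions.

```python
from typing import Dict, List

# Hypothetical representation: a transaction maps an item name to its quantity.
Transaction = Dict[str, int]


def transaction_utility(transaction: Transaction,
                        unit_utility: Dict[str, float]) -> float:
    """Assumed TU: sum of quantity * unit utility over the items of a transaction."""
    return sum(qty * unit_utility[item] for item, qty in transaction.items())


def cluster_utility(cluster: List[Transaction],
                    unit_utility: Dict[str, float]) -> float:
    """CU(C_k): sum of the transaction utilities of all transactions in the cluster."""
    return sum(transaction_utility(t, unit_utility) for t in cluster)


# Illustrative unit utilities (e.g. profit per unit) and a two-transaction cluster.
unit_utility = {"bread": 1.5, "milk": 1.0, "eggs": 0.25}
C_k = [
    {"bread": 2, "milk": 1},   # TU = 2 * 1.5 + 1 * 1.0 = 4.0
    {"milk": 3, "eggs": 12},   # TU = 3 * 1.0 + 12 * 0.25 = 6.0
]

print(cluster_utility(C_k, unit_utility))  # 10.0
```

If the paper's TU differs from the assumed form, only `transaction_utility` needs to change; the cluster-level sum stays the same.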