Authors:
Jianzhong Chen; Mary Shapcott; Sally McClean and Kenny Adamson
Affiliation:
School of Computing and Mathematics, Faculty of Engineering, University of Ulster, United Kingdom
Keyword(s):
Hierarchical model-based clustering, relational data, frequency aggregates, EM algorithm.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Relational data mining deals with datasets that contain multiple types of objects and relationships and are presented in relational formats, e.g. relational databases with multiple tables. This paper proposes a propositional hierarchical model-based method for clustering relational data. We first define an object-relational star schema to model composite objects, and present a method of flattening composite objects into aggregate objects by introducing a new type of aggregate, the frequency aggregate, which records not only the observed values of an attribute but also their distribution. A hierarchical agglomerative clustering algorithm with log-likelihood distance is then applied to cluster the aggregated data tentatively. After stopping at a coarse estimate of the number of clusters, a mixture model-based method with the EM algorithm is developed to perform a further relocation clustering, in which the Bayesian Information Criterion (BIC) is used to determine the optimal number of clusters. Finally, we evaluate our approach on a real-world dataset.
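As a rough illustration of the flattening step described in the abstract, the sketch below collapses the child rows of each composite object into one record of observed values and their relative frequencies. The function name, the table layout and the use of relative frequencies are assumptions made for illustration only; the paper's own definition of a frequency aggregate may differ.

```python
from collections import Counter, defaultdict


def frequency_aggregate(rows, key, attribute):
    """Flatten child rows into one frequency aggregate per parent object.

    Hypothetical helper: returns {parent_id: {value: relative_frequency}}.
    """
    counts = defaultdict(Counter)
    for row in rows:
        # Count how often each attribute value occurs for this parent object.
        counts[row[key]][row[attribute]] += 1

    aggregates = {}
    for parent_id, counter in counts.items():
        total = sum(counter.values())
        # Keep the observed values together with their distribution.
        aggregates[parent_id] = {
            value: n / total for value, n in counter.items()
        }
    return aggregates


# Hypothetical child table: several purchase rows per customer.
purchases = [
    {"customer_id": 1, "category": "book"},
    {"customer_id": 1, "category": "book"},
    {"customer_id": 1, "category": "music"},
    {"customer_id": 2, "category": "music"},
]
aggregated = frequency_aggregate(purchases, "customer_id", "category")
```

Each parent object then becomes a single flat record, which is the kind of input that a propositional clustering method can work with.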