HIERARCHICAL MODEL-BASED CLUSTERING FOR RELATIONAL DATA

Jianzhong Chen, Mary Shapcott, Sally McClean, Kenny Adamson

Abstract

Relational data mining deals with datasets that contain multiple types of objects and relationships and are stored in relational formats, e.g. relational databases with multiple tables. This paper proposes a propositional, hierarchical model-based method for clustering relational data. We first define an object-relational star schema to model composite objects, and present a method of flattening composite objects into aggregate objects by introducing a new type of aggregate, the frequency aggregate, which records not only the observed values of an attribute but also the distribution of those values. A hierarchical agglomerative clustering algorithm with a log-likelihood distance is then applied to produce a tentative clustering of the aggregated data. After stopping at a coarse estimate of the number of clusters, a mixture model-based method using the EM algorithm performs a further relocation clustering, in which the Bayesian Information Criterion (BIC) is used to determine the optimal number of clusters. Finally, we evaluate our approach on a real-world dataset.
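The flattening step can be illustrated with a minimal sketch. It is one possible reading of a frequency aggregate, namely the relative frequency of each observed value among the child rows of a parent object; the paper's exact definition may differ. The function name `frequency_aggregate`, the `purchases` table, and its column names are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def frequency_aggregate(rows, key, attribute):
    """Flatten one-to-many rows into one value distribution per parent object.

    Assumption: a frequency aggregate is taken to be the relative frequency
    of each observed value of `attribute` among the rows sharing `key`.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key]][row[attribute]] += 1

    # Normalise the counts so that each parent object gets a distribution
    # over the values it was observed with.
    aggregated = {}
    for parent, counter in counts.items():
        total = sum(counter.values())
        aggregated[parent] = {value: n / total for value, n in counter.items()}
    return aggregated


# Hypothetical child table: several purchase rows per customer.
purchases = [
    {"customer_id": 1, "category": "book"},
    {"customer_id": 1, "category": "book"},
    {"customer_id": 1, "category": "music"},
    {"customer_id": 2, "category": "music"},
]

aggregated = frequency_aggregate(purchases, "customer_id", "category")
for customer, distribution in sorted(aggregated.items()):
    print(customer, distribution)
```

Each parent object ends up as a single flat record, which is the form a propositional clustering algorithm expects as input.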



Paper Citation


in Harvard Style

Chen J., Shapcott M., McClean S. and Adamson K. (2004). HIERARCHICAL MODEL-BASED CLUSTERING FOR RELATIONAL DATA. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-00-7, pages 92-97. DOI: 10.5220/0002624300920097


in Bibtex Style

@conference{iceis04,
author={Jianzhong Chen and Mary Shapcott and Sally McClean and Kenny Adamson},
title={HIERARCHICAL MODEL-BASED CLUSTERING FOR RELATIONAL DATA},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2004},
pages={92-97},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002624300920097},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - HIERARCHICAL MODEL-BASED CLUSTERING FOR RELATIONAL DATA
SN - 972-8865-00-7
AU - Chen J.
AU - Shapcott M.
AU - McClean S.
AU - Adamson K.
PY - 2004
SP - 92
EP - 97
DO - 10.5220/0002624300920097