SIHC: A STABLE INCREMENTAL HIERARCHICAL CLUSTERING ALGORITHM

Ibai Gurrutxaga, Olatz Arbelaitz, José I. Martín, Javier Muguerza, Jesús M. Pérez, Iñigo Perona

Abstract

SAHN is a widely used agglomerative hierarchical clustering method. Nevertheless, it is not an incremental algorithm and is therefore unsuitable for many real application areas where not all data is available at the beginning of the process. Some authors have proposed incremental variants of SAHN whose goal is to obtain the same results in incremental environments. This approach is not practical, since it frequently must rebuild the hierarchy, or a large part of it, and often leads to completely different structures. We propose a novel algorithm, called SIHC, that updates SAHN hierarchies with minor changes to the previous structures. This property makes it suitable for real environments. Results on 11 synthetic and 6 real datasets show that SIHC builds high-quality clustering hierarchies. This quality level is similar to, and sometimes better than, SAHN's. Moreover, the computational complexity of SIHC is lower than SAHN's.



Paper Citation


in Harvard Style

Gurrutxaga I., Arbelaitz O., Martín J., Muguerza J., Pérez J. and Perona I. (2009). SIHC: A STABLE INCREMENTAL HIERARCHICAL CLUSTERING ALGORITHM. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8111-85-2, pages 300-304. DOI: 10.5220/0001857103000304


in Bibtex Style

@conference{iceis09,
author={Ibai Gurrutxaga and Olatz Arbelaitz and José I. Martín and Javier Muguerza and Jesús M. Pérez and Iñigo Perona},
title={SIHC: A STABLE INCREMENTAL HIERARCHICAL CLUSTERING ALGORITHM},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2009},
pages={300-304},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001857103000304},
isbn={978-989-8111-85-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - SIHC: A STABLE INCREMENTAL HIERARCHICAL CLUSTERING ALGORITHM
SN - 978-989-8111-85-2
AU - Gurrutxaga I.
AU - Arbelaitz O.
AU - Martín J.
AU - Muguerza J.
AU - Pérez J.
AU - Perona I.
PY - 2009
SP - 300
EP - 304
DO - 10.5220/0001857103000304