ROBUST CENTROID-BASED CLUSTERING USING DERIVATIVES OF PEARSON CORRELATION

Marc Strickert, Nese Sreenivasulu, Thomas Villmann, Barbara Hammer

Abstract

Modern high-throughput facilities provide the basis of -omics research by delivering extensive biomedical data sets. Mass spectra, multi-channel chromatograms, and cDNA arrays are data sources of this kind, for which accurate analysis is desired. Centroid-based clustering provides helpful data abstraction by representing sets of similar data vectors by characteristic prototypes placed in high-density regions of the data space. In this way, specific modes can be detected, for example, in gene expression profiles or in lists of protein and metabolite abundances. Despite their widespread use, k-means and self-organizing maps (SOM) often produce only suboptimal centroids: the final clusters depend strongly on the initialization, and they do not quantize the data as accurately as possible, particularly if a measure other than the Euclidean distance is chosen for data comparison. Neural gas (NG) is a mathematically rigorous clustering method that optimizes the centroid positions by minimizing their quantization errors. Originally formulated for the Euclidean distance, NG is generalized in this work to give accurate and robust results for the Pearson correlation similarity measure. The benefits of the new NG for correlation (NG-C) are demonstrated on sets of gene expression data and mass spectra.
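
The abstract does not spell out the update rule, so the following is only a minimal sketch of how a rank-based neural gas update driven by the derivative of the Pearson correlation might look. The function names (`pearson`, `pearson_grad_w`, `neural_gas_correlation`), the learning rate, the exponential neighborhood function, and its annealing schedule are all illustrative assumptions and are not taken from the paper. The gradient itself follows from writing r = B / sqrt(C D), with B the centered cross product of x and w, and C and D the centered sums of squares of x and w.

```python
import numpy as np


def pearson(x, w):
    """Pearson correlation between data vector x and prototype w."""
    xc = x - x.mean()
    wc = w - w.mean()
    return float(xc @ wc / np.sqrt((xc @ xc) * (wc @ wc)))


def pearson_grad_w(x, w):
    """Gradient of r(x, w) with respect to w, from r = B / sqrt(C * D)."""
    xc = x - x.mean()
    wc = w - w.mean()
    b = xc @ wc  # centered cross product B
    c = xc @ xc  # centered sum of squares of x, C
    d = wc @ wc  # centered sum of squares of w, D
    return xc / np.sqrt(c * d) - b * wc / (np.sqrt(c) * d ** 1.5)


def neural_gas_correlation(data, n_prototypes=2, epochs=50, lr=0.05, seed=0):
    """Illustrative neural gas loop that ranks prototypes by correlation.

    All hyperparameters are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    # Assumption: prototypes start at randomly chosen data vectors.
    start = rng.choice(len(data), size=n_prototypes, replace=False)
    protos = data[start].astype(float)
    lam0 = n_prototypes / 2.0
    for epoch in range(epochs):
        # Assumption: neighborhood range shrinks exponentially towards 0.01.
        lam = lam0 * (0.01 / lam0) ** (epoch / max(epochs - 1, 1))
        for x in data[rng.permutation(len(data))]:
            r = np.array([pearson(x, w) for w in protos])
            ranks = np.argsort(np.argsort(-r))  # rank 0 = most correlated
            h = np.exp(-ranks / lam)
            for i in range(n_prototypes):
                # Gradient ascent on the correlation, weighted by rank.
                protos[i] += lr * h[i] * pearson_grad_w(x, protos[i])
    return protos
```

Under these assumptions, a call such as `neural_gas_correlation(profiles, n_prototypes=5)` on a 2-D array of expression profiles (one row per profile, each with non-zero variance) would return one prototype per row. Because the Pearson correlation is invariant to shifting and scaling a vector, the prototypes are best compared to data by correlation rather than by Euclidean distance.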


Paper Citation


in Harvard Style

Strickert M., Sreenivasulu N., Villmann T. and Hammer B. (2008). ROBUST CENTROID-BASED CLUSTERING USING DERIVATIVES OF PEARSON CORRELATION. In Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing - Volume 2: BIOSIGNALS, (BIOSTEC 2008) ISBN 978-989-8111-18-0, pages 197-203. DOI: 10.5220/0001062601970203


in Bibtex Style

@conference{biosignals08,
author={Marc Strickert and Nese Sreenivasulu and Thomas Villmann and Barbara Hammer},
title={ROBUST CENTROID-BASED CLUSTERING USING DERIVATIVES OF PEARSON CORRELATION},
booktitle={Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing - Volume 2: BIOSIGNALS, (BIOSTEC 2008)},
year={2008},
pages={197-203},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001062601970203},
isbn={978-989-8111-18-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing - Volume 2: BIOSIGNALS, (BIOSTEC 2008)
TI - ROBUST CENTROID-BASED CLUSTERING USING DERIVATIVES OF PEARSON CORRELATION
SN - 978-989-8111-18-0
AU - Strickert M.
AU - Sreenivasulu N.
AU - Villmann T.
AU - Hammer B.
PY - 2008
SP - 197
EP - 203
DO - 10.5220/0001062601970203