ON AMBIGUITY DETECTION AND POSTPROCESSING SCHEMES USING CLUSTER ENSEMBLES

Amparo Albalate, Aparna Suchindranath, Mehmet Muti Soenmez, David Suendermann

Abstract

In this paper, we explore the cluster ensemble problem and propose a novel scheme to identify uncertain or ambiguous regions in the data based on the different clusterings in the ensemble. In addition, we analyse two approaches to dealing with the detected uncertainty. The first and simplest method is to ignore ambiguous patterns before applying the ensemble consensus function, thus preserving the non-ambiguous data as good "prototypes" for any further modelling. The second is to use the ensemble solution obtained by the first method to train a supervised model (a support vector machine), which is then applied to reallocate, or "recluster", the ambiguous patterns. A comparative analysis of the different ensemble solutions and the base weak clusterings has been conducted on five data sets: two artificial mixtures of five and seven Gaussians, and three real data sets from the UCI machine learning repository. In general, the experimental results show that the proposed schemes perform better than the standard ensembles.
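The two postprocessing schemes can be illustrated with a minimal sketch. The abstract does not state how ambiguity is measured, so everything below is an assumption made for illustration: clusterings are aligned greedily to the first one, a pattern is flagged as ambiguous when its majority label receives less than a hypothetical `min_agreement` share of the votes, and a nearest-centroid rule stands in for the support vector machine used in the paper. The function names and the toy data are likewise hypothetical.

```python
from collections import Counter

def align_labels(reference, labels):
    """Rename each cluster to the reference label it overlaps most.
    Greedy and not guaranteed one-to-one; enough for a sketch."""
    overlap = Counter(zip(labels, reference))
    mapping = {}
    for (lab, ref), _ in overlap.most_common():
        if lab not in mapping:
            mapping[lab] = ref
    return [mapping[lab] for lab in labels]

def split_by_agreement(ensemble, min_agreement=0.8):
    """Majority-vote consensus plus the indices of ambiguous patterns.
    ASSUMPTION: ambiguity = majority vote share below min_agreement."""
    reference = ensemble[0]
    aligned = [align_labels(reference, labels) for labels in ensemble]
    consensus, ambiguous = [], []
    for i in range(len(reference)):
        votes = Counter(labels[i] for labels in aligned)
        label, count = votes.most_common(1)[0]
        consensus.append(label)
        if count / len(aligned) < min_agreement:
            ambiguous.append(i)
    return consensus, ambiguous

def recluster(points, consensus, ambiguous):
    """Reassign ambiguous patterns using a model trained on the rest.
    ASSUMPTION: nearest centroid replaces the paper's SVM."""
    skip = set(ambiguous)
    sums, counts = {}, Counter()
    for i, (point, label) in enumerate(zip(points, consensus)):
        if i in skip:
            continue
        acc = sums.setdefault(label, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    result = list(consensus)
    for i in ambiguous:
        result[i] = min(centroids, key=lambda lab: sq_dist(points[i], centroids[lab]))
    return result

# Toy data: two tight groups and one pattern lying between them.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (2, 2)]
ensemble = [
    [0, 0, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0],  # same grouping with permuted labels, last pattern differs
    [0, 0, 0, 1, 1, 1, 1],
]
consensus, ambiguous = split_by_agreement(ensemble)
prototypes = [i for i in range(len(points)) if i not in ambiguous]  # first scheme
final = recluster(points, consensus, ambiguous)                     # second scheme
```

In this toy run only the last pattern is flagged as ambiguous. The first scheme simply keeps the remaining six patterns as prototypes, while the second reassigns the flagged pattern with the model fitted on those prototypes, which here overrides its majority-vote label.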


Paper Citation


in Harvard Style

Albalate A., Suchindranath A., Muti Soenmez M. and Suendermann D. (2010). ON AMBIGUITY DETECTION AND POSTPROCESSING SCHEMES USING CLUSTER ENSEMBLES. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-674-021-4, pages 623-630. DOI: 10.5220/0002734706230630


in Bibtex Style

@conference{icaart10,
author={Amparo Albalate and Aparna Suchindranath and Mehmet Muti Soenmez and David Suendermann},
title={ON AMBIGUITY DETECTION AND POSTPROCESSING SCHEMES USING CLUSTER ENSEMBLES},
booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2010},
pages={623-630},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002734706230630},
isbn={978-989-674-021-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - ON AMBIGUITY DETECTION AND POSTPROCESSING SCHEMES USING CLUSTER ENSEMBLES
SN - 978-989-674-021-4
AU - Albalate A.
AU - Suchindranath A.
AU - Muti Soenmez M.
AU - Suendermann D.
PY - 2010
SP - 623
EP - 630
DO - 10.5220/0002734706230630