# Assessing the Number of Clusters in a Mixture Model with Side-information

### Edith Grall-Maes, Duc Tung Dao

#### Abstract

This paper addresses the selection of the number of clusters in a clustering problem that takes into account the side-information that some points of a chunklet arise from the same cluster. An Expectation-Maximization algorithm is used to estimate the parameters of a mixture model and to determine the data partition. The usual criteria for selecting the number of clusters are not suitable because they do not consider the side-information in the data. We therefore propose criteria that are modified versions of three usual criteria: the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the normalized entropy criterion (NEC). The proposed criteria are used to select the number of clusters in two simulated problems and one real problem. Their performances are compared, and the influence of the chunklet size is discussed.
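For orientation, the three usual criteria named in the abstract can be sketched in their standard, unmodified forms. This is only a minimal illustration: the modified versions proposed in the paper are not given in the abstract and are not reproduced here, and the function names and argument layout (a maximized log-likelihood, a count of free parameters, and a list of per-observation posterior probabilities) are assumptions made for this example.

```python
import math


def aic(log_lik, n_params):
    """Standard AIC: -2 log L + 2 * (number of free parameters)."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Standard BIC: -2 log L + (number of free parameters) * log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def entropy(posteriors):
    """Entropy term E(K) = -sum_i sum_k t_ik * log(t_ik).

    `posteriors` is a list of rows, one per observation, each holding
    the posterior probabilities of the K clusters (hypothetical layout).
    Zero probabilities contribute nothing to the sum.
    """
    return -sum(t * math.log(t) for row in posteriors for t in row if t > 0.0)


def nec(log_lik_k, log_lik_1, posteriors):
    """Standard NEC(K) = E(K) / (L(K) - L(1)) for K > 1.

    NEC(1) is set to 1 by convention, following Biernacki et al. (1999).
    The caller must ensure L(K) != L(1) when K > 1.
    """
    if len(posteriors[0]) == 1:
        return 1.0
    return entropy(posteriors) / (log_lik_k - log_lik_1)
```

With these conventions, each criterion is evaluated for every candidate number of clusters after fitting the mixture, and the candidate with the smallest value is retained.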

#### Paper Citation

#### in Harvard Style

Grall-Maes E. and Dao D. (2016). **Assessing the Number of Clusters in a Mixture Model with Side-information**. In *Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM*, ISBN 978-989-758-173-1, pages 41-47. DOI: 10.5220/0005682000410047

#### in Bibtex Style

@conference{icpram16,
  author={Edith Grall-Maes and Duc Tung Dao},
  title={Assessing the Number of Clusters in a Mixture Model with Side-information},
  booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
  year={2016},
  pages={41-47},
  publisher={SciTePress},
  organization={INSTICC},
  doi={10.5220/0005682000410047},
  isbn={978-989-758-173-1},
}

#### in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Assessing the Number of Clusters in a Mixture Model with Side-information
SN - 978-989-758-173-1
AU - Grall-Maes E.
AU - Dao D.
PY - 2016
SP - 41
EP - 47
DO - 10.5220/0005682000410047