for a sufficiently exhaustive search, the complexity grows with the number of tested models and with the number of model parameters.
The second ones start with a fixed set of models
and sequentially adjust their configuration (including
the number of components) based on different evalu-
ation criteria. Figueiredo and Jain proposed a method
that starts with a high number of mixture parame-
ters, merging them step by step until convergence
(Figueiredo and Jain, 2002). This method can be ap-
plied to any parametric mixture where the EM algo-
rithm can be used. Pernkopf and Bouchaffra proposed
a Genetic-Based EM Algorithm capable of learning
Gaussian mixture models (Pernkopf and Bouchaffra,
2005). They first selected the number of components
by means of the minimum description length (MDL)
criterion; their approach thus explores the combination of genetic algorithms with the EM.
Ueda et al. proposed a split-and-merge EM al-
gorithm to alleviate the problem of local conver-
gence of the EM method (Ueda et al., 2000). Sub-
sequently, Zhang et al. introduced another split-and-
merge technique (Zhang et al., 2003). The split-and-
merge equations show that the merge operation is a
well-posed problem, whereas the split operation is
ill-posed. Two methods for solving this problem are
developed through singular value decomposition and
Cholesky decomposition and then a new modified EM
algorithm is constructed. They demonstrated the validity of the split-and-merge approach in model selection, in terms of its convergence properties. Moreover, the merge and split criterion is effective in reducing the number of model hypotheses, and it is often more efficient than exhaustive, random, or genetic-algorithm approaches.
1.2 Our Contribution
In this paper, we propose an algorithm for comput-
ing the number of components as well as the param-
eters of the mixture model. Similarly to other split-and-merge methods, our technique uses a local parameter search that reuses the information acquired in previous steps, making it suitable for problems with slowly changing distributions or for adapting the parameters when new samples are added or removed. The
algorithm starts with a fixed number of Gaussians,
and automatically decides whether to increase or reduce it. The key feature of our technique is the decision of when to add or merge a Gaussian. To make this decision we introduce a new concept, the dissimilarity index between two Gaussian distributions. Moreover, in order to escape locally optimal solutions we make use of self-adaptive thresholds for deciding when Gaussians are split or merged. Our algorithm starts with high threshold levels, preventing large changes in the number of Gaussian components at the beginning. It also starts with a low initial number of Gaussians, which can be increased during the computation if necessary. The time evolution of the threshold values allows periods of stability in the number of components, so that the components can freely adapt to the input data. After such a period of stability the thresholds become more sensitive, promoting the escape from locally optimal solutions by perturbing the system configuration when necessary, until a stopping criterion is reached. This makes the results of our algorithm less sensitive to initialization. The algorithm is presented for Gaussian mixture models.
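As a purely illustrative sketch (not the dissimilarity index defined in sec. 3), a symmetric Kullback-Leibler divergence between two Gaussian components can play the role of such a measure: a value close to zero flags the pair as a merge candidate, while a large value suggests the components should remain separate. In the following minimal Python/NumPy stand-in, the function name and the choice of measure are illustrative assumptions and do not coincide with the index introduced later.

import numpy as np

def symmetric_kl_gaussians(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence between two multivariate Gaussians.
    Illustrative stand-in for a dissimilarity index between mixture
    components; not the index defined in sec. 3."""
    d = mu1.shape[0]
    inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    diff = mu2 - mu1
    log_det_ratio = np.log(np.linalg.det(cov2) / np.linalg.det(cov1))
    kl_12 = 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d + log_det_ratio)
    kl_21 = 0.5 * (np.trace(inv1 @ cov2) + diff @ inv1 @ diff - d - log_det_ratio)
    return kl_12 + kl_21

# Nearly identical components give a value close to zero (merge candidate),
# well-separated components give a large value.
mu_a, cov_a = np.zeros(2), np.eye(2)
mu_b, cov_b = np.array([3.0, 0.0]), np.eye(2)
print(symmetric_kl_gaussians(mu_a, cov_a, mu_a, cov_a))   # 0.0
print(symmetric_kl_gaussians(mu_a, cov_a, mu_b, cov_b))   # 9.0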
1.3 Outline
The paper is organized as follows. In sec. 2 we de-
scribe the notation and formulate the classical Expec-
tation Maximization algorithm. In sec. 3 we intro-
duce the proposed algorithm. Specifically, we de-
scribe the insertion of a new Gaussian in sec. 3.3,
its merging in sec. 3.4, the initializations in sec. 3.2,
and the decision-threshold update rules in sec. 3.5. Furthermore, in sec. 4 we describe our experimental set-up for testing the validity of the new technique, together with the results. Finally, in sec. 5 we conclude and
propose directions for future work.
2 EXPECTATION MAXIMIZATION ALGORITHM

2.1 EM Algorithm: The Original Formulation
A common usage of the EM algorithm is to identify the "incomplete, or unobserved, data" $\bar{y} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_k)$ given the couple $(\bar{x}, \bar{y})$, also called the "complete data", where $\bar{x} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k\}$; this couple has a probability density (or joint distribution) $p(\bar{x}, \bar{y} \mid \bar{\vartheta}) = p_{\bar{\vartheta}}(\bar{x}, \bar{y})$ depending on the parameter $\bar{\vartheta}$. We define $E_0(\cdot)$ as the expected value of a random variable, computed with respect to the density $p_{\bar{\vartheta}}(\bar{x}, \bar{y})$. We define $Q(\bar{\vartheta}^{(n)}, \bar{\vartheta}^{(n-1)}) = E_0 L(\bar{\vartheta})$, with $L(\bar{\vartheta})$ being the log-likelihood of the complete data:
$$L(\bar{\vartheta}) = \log p_{\bar{\vartheta}}(\bar{x}, \bar{y}) \qquad (1)$$
The EM procedure iteratively repeats the following two steps until convergence:
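In the standard formulation, and reading $E_0$ as the expectation over the unobserved data $\bar{y}$ given $\bar{x}$ and the current estimate $\bar{\vartheta}^{(n-1)}$ (an interpretation adopted here purely for illustration), these two steps read:
$$\text{E-step:} \quad Q(\bar{\vartheta}, \bar{\vartheta}^{(n-1)}) = E\left[ \log p_{\bar{\vartheta}}(\bar{x}, \bar{y}) \mid \bar{x}, \bar{\vartheta}^{(n-1)} \right]$$
$$\text{M-step:} \quad \bar{\vartheta}^{(n)} = \arg\max_{\bar{\vartheta}} \, Q(\bar{\vartheta}, \bar{\vartheta}^{(n-1)})$$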