
The process stops when an appropriate model selection criterion is optimized.
1.1 Related Work
The selection of the mixture complexity is essential to prevent overfitting, finding the best compromise between the accuracy of the data description and the computational burden. There are different strategies for determining the number of components in a mixture. Split-based algorithms usually start with a single component and then increase their number during the computation, by splitting existing components into two new ones at each stage. Since splitting a component is an ill-posed problem, several methods have been proposed in the literature. No theoretical way to assess the quality of a particular algorithm has yet been found, so most works assess it empirically, using numerical simulations to measure precision and computational efficiency. Greedy strategies, which incrementally add components to the mixture up to a certain number of components k, have been proven mathematically to be effective for learning a mixture density by maximum likelihood.
In 2000, Li and Barron demonstrated that, in the case of mixture density estimation, a k-component mixture learnt via maximum likelihood estimation, or by an iterative likelihood algorithm, achieves a log-likelihood within order 1/k of the log-likelihood achievable by any convex combination Li and Barron (2000). However, the main drawback of this kind of algorithm is the imprecision of the split criterion.
In 1999, Vlassis and Likas proposed an algorithm that employs splitting operations for one-dimensional Gaussian mixtures, based on the evaluation of the fourth-order moment (kurtosis) Vlassis and Likas (1999). They assumed that if a component has a kurtosis different from that of a regular Gaussian, then the corresponding subset of points may be better described by more than a single component. Their splitting rule assigns half of the old component's prior to each of the two new ones, gives them the same variance as the old component, and places the two new means one standard deviation away from the old mean, one on each side.
In 2002, Vlassis and Likas introduced a greedy algorithm for learning Gaussian mixtures Vlassis and Likas (2002). It starts with a single component covering all the data. However, their approach suffers from being sensitive to a few parameters that have to be fine-tuned. The authors propose a technique for optimizing them; nevertheless, it brings the total complexity of the global search for the component to be split to O(n²), where n is the number of input data points.
Subsequently, following this approach, Verbeek et al. developed a greedy method to learn the mixture model Verbeek et al. (2003), where new components are added iteratively and EM is applied until convergence. The global search for the optimal new component is performed by starting 'partial' EM searches, each with a different initialization. Their approach describes the mixture by means of a parametrized equation whose parameters are optimized locally (rather than globally) to save computational resources. The real advantage with respect to the work in Vlassis and Likas (2002) is that the computational burden is reduced.
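A simplified sketch of such a greedy insertion loop is shown below, assuming scikit-learn's GaussianMixture as the EM implementation; full EM runs stand in for the partial EM searches of Verbeek et al. (2003), and the candidate-initialization strategy is an illustrative assumption.

import numpy as np
from sklearn.mixture import GaussianMixture

def greedy_gmm(X, k_max=10, n_candidates=10, seed=0):
    # Start with a single component covering all the data, then repeatedly
    # insert the best of several candidate components and rerun EM.
    rng = np.random.default_rng(seed)
    best = GaussianMixture(n_components=1).fit(X)
    for k in range(2, k_max + 1):
        candidates = []
        for _ in range(n_candidates):
            # Candidate initialization: keep the previous means and add a
            # randomly drawn data point as the mean of the new component.
            means = np.vstack([best.means_, X[rng.integers(len(X))]])
            candidates.append(
                GaussianMixture(n_components=k, means_init=means).fit(X))
        best = max(candidates, key=lambda m: m.score(X))
        # A model selection criterion (e.g. BIC) would decide when to stop.
    return best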
Considering techniques that both increase and reduce the mixture complexity, there are different approaches in the literature. In particular, Richardson and Green used split-and-merge operations together with birth and death operations to develop a reversible jump method, and constructed a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm for fully Bayesian analysis of univariate Gaussian mixtures Richardson and Green (1997). The RJMCMC methodology elaborated by Green is attractive because it deals with parameter estimation and model selection jointly in a single paradigm. However, the experimental results reported in Richardson and Green (1997) indicate that such sampling methods are rather slow compared to maximum likelihood algorithms.
Ueda et al. proposed a split-and-merge EM algorithm (SMEM) to alleviate the fact that EM convergence is local rather than global Ueda et al. (2000). They defined the merge of two components as a linear combination of them, in terms of their parameters (priors, means and covariance matrices), with the priors as weights. The splitting operation is the inverse, which requires finding the optimal weights. Their splitting operations are based on a decomposition of the covariance matrix of the component to be split (they proposed one method based on the SVD and another based on the Cholesky decomposition).
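For concreteness, a minimal sketch of the two operations follows: the merge is the prior-weighted combination described above, while the split displaces the new means along the dominant singular vector of the covariance matrix; this split variant and its displacement factor are illustrative assumptions rather than the exact rule of Ueda et al. (2000).

import numpy as np

def merge(w1, mu1, S1, w2, mu2, S2):
    # Prior-weighted linear combination of the two components' parameters.
    w = w1 + w2
    return w, (w1 * mu1 + w2 * mu2) / w, (w1 * S1 + w2 * S2) / w

def split(w, mu, S, scale=0.5):
    # SVD-based split: shift the two new means along the principal axis
    # of the covariance matrix of the component to be split.
    U, s, _ = np.linalg.svd(S)
    d = scale * np.sqrt(s[0]) * U[:, 0]
    return (w / 2.0, mu + d, S.copy()), (w / 2.0, mu - d, S.copy())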
Zhang et al. introduced another split-and-merge technique, based on that of Ueda et al. Zhang et al. (2003). As a split criterion they define a local Kullback divergence as the distance between two distributions: the local data density around the model with k components (the k-th model) and the density of the k-th model specified by the current parameter estimate. The local data density is defined as a modified empirical distribution weighted by the posterior probability, so that the data around the k-th model is focused on. They employ the technique of Ueda's SMEM algorithm Ueda et al. (2000), modifying the part that performs the partial EM step to re-estimate the parameters of the components after the split and merge operations.