Bayesian Mixture Estimation without Tears
Šárka Jozová (1,2,a), Evženie Uglickich (2,b) and Ivan Nagy (2,c)
1 Faculty of Transportation Sciences, Czech Technical University, Na Florenci 25, 11000 Prague, Czech Republic
2 Department of Signal Processing, The Czech Academy of Sciences, Institute of Information Theory and Automation,
Pod vodárenskou věží 4, 18208 Prague, Czech Republic
Keywords:
Data Analysis, Clustering, Classification, Mixture Model, Estimation, Prior Knowledge.
Abstract:
This paper presents an on-line, non-iterative form of Bayesian mixture estimation. The model consists
of a set of sub-models (components) and an estimated pointer variable that indicates the currently
active component. The estimation is built on an approximated Bayes rule that uses weighted measured data.
The weights are derived from the so-called proximity of each measured data record to the individual components.
They are generated from integrated likelihood functions into which the point estimates of
the component parameters are inserted. One of the main advantages of the presented data analysis method is that
available prior knowledge can be incorporated easily. Simple examples with program code as
well as results of experiments with real data are presented. The main goal of this paper is to provide
a clear description of the Bayesian estimation method based on the approximated likelihood functions, called
proximities.
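The weighting scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the one-dimensional Gaussian components with known variance, the names `Component`, `update` and `pointer_counts`, and the use of simple count statistics for the pointer are all assumptions made for this illustration. Each proximity is taken to be the component density evaluated at the new data record with the current point estimate of the mean inserted, and the normalized product of proximities and pointer probabilities gives the weights used in the statistics update.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; used here as the proximity (assumption)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class Component:
    """Hypothetical Gaussian component with known variance.

    The statistics are a weighted sum of data and a weighted data count;
    the prior knowledge enters through their initial values.
    """

    def __init__(self, prior_mean, var=1.0, prior_count=1.0):
        self.sum_x = prior_mean * prior_count
        self.count = prior_count
        self.var = var

    @property
    def mean(self):
        # Point estimate of the component mean.
        return self.sum_x / self.count


def update(components, pointer_counts, x):
    """One on-line, non-iterative step with a new data record x."""
    # Proximities: densities with the current point estimates inserted.
    prox = [gaussian_pdf(x, c.mean, c.var) for c in components]
    # Point estimates of the pointer probabilities.
    total = sum(pointer_counts)
    alpha = [n / total for n in pointer_counts]
    # Weights: normalized product of pointer probabilities and proximities.
    unnorm = [a * p for a, p in zip(alpha, prox)]
    s = sum(unnorm)
    # Guard against numerical underflow when x is far from every component.
    weights = [u / s for u in unnorm] if s > 0.0 else alpha
    # Update of the statistics with weighted data.
    for i, (c, w) in enumerate(zip(components, weights)):
        c.sum_x += w * x
        c.count += w
        pointer_counts[i] += w
    return weights


# Usage: two components with prior means 0 and 5 (prior knowledge).
components = [Component(0.0), Component(5.0)]
pointer_counts = [1.0, 1.0]
for x in [0.2, 4.9, -0.1, 5.3, 0.4, 5.1]:
    weights = update(components, pointer_counts, x)
```

Each data record is processed exactly once, so the sketch needs no iteration over the stored data, which is the property that distinguishes this style of estimation from iterative schemes such as EM.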
1 INTRODUCTION
Modeling is an important part of data analysis. Data
analysis can be said to pursue two main directions.
The first one seeks on-line prediction of data based
on previously measured historical data. Usually, the
output variable is to be predicted from its older values
and from other explanatory variables that can currently
be measured. A dynamic model, e.g., of a regression type,
must be constructed and, in most cases, also estimated
on-line. Here, the task is to determine the value of the
output at a future time instant.
The second data analysis direction is interested in
working modes of a system rather than in the values of
the data themselves. In this direction, classes of similar
data are constructed and newly arriving data records
are classified into them, i.e., the class to which
a data record belongs is estimated. The question
here can be, for example, what severity of a traffic
accident can be expected if the surrounding circumstances
are like those just measured.
There are well-known methods for these tasks.
The best-known clustering methods include
k-means and its variants (Jin and Han, 2011;
Kanungo et al., 2002; Likas et al., 2003), fuzzy
clustering (De Oliveira and Pedrycz, 2007; Panda
et al., 2012), DBSCAN (Kumar and Reddy, 2016)
and hierarchical clustering (Nielsen, 2016; Ward Jr,
1963). For classification, one can use, e.g., neural
networks, decision trees, logistic regression (Maimon
and Rokach, 2005; Kaufman and Rousseeuw,
1990) or genetic algorithms (Pernkopf and Bouchaffra,
2005). However, all the mentioned tasks can
also be solved by estimating a mixture model.
The iterative form of this estimation, the EM algorithm
(Bilmes, 1998), is also well known.
a https://orcid.org/0000-0001-5065-633X
b https://orcid.org/0000-0003-1764-5924
c https://orcid.org/0000-0002-7847-1932
In this area, mixture estimation methods based
on Bayesian principles play an important role.
One of them, called Quasi-Bayes, was developed
in (Kárný et al., 1998) and followed by (Kárný et al.,
2006). Following this research, several other methods
have been suggested, mostly for different models of
the components (Nagy et al., 2011; Nagy
and Suzdaleva, 2013; Nagy et al., 2017; Suzdaleva
et al., 2017; Suzdaleva and Nagy, 2019, among others). They
bring a considerable simplification of the estimation
algorithm. The core of the most recent of them is the use of
the proximity of the measured data record to a distribution
(the model of a specific working point of the
analyzed system).
Jozová, Š., Uglickich, E. and Nagy, I.
Bayesian Mixture Estimation without Tears.
DOI: 10.5220/0010508706410648
In Proceedings of the 18th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2021), pages 641-648
ISBN: 978-989-758-522-7
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved