Recent Advances in Statistical Mixture Models: Challenges and
Applications
Sami Bourouis (https://orcid.org/0000-0002-6638-7039)
Department of Information Technology, College of Computers and Information Technology,
Keywords:
Mixture Model, Finite and Infinite Models, Learning Techniques, Applications.
Abstract:
This paper discusses current advances in mixture models, as well as modern approaches and tools that make
use of them. In particular, the contribution of mixture-based modeling to various areas of research is
discussed. The paper exposes several challenging issues, especially selecting the optimal model and estimating
the parameters of each component. Some newly emerging mixture model-based methods that can
be applied successfully are also cited. Moreover, an overview of the latest developments, open problems,
and potential research directions is provided. This study aims to demonstrate that mixture models can
consistently serve as a powerful tool for carrying out a variety of difficult real-life tasks. This survey can
be a starting point for beginners, as it allows them to better understand the current state of knowledge and
helps them develop and evaluate their own frameworks.
1 INTRODUCTION
Statistical machine learning (SML) has made great
progress in recent years on supervised and unsuper-
vised learning tasks including clustering, classifica-
tion, and pattern identification of large multidimen-
sional data. Statistical models, which are mathematical representations of observed data, allow data scientists to examine the connections and patterns across datasets and give them a strong foundation on which to predict future data. Additionally, by applying statistical modeling to raw data, analysts can approach data analysis systematically, producing logical representations that make it easier to find links between variables and to make predictions. By estimating the attributes of large populations from available data, statistical models help in understanding the characteristics of the data at hand (Bouveyron and Girard, 2009; Fu et al., 2021); this is the main principle behind machine learning. To be effective, statistical models must be able to handle the underlying data distribution appropriately. Pattern
recognition, computer vision, and knowledge discov-
ery are just a few of the research fields where SML
has been used. Over the past few decades, there has
been a lot of cutting-edge research on SML.
Mixture models (MM), one of the various unsu-
pervised learning techniques already in use, are re-
ceiving more and more attention because they are ef-
fective at modeling heterogeneous data (Alroobaea
et al., 2020). MM are widely used for modeling un-
known distributions and also for unsupervised clus-
tering tasks. Well-principled mixture models can be effectively deployed to partition multimodal data and to determine the membership of observations with ambiguous cluster labels (Lai et al., 2018). Many distributions have been
studied in the past to model multimodal data like
Gaussian, Gamma, inverted Beta, Dirichlet, Liou-
ville, von Mises, and many others. Nevertheless, some distributions, such as the Laplace or the Gaussian, entail a strict hypothesis about the shape of the components, which might result in poor performance. On the other hand, finding the exact number of components can be difficult, and solving this problem when describing complex real-world data helps prevent issues with over- and under-fitting. As a consequence, more flex-
ible mixtures have been developed to get around these restrictions and to offer an accurate approximation to
data that contains outliers. For example, some stud-
ies tried to develop infinite mixture models in order
to tackle the limitations of finite instances. In reality,
incorporating an infinite number of components may
enhance the statistical model’s performance. More-
over, various learning techniques (non-deterministic
and deterministic inference methods) were imple-
mented and used to infer mixture models' parameters and thus to make accurate predictions. The cur-
rent article’s objective is to give a short overview of
the methodological advancements that support mix-
ture model implementations. Many academics are
interested in MM-based frameworks, and they have
discovered a variety of fascinating applications for
them. As a result, the literature on mixture models
has greatly increased, and the bibliography included
here can only give a limited amount of coverage. The
structure of this article is designed to give a quick un-
derstanding of mixture models. In the next section, a
taxonomy related to this area of research is provided.
Then, Section 3 provides a summary of the principal
methods now in use that employ mixture models. A
concise and comprehensive explanation of main chal-
lenges is also provided, along with some suggestions
for the future. Finally, this article is concluded.
2 TAXONOMY
The next paragraphs present a basic and understand-
able taxonomy related to mixture models.
2.1 Statistical Modelling
Statistical modeling now makes it possible to make accurate decisions probabilistically. The goal is to build a model that can logically explain the data.
Mixture models (MM) are well-founded probabilis-
tic models with the benefit of using several distribu-
tions to characterize their component elements. Mix-
ture models offer an easy-to-use yet formal statistical framework, especially for grouping and classification, by using well-known probability distribu-
tions (such as Gaussian, Poisson, Gamma, and bino-
mial). Therefore, we can evaluate the likelihood of
belonging to a cluster and draw conclusions about the
sub-populations, unlike conventional clustering algo-
rithms. MM can successfully express multidimen-
sional distributions and heterogeneous data in a finite
(or infinite) number of classes, which makes them
useful for modeling visual features (McLachlan and
Peel, 2004). The core of mixture modeling is se-
lecting the appropriate probability density functions
(PDFs) of each component in the mixture distribution. An example of a clustering task that divides the
input data set into three groups is shown in Fig.1. It
is noted that a mixture model learns from the input
data during the learning phase and then this model
is evaluated using a different set of data (testing phase).
The obtained model will be applicable for further tasks including classification, grouping, and predic-
tion especially if it can produce results with high pre-
cision. Several scientific researchers are interested in
developing efficient unsupervised statistical learning
approaches to address various data mining and ML
problems.
Figure 1: Example of density modelled by three Gaussian
probability functions for data clustering (or classification).
2.2 Finite Mixture Models
Mixture models are intended to combine two or more
distributions to produce a distribution with a more
flexible shape than a single distribution. In a mixture
model, observations are assumed to be produced by a combination of several distinct component models whose memberships are hidden (latent). A convex combination of two or more
finite density functions is known as a finite mixture
model (FMM). FMMs are used to estimate the likelihood of membership in each cluster, to calculate the parameters of each component, to group data into different classes, and to make inferences. They offer a powerful framework for identifying latent patterns and studying ambiguous data. A general formulation of an FMM is given as follows. Let y denote the observed data and assume that each instance is drawn from one of K components. In a statistical setting, an FMM can be expressed as:
$$p(y|\Theta) = \sum_{k=1}^{K} \pi_k\, p(y|\theta_k) \qquad (1)$$
where $\Theta$ denotes the full set of parameters of the mixture model, $p(y|\theta_k)$ is the probability density function (pdf) of cluster $k$, and $\pi_k$ are the relative mixture weights (mixing proportions).
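To make Eq. (1) concrete, the following minimal sketch evaluates the density of a two-component univariate Gaussian mixture. Python with NumPy and SciPy is assumed here purely for illustration (the paper does not prescribe any library), and all parameter values are invented for the example.

```python
import numpy as np
from scipy.stats import norm

# Illustrative two-component Gaussian mixture: p(y) = sum_k pi_k N(y | mu_k, sigma_k)
weights = np.array([0.4, 0.6])    # pi_k, must sum to 1
means = np.array([-2.0, 3.0])     # component means (theta_k)
stds = np.array([1.0, 1.5])       # component standard deviations (theta_k)

def mixture_pdf(y):
    """Evaluate the finite mixture density of Eq. (1) at the points y."""
    y = np.atleast_1d(y)[:, None]                    # shape (n, 1) broadcasts over components
    return (weights * norm.pdf(y, loc=means, scale=stds)).sum(axis=1)

print(mixture_pdf([0.0, 2.5]))                       # mixture density at two query points
```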
2.3 Infinite Mixture Models
Although finite MM are useful, determining the ap-
propriate number of components for a given dataset
and then estimating the parameters of that mixture are two of the most difficult issues that must be solved (Brooks, 2001). In many circumstances, we
claim that we have an infinite mixture if the number
of components is equal to (or more than) the num-
ber of observations. In such cases, it is better to let
the model adjust its complexity to the volume of data
to prevent both underfitting and overfitting. Infinite
models are a kind of non-parametric models. One of
the most attractive methods for converting the finite
mixture model into its infinite counterpart is to use
the Dirichlet process mixture model (Fan and Bouguila,
2020; Bourouis and Bouguila, 2021). When new data
is received, infinite models may either add new groups
or eliminate some of the current ones. By letting K → ∞, we may overcome the problem of calculating K in advance. As a result, the infinite mixture can be
expressed as
$$p(y|\Theta) = \sum_{k=1}^{\infty} \pi_k\, p(y|\theta_k) \qquad (2)$$
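As an illustration only, the sketch below fits a truncated Dirichlet process Gaussian mixture with scikit-learn's BayesianGaussianMixture; the truncation level, synthetic data, and library choice are assumptions made for this example rather than details of any cited approach.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data drawn from three well-separated Gaussian clusters
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (-4, 0, 4)])

# Truncated Dirichlet-process mixture: n_components is only an upper bound, and
# superfluous components are driven toward negligible weight during inference.
dpmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

print(np.round(dpmm.weights_, 3))   # most of the 10 weights collapse toward zero
```

In practice only a few components retain non-negligible weight, which mimics the behaviour of letting K → ∞ while adapting the effective complexity to the data.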
2.4 Model Learning
The development of parameter learning techniques
has historically attracted a lot of attention. Estimating
parameters of the mixture from data may be done in a
number of ways, and this issue is still under investiga-
tion. There are several approaches, known as estimators, that are used to estimate a model's parameters, including the least squares method (LSM), maximum likelihood estimation (MLE), and so on. Mainly, there exist two kinds of approaches: frequentist (also known as deterministic) and Bayesian, from which many other variants have been derived. For example, the maximum
likelihood (ML) estimator is the foundation of the fre-
quentist approach. One of the key purposes of ma-
chine learning is to make predictions using the pa-
rameters learnt from the training dataset. Depending
on the kind of predictions and/or our past experience
(prior knowledge), either a frequentist or a Bayesian
technique could be adopted to accomplish the goal.
2.4.1 Frequentist Learning
Deterministic approaches assume that observed data
is drawn from a given distribution (Tissera et al.,
2022). This distribution is referred to as the likelihood, or P(Data | θ), where the objective is to estimate the value of θ (i.e., the model's parameters, assumed to be fixed constants) that maximizes the likelihood (MLE). In statistical modelling, when the model depends on unobserved latent variables, fundamental algorithms (such as the well-known iterative EM algorithm) can be applied to find the maximum likelihood or maximum a posteriori estimates. In this case,
the procedure of maximizing the likelihood is formal-
ized as the following optimization problem:
$$\hat{\Theta}_{ML} = \arg\max_{\Theta} \{\log p(Y|\Theta)\} \qquad (3)$$
This equation cannot be solved analytically. The expectation-maximization (EM) algorithm or other related methods may be used to generate the MLE estimates of the mixture parameters. Indeed, the EM method generates a sequence of estimates $\{\Theta^{(t)},\ t = 0, 1, 2, \ldots\}$ by alternating two steps until convergence: the Expectation step and the Maximization step.
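The following toy implementation (a sketch in Python/NumPy under illustrative assumptions, not the exact procedure of any cited paper) shows the alternating E- and M-steps for a univariate Gaussian mixture.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(y, K=2, n_iter=100, seed=0):
    """Toy EM for a univariate K-component Gaussian mixture (maximizing Eq. 3)."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                       # initial mixing weights
    mu = rng.choice(y, K, replace=False)           # initial means drawn from the data
    sigma = np.full(K, y.std())                    # initial standard deviations
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] proportional to pi_k N(y_n | mu_k, sigma_k)
        dens = pi * norm.pdf(y[:, None], mu, sigma)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from the responsibilities
        Nk = r.sum(axis=0)
        pi = Nk / len(y)
        mu = (r * y[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return pi, mu, sigma

y = np.concatenate([np.random.normal(-3, 1, 300), np.random.normal(2, 0.7, 200)])
print(em_gmm_1d(y))
```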
2.4.2 Bayesian Learning
Under the Bayesian method (Bourouis et al., 2021a),
the parameter of the mixture denoted by Θ is viewed
as a random variable with a specific probability dis-
tribution (the prior). The latter represents our belief before seeing the data. In this case, Bayes' theorem is used to update the prior distribution based on the likelihood function. The information in the prior distribution as well as in the data is summarized in a distribution known as the posterior distribution, which is expressed as:
$$p(\Theta|Y) \propto p(Y|\Theta)\, p(\Theta) \qquad (4)$$
The Markov chain Monte Carlo simulation tech-
nique (MCMC), the Gibbs sampler, and Laplace’s
method are some effective Bayesian approximation
approaches that have been applied to various machine learning applications (Husmeier, 2000).
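A minimal sketch of this idea is given below, assuming Python/SciPy and a deliberately simplified setting (two fixed Gaussian components with only the mixing weight unknown and a uniform Beta(1,1) prior); real MCMC schemes for full mixtures, such as Gibbs sampling over component parameters and allocations, are considerably more involved.

```python
import numpy as np
from scipy.stats import norm, beta

# Toy Bayesian inference (Eq. 4) for the mixing weight pi of a two-component
# mixture with known components N(-3, 1) and N(2, 1); prior: pi ~ Beta(1, 1).
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-3, 1, 150), rng.normal(2, 1, 50)])

def log_post(p):
    """Unnormalized log-posterior: log-likelihood plus log-prior."""
    if not 0 < p < 1:
        return -np.inf
    lik = p * norm.pdf(y, -3, 1) + (1 - p) * norm.pdf(y, 2, 1)
    return np.log(lik).sum() + beta.logpdf(p, 1, 1)

# Random-walk Metropolis-Hastings sampler over the single parameter pi
samples, p = [], 0.5
for _ in range(5000):
    prop = p + rng.normal(scale=0.05)
    if np.log(rng.uniform()) < log_post(prop) - log_post(p):
        p = prop
    samples.append(p)

print(np.mean(samples[1000:]))   # posterior mean of pi (true value is 0.75)
```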
2.4.3 Variational Learning
It should be emphasized that Bayesian techniques
need a significant computing investment, particularly
when working with massive data sets (Tan and Nott,
2014). For example, MCMC methods are widely used to sample from distributions; however, they are often computationally expensive. As a result, variational Bayes inference has been explored to address these problems, and it has been used as a more efficient alternative to MCMC. The fundamental
idea is to estimate the model posterior distribution by
minimizing the Kullback-Leibler (KL) divergence be-
tween the true posterior and an approximation distri-
bution.
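The standard identity behind this idea relates the model evidence, the evidence lower bound (ELBO), and the KL divergence to the true posterior:

$$\log p(Y) \;=\; \underbrace{\mathbb{E}_{q(\Theta)}\!\left[\log \frac{p(Y,\Theta)}{q(\Theta)}\right]}_{\text{ELBO}(q)} \;+\; \mathrm{KL}\!\left(q(\Theta)\,\|\,p(\Theta\mid Y)\right)$$

Since the left-hand side does not depend on the approximating distribution $q$, minimizing the KL divergence is equivalent to maximizing the ELBO, which is the quantity actually optimized in variational Bayes.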
2.4.4 Expectation Propagation
Expectation propagation (EP) learning (Minka, 2001)
may be viewed as a recursive approximation method
that minimizes a Kullback-Leibler (KL) divergence
between an approximation and the exact posterior
model. EP is a deterministic approach to Bayesian inference that refines an approximate posterior distribution through an iterative process. It is based
on the so-called Assumed Density Filtering (ADF)
(Minka, 2001). The EP inference differs from the
ADF in that it does not rely on the order of the input data and can be improved by processing each data point more than once. Furthermore, the higher computing perfor-
mance of EP over Gibbs sampling and Markov Chain
Monte Carlo (MCMC) is one of its key advantages.
2.4.5 Batch/Online Learning Algorithms
Online algorithms enable the sequential processing of
data instances, which is critical for real-time appli-
cations (Fujimaki et al., 2011). For many applications, especially when dealing with huge datasets, online learning is more attractive than batch learning. With mixture models, it is possible to save time while preserving performance, since the model's parameters only need to be updated progressively. These updates must be designed appropriately without compromising flexibility and efficiency, as sketched below.
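As a sketch only (Python/NumPy is assumed; the step-size schedule, the univariate two-component setting, and the sufficient-statistic recursion follow a generic stochastic-approximation style of online EM rather than any specific cited algorithm), the parameters can be refreshed one observation at a time:

```python
import numpy as np
from scipy.stats import norm

# Sketch of online (stochastic) EM for a univariate two-component Gaussian mixture:
# running sufficient statistics are updated one observation at a time with a
# decaying step size, so past data never needs to be revisited.
rng = np.random.default_rng(2)
stream = rng.permutation(np.concatenate([rng.normal(-2, 1, 2000),
                                         rng.normal(3, 1, 2000)]))

pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
s0, s1, s2 = pi.copy(), pi * mu, pi * (sigma**2 + mu**2)     # initial statistics

for t, y in enumerate(stream, start=1):
    rho = (t + 10) ** -0.7                      # decaying learning rate
    dens = pi * norm.pdf(y, mu, sigma)
    r = dens / dens.sum()                       # responsibilities for this observation
    s0 = (1 - rho) * s0 + rho * r               # running sufficient statistics
    s1 = (1 - rho) * s1 + rho * r * y
    s2 = (1 - rho) * s2 + rho * r * y ** 2
    pi, mu = s0, s1 / s0
    sigma = np.sqrt(np.maximum(s2 / s0 - mu ** 2, 1e-6))

print(np.round(pi, 2), np.round(mu, 2), np.round(sigma, 2))
```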
2.5 Mixture Model Selection
In order to find the best fit for modeling data, sev-
eral model selection-based techniques have been pro-
posed. Automated selection of the components num-
ber that best describes the observations has been the
subject of several studies and investigations. Finding
the optimal number of components to explain a given
set of data is one of mixture models’ most difficult
tasks. This procedure is called model selection. Many
successful information criteria have been considered
in order to address this challenging issue. Among the
well-known criteria are the Akaike information criterion (AIC) (Akaike, 1974) and the Bayesian information criterion (BIC) (Schwarz, 1978). It should be noted that these criteria are based on penalizing a mixture's log-likelihood function. The AIC seeks to minimize the Kullback-Leibler divergence between the true probability density function and the mixture model. The BIC, on the other hand, roughly approximates the marginal likelihood of the mixture by minimizing the influence of the prior. We refer to the literature for additional information criteria such as the minimum description length (MDL) and the minimum message length (MML) (Azam and Bouguila, 2022). Indeed, several publications have suggested simultaneously estimating the parameters of the mixture and selecting the optimal model using, for example, the MML or MDL criteria.
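For illustration, the sketch below (assuming Python with scikit-learn; the candidate range and synthetic data are arbitrary choices for the example) selects the number of Gaussian components by minimizing the BIC.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic data generated from three clusters
X = np.vstack([rng.normal(c, 0.6, size=(150, 2)) for c in (-3, 0, 3)])

# Fit candidate mixtures with 1..8 components and keep the one with the lowest BIC.
candidates = [GaussianMixture(n_components=k, random_state=0).fit(X)
              for k in range(1, 9)]
bics = [model.bic(X) for model in candidates]
best = candidates[int(np.argmin(bics))]
print("selected number of components:", best.n_components)
```

The AIC can be used analogously through the model's aic method; the two criteria may select different complexities on small samples.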
3 MIXTURE MODELS AND
APPLICATIONS: OVERVIEW
AND DISCUSSION
Due to the widespread adoption of new technologies,
which has led to millions of people producing enor-
mous volumes of heterogeneous data through smart
devices, there seems to be significant opportunity for
expanding knowledge across a variety of scientific
disciplines. The technological revolution has made
it possible to quickly analyze and extract knowledge
from massive datasets, which is particularly benefi-
cial for a wide range of sectors. Nevertheless, exploring the content of these sizeable multimedia databases is a crucial challenge that may be addressed with statistical tools. Using statistical methods like mixture
models (MM), problems like data clustering, object
segmentation, image denoising, pattern recognition,
and many more might be successfully handled. For
various applications of data analysis, a fundamental
mixture model-based architecture requires a number
of processing stages, as shown in Fig.2. It has recently been shown that MM can provide promising capabilities for addressing difficult machine learning
problems. In particular, MM can effectively address
the issue of imbalanced samples and insufficient train-
ing data (McLachlan and Peel, 2004). Typically, the
first step in employing mixture models is to extract
effective visual features (descriptors) from the input
dataset. Different techniques, such as SIFT (Scale-Invariant Feature Transform) (Lowe, 2004), could be
used to extract such characteristics. An additional step is to build a visual vocabulary by quantizing the attributes into visual words using, for example, a bag-of-words (BOW) model and a simple clustering algorithm like K-means (Csurka et al., 2004). Furthermore, probabilistic Latent Semantic Analysis (pLSA) may be utilized to reduce the dimensionality and to produce a d-dimensional proportional vector (Hofmann, 2001) (whose entries take values between 0 and 1). The developed statistical model is then applied as
a classifier to assign each input to the category with
the highest posterior probability in accordance with
Bayes’ decision rule. Typically, datasets are randomly
divided into two parts; the first is utilized for training
and for creating the visual vocabulary, while the sec-
ond is used for testing and assessment.
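A condensed sketch of such a pipeline is given below. It assumes Python with OpenCV (version 4.4 or later, where SIFT is exposed as cv2.SIFT_create) and scikit-learn; the file lists, vocabulary size, and number of mixture components are placeholders, and the pLSA step of Fig. 2 is omitted for brevity.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def bow_histograms(image_paths, vocab):
    """Quantize each image's SIFT descriptors into a normalized word histogram."""
    sift = cv2.SIFT_create()
    hists = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        words = vocab.predict(desc.astype(np.float32))
        hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
        hists.append(hist / hist.sum())          # proportional vector in [0, 1]
    return np.array(hists)

# Placeholder file lists; replace with a real training/testing split.
train_paths, test_paths = ["train_001.jpg"], ["test_001.jpg"]

# 1) Build the visual vocabulary from all training descriptors with K-means.
sift = cv2.SIFT_create()
all_desc = np.vstack([
    sift.detectAndCompute(cv2.imread(p, cv2.IMREAD_GRAYSCALE), None)[1]
    for p in train_paths
])
vocab = KMeans(n_clusters=200, random_state=0).fit(all_desc)

# 2) Represent each image as a word histogram and fit a mixture model on top.
X_train = bow_histograms(train_paths, vocab)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)
print(gmm.predict(bow_histograms(test_paths, vocab)))
```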
3.1 Visual Features Extraction
One of the critical steps in the field of machine learning is feature extraction. In particular, the extrac-
tion of important and relevant visual characteristics
is of great importance in the computational analysis
of multimedia data.
Figure 2: A basic mixture model-based framework for different data analysis applications.
The first stage for any image
processing framework is to convert the multimodal
data (image, text, and video) into a collection of char-
acteristics. In order to accomplish this, some existing techniques extract, from the source dataset, the most representative attributes that aid in differentiating between input samples. The task of fea-
ture extraction has mostly been addressed in the con-
text of image analysis applications. Feature extrac-
tion may be thought of as dimensionality reduction
since less information is retained than in the initial
input data. In order to extract spatial characteristics
from images and videos, one must take into consid-
eration spatial-pixel and spatio-temporal information.
Various techniques, including statistical, local, and global methods, have been published in the literature for feature extraction. They might be based on first-order or higher-order statistics of the gray-level values. Local extraction techniques (e.g., SIFT, SURF, and LBP) do not necessarily need background detection or subtraction since they are less prone to noise and partial occlusion.
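As a small illustration of a local descriptor (assuming Python with scikit-image, which provides a local_binary_pattern implementation; the sample image and neighbourhood parameters are arbitrary), an LBP histogram can be computed as follows.

```python
import numpy as np
from skimage import data
from skimage.feature import local_binary_pattern

# Local Binary Pattern descriptor for one grayscale image, summarized as a
# normalized histogram that could feed a mixture-model classifier downstream.
image = data.camera()                              # built-in sample image
P, R = 8, 1                                        # 8 neighbours at radius 1
lbp = local_binary_pattern(image, P, R, method="uniform")
hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
print(np.round(hist, 3))                           # P + 2 uniform-pattern bins
```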
3.2 Overview
A valuable tool for examining diverse types of data
in many computer science fields is the combination
of statistical approaches with machine learning. A
foundation of machine learning is statistics. Without
it, it is difficult to fully comprehend and apply ma-
chine learning. The basic goal of statistical machine
learning is the design and optimization of probabilis-
tic models for data processing, analysis, and predic-
tion. Advances in statistical machine learning have a significant influence on many different sectors, including artificial intelligence, signal/image processing, information management, and the fundamental sciences.
It is important to offer new sophisticated statistical
machine learning (SML) approaches based on a fam-
ily of flexible distributions in order to solve significant
issues with standard machine learning algorithms and
data modeling. SML has made great progress in re-
cent years on both supervised and unsupervised learn-
ing tasks including clustering, classification, pattern
recognition, and many other data analysis applications.
3.2.1 Data Classification with MM
Multimodal data classification, which includes text,
images, and videos, is one of the most important
tasks for many computer vision applications. It in-
volves properly allocating objects to one of a num-
ber of specified classes. When using mixture mod-
els, it is reasonable to assume that the observed data
are drawn from several distributions, and the classifi-
cation problem is then seen as an estimation of the
parameters of these distributions. In recent years,
mixture models have been used to develop efficient
computer-aided systems that successfully classify in-
put scans and/or video sequences. For instance, they
have been used to classify retinal images and detect
diabetic retinopathy (Bourouis et al., 2019) and lung
disease in CXR images (Alharithi et al., 2021). They
have also been exploited to classify biomedical data
(Bourouis et al., 2021a; Bourouis et al., 2021c). It
is notable that a number of models were constructed
for various purposes, including texture categorization
(Norah Saleh Alghamdi, 2022).
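A minimal sketch of this classification scheme (one mixture per class combined with Bayes' decision rule; Python with scikit-learn is assumed, and the toy data, helper names, and number of components are illustrative) is given below.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_mixtures(X, y, n_components=2):
    """Fit one Gaussian mixture per class; priors come from class frequencies."""
    models, priors = {}, {}
    for c in np.unique(y):
        models[c] = GaussianMixture(n_components=n_components,
                                    random_state=0).fit(X[y == c])
        priors[c] = np.mean(y == c)
    return models, priors

def predict(X, models, priors):
    """Assign each sample to the class with the highest posterior (Bayes' rule)."""
    classes = sorted(models)
    log_post = np.column_stack([models[c].score_samples(X) + np.log(priors[c])
                                for c in classes])
    return np.array(classes)[np.argmax(log_post, axis=1)]

# Toy two-class data
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
models, priors = fit_class_mixtures(X, y)
print((predict(X, models, priors) == y).mean())    # training accuracy
```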
3.2.2 Data Clustering with MM
Data clustering is a fundamental and well-known unsupervised learning technique with uses in data mining and information retrieval, among other areas
(McLachlan and Peel, 2004). Often, a clustering tech-
nique is used to discover hidden patterns in data. By
fitting a variety of probability distributions to the un-
derlying data and then continually modifying their
parameters until they best match it, mixture models
(MM) have the potential to locate clusters. When we simultaneously have low inter-cluster similarity and significant intra-cluster similarity, an MM-based clustering method will produce high-quality clusters. Numerous studies on clustering high-dimensional data have been conducted and published in the literature (Melnykov
and Wang, 2023; Tissera et al., 2022; Jiao et al., 2022;
Hu et al., 2019; Lai et al., 2018).
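For illustration (Python with scikit-learn assumed; the synthetic data and number of components are arbitrary), a mixture-based clustering run can be evaluated with a simple cohesion/separation measure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.5, size=(120, 2)) for c in (-3, 0, 3)])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)               # hard cluster assignments
resp = gmm.predict_proba(X)           # soft membership probabilities per cluster
print(silhouette_score(X, labels))    # cohesion/separation of the hard partition
print(resp.max(axis=1).mean())        # average confidence of the soft assignments
```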
3.2.3 Pattern Recognition with MM
The usage of mixture models in the field of unsuper-
vised pattern recognition has become common. In
fact, a pattern is viewed as an entity, represented
by a feature vector, and may then be described us-
ing a combination of distributions. To deal with the
problem of recognizing complex patterns, such as hu-
man activities, hand gestures, facial expressions, and so forth, several unsupervised mixture models have been established (Yang et al., 2013; Najar et al., 2020; Bouguila, 2011; Bourouis et al., 2021b; Alharithi et al., 2021). In (Yang et al., 2013), a method for modeling articulated human movements, gestures, and facial expressions is suggested. Addition-
ally, an expectation propagation inference approach
based on inverted Beta-Liouville mixture models was
suggested to handle diverse pattern recognition appli-
cations (Bourouis and Bouguila, 2022).
3.2.4 Data Segmentation with MM
Image segmentation is the process of partitioning an
image’s pixels into a number of homogeneous sub-
groups (segments). Image segmentation presents a
variety of challenges, including choosing the right
number of segments (Su et al., 2022). Smooth ob-
ject segmentation may be challenging, particularly if
the image contains noise, a complex foreground, poor
contrast, and irregular intensity. The main cues used for grouping pixels in image segmentation are proximity, color, and texture. Different mixture
model-based approaches were implemented to tackle
the issue of object segmentation, and some of them
have been shown to produce better outcomes (Cheng et al.,
2022; Channoufi et al., 2018; Allili et al., 2008).
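A simplified colour-based sketch is shown below (assuming Python with scikit-image and scikit-learn; real approaches such as the spatially constrained models cited above also exploit spatial information, which this toy example ignores).

```python
import numpy as np
from skimage import data
from sklearn.mixture import GaussianMixture

# Segment an image by modelling its pixel colours with a Gaussian mixture and
# assigning every pixel to its most probable component (no spatial constraints).
image = data.astronaut()                       # built-in RGB sample image
h, w, _ = image.shape
pixels = image.reshape(-1, 3).astype(float) / 255.0

gmm = GaussianMixture(n_components=4, random_state=0).fit(pixels)
segments = gmm.predict(pixels).reshape(h, w)   # one segment label per pixel
print(np.bincount(segments.ravel()))           # number of pixels in each segment
```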
3.3 Discussion
Standard distributions, such as the Gaussian, have
been employed for many years as the primary dis-
tributions to address data analysis concerns. Unfortunately, these distributions do not provide the most accurate approximation when dealing with non-Gaussian
data. Effectively modeling data vectors requires
both choosing the best approximation for the treated
dataset and the most effective inference technique for
learning mixture models. Recent studies have shown
that mixtures based on the Dirichlet, inverted Dirich-
let (ID), and generalized Dirichlet distributions out-
perform the conventional Gaussian for a range of data
analysis tasks. Meanwhile, certain distributions con-
tinue to have a number of shortcomings (such as the
restrictive covariance matrix structure) that restrict
their application in a number of different real-world
scenarios. Furthermore, other distributions fail to find
the right number of components to adequately char-
acterize the input vectors without over-fitting and/or
under-fitting. To overcome these issues, even when
doing so requires expensive computing resources, it
is strongly advised to employ other methods and in-
corporate additional criteria, such as the Akaike in-
formation criterion (AIC), MML, and MDL. Future
studies should delve more deeply into the fundamental issues
with MM-based models. First, determining the opti-
mal number of components is a difficult problem that
needs more investigation. Then, when dealing with
model parameter estimation using various methodolo-
gies, such as the EM algorithm, initial value selection and convergence issues are frequently encountered. On the other hand, to reproduce the optimal log-likelihood, non-normally distributed mixtures require more random starting values than normally distributed mixtures do. Additionally, although they are
quite time-consuming, Bayesian approaches based on
the MCMC methodology have gained popularity and
may help prevent issues with EM algorithms. Each component must be specified before the
entire mixture model can be constructed. As a result,
if the model is not properly defined, inconsistent es-
timations for the set of parameters may happen. Fur-
thermore, when several statistics must be calculated
simultaneously, choosing the best model is a desired
but challenging problem. It is also necessary to make
more adjustments in order to compare potential mod-
els and evaluate the fit quality. On the other hand, it
should be emphasized that while many probabilistic
models may successfully categorize comparable data,
they occasionally fail when the data is severely influ-
enced by noise and outliers. In light of these short-
comings, it is theoretically possible that discrimina-
tive classifiers, in particular the support vector ma-
chine (SVM), might be used. Therefore, it is advis-
able to investigate hybrid methods that take into con-
sideration both the benefits of probabilistic and dis-
criminative models in order to gain superior perfor-
mance. For instance, designing robust mixture-based probabilistic SVM kernels can help with this; a simplified hybrid pipeline is sketched below. Last but
not least, it is crucial to emphasize that the output of
various statistical MM models frequently depends on
the dataset’s sample size.
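As a closing illustration of the hybrid idea mentioned above (Python with scikit-learn assumed; this is a generic posterior-feature construction, not the specific probabilistic SVM kernels of the cited works), a mixture model can supply a probabilistic representation on which a discriminative SVM is trained.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

# Hybrid generative/discriminative sketch: a Gaussian mixture provides
# per-component posterior probabilities, and an SVM is trained on them.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-2, 1.2, (200, 2)), rng.normal(2, 1.2, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

gmm = GaussianMixture(n_components=6, random_state=0).fit(X)
Z = gmm.predict_proba(X)                 # probabilistic features from the mixture
svm = SVC(kernel="rbf").fit(Z, y)
print(svm.score(Z, y))                   # training accuracy of the hybrid pipeline
```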
4 CONCLUSION
Mixture models (MM), an emerging statistical method for modeling complex multimodal data, are discussed in this paper. The current study presents a brief review of recent advances in MM-based models. Although research on some aspects of MM-based methods is still limited, and only a few publications consider time-varying indicators, we are optimistic that more significant and insightful results will soon be made available to the public.
REFERENCES
Akaike, H. (1974). A new look at the statistical model iden-
tification. IEEE Transactions on Automatic Control,
19(6):716–723.
Alharithi, F. S., Almulihi, A. H., Bourouis, S., Alroobaea,
R., and Bouguila, N. (2021). Discriminative learning
approach based on flexible mixture model for med-
ical data categorization and recognition. Sensors,
21(7):2450.
Allili, M. S., Bouguila, N., and Ziou, D. (2008). Finite
general gaussian mixture modeling and application to
image and video foreground segmentation. Journal of
Electronic Imaging, 17(1):013005–013005.
Alroobaea, R., Rubaiee, S., Bourouis, S., Bouguila, N.,
and Alsufyani, A. (2020). Bayesian inference frame-
work for bounded generalized gaussian-based mixture
model and its application to biomedical images clas-
sification. Int. J. Imaging Systems and Technology,
30(1):18–30.
Azam, M. and Bouguila, N. (2022). Multivariate bounded
support asymmetric generalized gaussian mixture
model with model selection using minimum mes-
sage length. Expert Systems with Applications,
204:117516.
Bouguila, N. (2011). Bayesian hybrid generative discrimi-
native learning based on finite liouville mixture mod-
els. Pattern Recognit., 44(6):1183–1200.
Bourouis, S., Alroobaea, R., Rubaiee, S., Andejany, M., Al-
mansour, F. M., and Bouguila, N. (2021a). Markov
chain monte carlo-based bayesian inference for learn-
ing finite and infinite inverted beta-liouville mixture
models. IEEE Access, 9:71170–71183.
Bourouis, S., Alroobaea, R., Rubaiee, S., Andejany, M., and
Bouguila, N. (2021b). Nonparametric bayesian learn-
ing of infinite multivariate generalized normal mixture
models and its applications. Applied Sciences, 11(13).
Bourouis, S. and Bouguila, N. (2021). Nonparametric learn-
ing approach based on infinite flexible mixture model
and its application to medical data analysis. Int. J.
Imaging Syst. Technol., 31(4):1989–2002.
Bourouis, S. and Bouguila, N. (2022). Unsupervised learn-
ing using expectation propagation inference of in-
verted beta-liouville mixture models for pattern recog-
nition applications. Cybernetics and Systems, pages
1–25.
Bourouis, S., Sallay, H., and Bouguila, N. (2021c). A com-
petitive generalized gamma mixture model for medi-
cal image diagnosis. IEEE Access, 9:13727–13736.
Bourouis, S., Zaguia, A., Bouguila, N., and Alroobaea, R.
(2019). Deriving probabilistic SVM kernels from flex-
ible statistical mixture models and its application to
retinal images classification. IEEE Access, 7:1107–
1117.
Bouveyron, C. and Girard, S. (2009). Robust super-
vised classification with mixture models: Learning
from data with uncertain labels. Pattern Recognition,
42(11):2649–2658.
Brooks, S. P. (2001). On bayesian analyses and finite mix-
tures for proportions. Stat. Comput., 11(2):179–190.
Channoufi, I., Bourouis, S., Bouguila, N., and Hamrouni,
K. (2018). Spatially constrained mixture model with
feature selection for image and video segmentation. In
Image and Signal Processing - 8th International Con-
ference, ICISP France, pages 36–44.
Cheng, N., Cao, C., Yang, J., Zhang, Z., and Chen, Y.
(2022). A spatially constrained skew student’s-t mix-
ture model for brain mr image segmentation and bias
field correction. Pattern Recognition, 128:108658.
Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray,
C. (2004). Visual categorization with bags of key-
points. In Workshop on statistical learning in com-
puter vision, ECCV, volume 1, pages 1–2. Prague.
Fan, W. and Bouguila, N. (2020). Modeling and clustering
positive vectors via nonparametric mixture models of
liouville distributions. IEEE Trans. Neural Networks
Learn. Syst., 31(9):3193–3203.
Fu, Y., Liu, X., Sarkar, S., and Wu, T. (2021). Gaus-
sian mixture model with feature selection: An embed-
ded approach. Computers and Industrial Engineering,
152:107000.
Fujimaki, R., Sogawa, Y., and Morinaga, S. (2011). On-
line heterogeneous mixture modeling with marginal
and copula selection. In International Conference
on Knowledge Discovery and Data Mining, SIGKDD,
USA, pages 645–653.
Hofmann, T. (2001). Unsupervised learning by probabilistic
latent semantic analysis. Mach. Learn., 42(1/2):177–
196.
Hu, C., Fan, W., Du, J., and Bouguila, N. (2019). A novel
statistical approach for clustering positive data based
on finite inverted beta-liouville mixture models. Neu-
rocomputing, 333:110–123.
Husmeier, D. (2000). The bayesian evidence scheme for
regularizing probability-density estimating neural net-
works. Neural Computation, 12(11):2685–2717.
Jiao, L., Denœux, T., Liu, Z.-g., and Pan, Q. (2022). Egmm:
An evidential version of the gaussian mixture model
for clustering. Applied Soft Computing, 129:109619.
Lai, Y., Ping, Y., Xiao, K., Hao, B., and Zhang, X. (2018).
Variational bayesian inference for a dirichlet process
mixture of beta distributions and application. Neuro-
computing, 278:23–33.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. Int. J. Comput. Vis., 60(2):91–
110.
McLachlan, G. J. and Peel, D. (2004). Finite mixture mod-
els. John Wiley & Sons.
Melnykov, V. and Wang, Y. (2023). Conditional mixture
modeling and model-based clustering. Pattern Recog-
nition, 133:108994.
Minka, T. P. (2001). Expectation propagation for approxi-
mate bayesian inference. In Breese, J. S. and Koller,
D., editors, Proceedings of the 17th Conference in Un-
certainty in Artificial Intelligence, Washington, USA,
pages 362–369. Morgan Kaufmann.
Najar, F., Bourouis, S., Bouguila, N., and Belghith,
S. (2020). A new hybrid discriminative/generative
model using the full-covariance multivariate gen-
eralized gaussian mixture models. Soft Comput.,
24(14):10611–10628.
Norah Saleh Alghamdi, Sami Bourouis, N. B. (2022). Ef-
fective frameworks based on infinite mixture model
for real-world applications. Computers, Materials &
Continua, 72(1):1139–1156.
Schwarz, G. (1978). Estimating the dimension of a model.
The annals of statistics, pages 461–464.
Su, H., Zhao, D., Elmannai, H., Heidari, A. A., Bourouis,
S., Wu, Z., Cai, Z., Gui, W., and Chen, M. (2022).
Multilevel threshold image segmentation for COVID-
19 chest radiography: A framework using horizontal
and vertical multiverse optimization. Comput. Biol.
Medicine, 146:105618.
Tan, S. L. and Nott, D. J. (2014). Variational approximation
for mixtures of linear mixed models. Journal of Com-
putational and Graphical Statistics, 23(2):564–585.
Tissera, D., Vithanage, K., Wijesinghe, R., Xavier,
A., Jayasena, S., Fernando, S., and Rodrigo, R.
(2022). Neural mixture models with expectation-
maximization for end-to-end deep clustering. Neuro-
computing, 505:249–262.
Yang, Y., Saleemi, I., and Shah, M. (2013). Discover-
ing motion primitives for unsupervised grouping and
one-shot learning of human actions, gestures, and ex-
pressions. IEEE Trans. Pattern Anal. Mach. Intell.,
35(7):1635–1648.