
detection. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 773–782.
Connor, R. J. and Mosimann, J. E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association, 64(325):194–206.
Epaillard, E. and Bouguila, N. (2019). Data-free metrics for Dirichlet and generalized Dirichlet mixture-based HMMs – a practical study. Pattern Recognition, 85:207–219.
Fan, W. and Bouguila, N. (2015). Expectation propagation learning of a Dirichlet process mixture of beta-Liouville distributions for proportional data clustering. Engineering Applications of Artificial Intelligence, 43:1–14.
Frank, A. (2010). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml.
Fu, Q. and Banerjee, A. (2008). Multiplicative mixture models for overlapping clustering. In 2008 Eighth IEEE International Conference on Data Mining, pages 791–796. IEEE.
Griffiths, T. L. and Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1):5228–5235.
Harper, F. M. and Konstan, J. A. (2015). The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):1–19.
Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42:177–196.
Koochemeshkian, P., Zamzami, N., and Bouguila, N. (2020). Flexible distribution-based regression models for count data: Application to medical diagnosis. Cybernetics and Systems, 51(4):442–466.
Li, T. and Ma, J. (2023). Dirichlet process mixture of Gaussian process functional regressions and its variational EM algorithm. Pattern Recognition, 134:109129.
Li, X., Ling, C. X., and Wang, H. (2016). The convergence behavior of naive Bayes on large sparse datasets. ACM Transactions on Knowledge Discovery from Data (TKDD), 11(1):1–24.
Luo, Z., Amayri, M., Fan, W., and Bouguila, N. (2023). Cross-collection latent beta-Liouville allocation model training with privacy protection and applications. Applied Intelligence, pages 1–25.
Najar, F. and Bouguila, N. (2021). Smoothed generalized Dirichlet: A novel count-data model for detecting emotional states. IEEE Transactions on Artificial Intelligence, 3(5):685–698.
Najar, F. and Bouguila, N. (2022a). Emotion recognition: A smoothed Dirichlet multinomial solution. Engineering Applications of Artificial Intelligence, 107:104542.
Najar, F. and Bouguila, N. (2022b). Sparse generalized Dirichlet prior based Bayesian multinomial estimation. In Advanced Data Mining and Applications: 17th International Conference, ADMA 2021, Sydney, NSW, Australia, February 2–4, 2022, Proceedings, Part II, pages 177–191. Springer.
Sahami, M., Hearst, M., and Saund, E. (1996). Applying the
multiple cause mixture model to text categorization.
In ICML, volume 96, pages 435–443.
Taheri, S., Mammadov, M., and Bagirov, A. M. (2010). Improving naive Bayes classifier using conditional probabilities.
Wickramasinghe, I. and Kalutarage, H. (2021). Naive Bayes: applications, variations and vulnerabilities: a review of literature with code snippets for implementation. Soft Computing, 25(3):2277–2293.
Wolberg, W., Mangasarian, O., Street, N., and Street, W. (1995). Breast Cancer Wisconsin (Diagnostic). UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5DW2B.
Zamzami, N. and Bouguila, N. (2019a). Model selection and application to high-dimensional count data clustering via finite EDCM mixture models. Applied Intelligence, 49(4):1467–1488.
Zamzami, N. and Bouguila, N. (2019b). A novel scaled Dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation. Pattern Recognition, 95:36–47.
Zamzami, N. and Bouguila, N. (2022). Sparse count data clustering using an exponential approximation to generalized Dirichlet multinomial distributions. IEEE Transactions on Neural Networks and Learning Systems, 33(1):89–102.
APPENDIX
A Exponential Form of the Generalized Dirichlet Distribution
The exponential family is a class of parametric probability distributions whose shared mathematical structure makes them tractable from both statistical and computational perspectives. It includes the normal, exponential, log-normal, gamma, chi-squared, beta, Dirichlet, and Bernoulli distributions, among others. Given a base measure ν, an exponential family consists of distributions whose density (with respect to ν) takes the general form:
p(x|η) = h(x) exp(η⊤T(x) − A(η))    (34)
where h(x) is the base measure, T(x) the sufficient statistic, η the natural parameter, and A(η) the cumulant (log-partition) function.
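As a concrete illustration of this general form, the sketch below (not from the cited works; function names are ours) rewrites the Bernoulli distribution in exponential-family terms, with h(x) = 1, T(x) = x, natural parameter η = log(θ/(1−θ)), and cumulant A(η) = log(1 + exp(η)), and checks numerically that the two parameterizations agree:

```python
import math

def bernoulli_pmf(x, theta):
    """Standard Bernoulli probability mass function."""
    return theta ** x * (1 - theta) ** (1 - x)

def bernoulli_exp_family(x, theta):
    """Bernoulli pmf in exponential-family form:
    p(x|eta) = h(x) * exp(eta * T(x) - A(eta))
    with h(x) = 1 and T(x) = x."""
    eta = math.log(theta / (1 - theta))  # natural parameter
    A = math.log(1 + math.exp(eta))      # cumulant (log-partition) function
    return math.exp(eta * x - A)

# The two parameterizations agree for any theta in (0, 1):
for theta in (0.2, 0.5, 0.9):
    for x in (0, 1):
        assert abs(bernoulli_pmf(x, theta) - bernoulli_exp_family(x, theta)) < 1e-12
```

The same pattern applies to the generalized Dirichlet case: one identifies h, T, η, and A by algebraically rearranging the density into the form of Eq. (34).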
The generalized Dirichlet distribution has been shown to be a member of the exponential family (Zamzami and Bouguila, 2019a,b, 2022), as it can be written in the form above, as illustrated below:
ICEIS 2024 - 26th International Conference on Enterprise Information Systems