and development of diabetes (Li et al., 2019; Gong et
al., 2017; Qiu et al., 2016). In addition, Gode et al.
found that other lifestyle factors also influence the
incidence of diabetes (Gode et al., 2024). However,
the existing literature has not adequately studied the
risk factors for diabetes mellitus, and the available
statistical data are neither systematic nor complete.
As a result, this study concentrates on
examining a set of 16 distinct variables, which
include Gender, Age, Polyuria, Polydipsia, Sudden
weight loss, Weakness, Polyphagia, Genital thrush,
Visual blurring, Itching, Irritability, Delayed healing,
Partial paresis, Muscle stiffness, Alopecia, and
Obesity. The main goal of this research is to discern
and utilize the most appropriate statistical model to
evaluate the degree of association between the
identified risk factors and the prevalence of diabetes.
Through this analysis, the study aspires to offer a
thorough comprehension of the individual
contribution of each factor to the probability of
developing diabetes. This understanding is critical for
the formulation of more precise and impactful
prevention and intervention strategies. By elucidating
the specific roles these risk factors play, the research
seeks to improve the effectiveness of public health
initiatives and clinical practices aimed at reducing
the incidence of diabetes.
In a similar direction, Xue et al. used a variant of
the Cox proportional hazards model (Xue et al., 2022).
Owing to data limitations, they did not include
individuals aged 35-45 years, although diabetes is
also highly prevalent in this age group, which
may have affected the generalizability of the results.
In addition, the study did not use biomarkers (e.g.,
blood glucose levels or HbA1c) to confirm the
diagnosis of diabetes but relied on self-reporting and
medication use, which may have affected the
accuracy of the results. Ampeire et al. used a logistic
regression model (Ampeire, Kawugeze and Mulogo,
2023), which yields an Odds Ratio (OR) or an
Adjusted Odds Ratio (AOR) that allows researchers
to quantify the extent to which each predictor
variable affects prediabetes risk. Nevertheless, it
should be noted that the model presupposes a linear
association between the independent variables and
the log-odds of the dependent variable, which in this
case is prediabetes. This assumption implies that a
one-unit change in an independent variable shifts the
log-odds of prediabetes by a constant amount,
simplifying the complexity of potential nonlinear
relationships that might exist in reality. If the
actual relationship is nonlinear, the model may not
capture it accurately, leading to inaccurate
predictions and interpretations. Narjes
Hazar et al. constructed a DerSimonian and Laird
random-effects model (Hazar et al., 2024). The model
can handle the heterogeneity that exists between
studies. However, the estimate of heterogeneity is
sensitive to the distribution of the data and to the
sizes of the individual studies. If the sample size of
some studies is extremely large or small, those
studies may have a disproportionate effect on the
overall heterogeneity estimate.
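To make the random-effects pooling discussed above concrete, the following is a minimal sketch of the standard DerSimonian and Laird moment estimator. It is not the implementation used by Hazar et al.; the function name `dersimonian_laird` and the example effect sizes and variances are hypothetical and serve only as illustration.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird estimator.

    effects   -- per-study effect estimates (e.g. log odds ratios)
    variances -- within-study variances of those estimates (all > 0)
    Returns (pooled effect, standard error, tau^2, Cochran's Q).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching lengths")
    k = len(effects)

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures the observed between-study dispersion.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Moment estimate of the between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to every within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2, q


# Hypothetical inputs: three studies, the first far more precise.
pooled, se, tau2, q = dersimonian_laird([0.20, 0.80, 0.75],
                                        [0.001, 0.09, 0.10])
```

Because the weights are inverse variances, a single very precise (typically very large) study dominates both Q and the scaling term `c`, which is one way the imbalance described above can arise.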
By highlighting the most important risk variables
that ought to be the focus of intervention, the results
of this study are expected to guide medical
practice, inform public health efforts, and offer
novel perspectives on tactics for the early detection
of diabetes. Furthermore, the discovery of neglected
variables can provide avenues for further
investigation, which would ultimately lead to a more
thorough understanding of the origins of diabetes.
2 METHODOLOGY
2.1 Data Sources
The dataset used in this research was obtained from a
repository on the Kaggle platform. The data were
gathered through direct questionnaires administered
to patients receiving care at Sylhet Diabetes Hospital
in Sylhet, Bangladesh. The collected information was
subsequently reviewed and approved by a medical
professional to ensure its accuracy and reliability.
2.2 Variable Selection
This study utilizes data from a cohort comprising 520
individuals, including both diabetic and non-diabetic
patients. Within this population, there are 328 males
and 192 females. The age distribution of these
subjects spans from 16 to 90 years. The data contains
16 variables (Gender, Age, Polyuria, Polydipsia,
Sudden weight loss, Weakness, Polyphagia, Genital
thrush, Visual blurring, Itching, Irritability, Delayed
healing, Partial paresis, Muscle stiffness, Alopecia,
Obesity).
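As an illustration of how the association between one binary variable of this kind and diabetes status might be quantified, the sketch below computes an unadjusted odds ratio with a Woolf (log-based) 95% confidence interval from a 2x2 table. For a single binary predictor, this cross-product ratio equals the exponentiated coefficient of a logistic regression fitted to the same data. The function name `odds_ratio`, the column name `class`, and the `Yes`/`No` and `Positive`/`Negative` encodings are assumptions about how the file is laid out, and the toy records are hypothetical rather than drawn from the dataset.

```python
import math


def odds_ratio(records, factor, outcome="class",
               exposed="Yes", case="Positive"):
    """Return (OR, lower, upper) for a binary factor vs. a binary outcome."""
    a = b = c = d = 0
    for row in records:
        is_exposed = row[factor] == exposed
        is_case = row[outcome] == case
        if is_exposed and is_case:
            a += 1          # factor present, diabetic
        elif is_exposed:
            b += 1          # factor present, non-diabetic
        elif is_case:
            c += 1          # factor absent, diabetic
        else:
            d += 1          # factor absent, non-diabetic
    if 0 in (a, b, c, d):
        raise ValueError("an empty cell leaves the odds ratio or its "
                         "interval undefined")
    or_hat = (a * d) / (b * c)
    # Woolf standard error of log(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - 1.96 * se_log)
    upper = math.exp(math.log(or_hat) + 1.96 * se_log)
    return or_hat, lower, upper


# Hypothetical toy records, NOT taken from the study's dataset.
toy = (
    [{"Polyuria": "Yes", "class": "Positive"}] * 3
    + [{"Polyuria": "Yes", "class": "Negative"}] * 1
    + [{"Polyuria": "No", "class": "Positive"}] * 1
    + [{"Polyuria": "No", "class": "Negative"}] * 3
)
or_hat, lower, upper = odds_ratio(toy, "Polyuria")
```

An adjusted odds ratio would instead require a multivariable model that includes the other variables as covariates; this sketch covers only the unadjusted case.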