has great scalability can be applied to optimize our
model. This algorithm helps in finding optimal
parameters through iterative computation (Dempster
and Rubin 1977). Our study demonstrates the
potential application of IOHMM for estimating
diabetes progression. Depending on the individual
results obtained from our model, a targeted therapy
can be provided to meet each patient’s needs.
2 METHODS
2.1 Data Description
The dataset used in this study is a diabetic patient
record published by the University of California,
Irvine’s Machine Learning Repository. The original
dataset was collected in 1994 by a PhD student,
Michael Kahn, from Washington University. This
dataset is composed of quantitative records (e.g.,
insulin doses, insulin types, and blood glucose
levels) and categorical information (e.g., meal size,
regular physical activity, and other daily routines)
covering at least three months for each of the 70
diabetes patients.
Since not all the variables were regularly
recorded, only three well-recorded variables were
extracted from this dataset: blood glucose level (g),
meal intake size (m), and units of insulin injected (i).
Among them, insulin injection (i) and food intake (m)
are numerical variables that record the amount of
treatment received by each patient. Blood glucose
level is also a quantitative variable, and it reflects
the patient’s health condition.
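As a minimal sketch of this extraction step, the
Python snippet below groups the three variables by
date. It assumes each record is a tab-separated line of
date, time, code, and value. The numeric code groupings
and the ordinal meal-size encoding are assumptions made
for illustration and should be checked against the
dataset's own documentation; the sample lines are made
up.

```python
from collections import defaultdict

# Assumed code groupings (verify against the dataset documentation).
GLUCOSE_CODES = {48, 57, 58, 59, 60, 61, 62, 63, 64}  # blood glucose readings
INSULIN_CODES = {33, 34, 35}                          # insulin doses
# Hypothetical ordinal encoding of meal size: less, typical, more than usual.
MEAL_CODES = {68: 0, 66: 1, 67: 2}


def parse_records(lines):
    """Group glucose (g), meal size (m), and insulin units (i) by date."""
    daily = defaultdict(lambda: {"g": [], "m": [], "i": []})
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip malformed lines
        date, _time, code_text, value_text = parts
        try:
            code = int(code_text)
        except ValueError:
            continue
        if code in MEAL_CODES:
            # Meal events carry their size in the code, not in the value.
            daily[date]["m"].append(MEAL_CODES[code])
            continue
        try:
            value = float(value_text)
        except ValueError:
            continue
        if code in GLUCOSE_CODES:
            daily[date]["g"].append(value)
        elif code in INSULIN_CODES:
            daily[date]["i"].append(value)
    return dict(daily)


# Made-up sample lines in the assumed format.
sample = [
    "01-05-1991\t08:00\t58\t110",
    "01-05-1991\t08:00\t33\t9",
    "01-05-1991\t08:15\t66\t0",
    "01-05-1991\t18:00\t62\t125",
    "bad line",
]
records = parse_records(sample)
```

Keeping the records keyed by date matches the daily
time step used for the health condition in the model.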
2.2 Model Development
In this study, an input-output HMM (IOHMM) was
constructed to capture the associations among
variables and to produce reliable projections. The
model includes not only the observed variables (e.g.,
food intake) but also a hidden state, the patient's
health condition (h), over time. The variable h takes
values from one to eight, grading the patient’s health
condition from the healthiest to the worst. This latent
variable is influenced by the previous day's health
condition and by the food intake and insulin
injection dose on the same day (Fig. 1). This
association yields the daily changes in health
condition. As shown in the heatmap (Fig. 2), the
probabilities of a patient being in each health
condition were expected to differ. Additional
treatments were also included in the actual model
runs to evaluate the impact of other factors on the
transition probabilities.
Unlike the hidden factor h, blood glucose
levels are measurable and can be used to reflect
changes in regular physical activity and the patient’s
daily habits (Fig. 1). An emission relationship was
expected to be observed when treatments are given.
For example, when food intake increases, blood
glucose levels could be higher and span a broader
range than in the control group (Fig. 3). Patients
receiving an excessive amount of insulin may be more
likely to have lower and unstable blood glucose
levels (Fig. 3). Apart from the assumptions about
associations between variables, developing the model
parameters is another indispensable step.
Three parameters define our model λ(π, φ, ψ): the
prior π, the transition probability φ, and the
emission probability ψ. The prior parameter π, which
represents the health condition of our samples on
day 0, is an estimated value (Equation (1)). The
transition parameter φ, as denoted by Equation (2),
is used to estimate the next health condition state
given the previous disease progression, the current
insulin injection dose, and the current meal size.
The emission parameter, which estimates the
probability of observing a certain blood glucose
level under a specific health condition, consists of
pre-prandial and postprandial emission parameters
(Equations (3) and (4)). The pre-prandial emission
gives the probability of observing a certain
pre-prandial glucose level based on the patient’s
health condition (Equation (3)). In Equation (4), the
postprandial glucose level is estimated from the
patient’s health condition, pre-prandial blood
glucose, insulin dose, and food intake. Coefficients a,
b, c, d and μ were introduced to adjust and quantify
the impact of additional variables. In this study, we
assume that both transition and emission parameters
follow a Gaussian distribution. However, further
studies are needed when the parameters are
restricted to other specific distribution patterns.
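The Gaussian transition and pre-prandial emission
terms described above can be sketched in Python as
follows. This is an illustrative sketch, not the
authors' implementation: it assumes the next health
state is Gaussian around the previous state shifted by
the insulin and meal terms, renormalises that density
over the eight discrete states, and uses placeholder
values for the coefficients d, e, mu, and sigma. The
function names and the per-state `means` and `sigmas`
lookups are hypothetical.

```python
import math

STATES = range(1, 9)  # health conditions h = 1 (healthiest) .. 8 (worst)


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and std. deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def transition_probs(h_prev, insulin, meal, d=0.1, e=0.5, mu=0.0, sigma=1.0):
    """Probability of each next state given previous state, insulin, and meal.

    Assumption: the Gaussian is centred at h_prev + d*insulin - e*meal - mu
    and renormalised so the eight state probabilities sum to one.
    The default coefficient values are placeholders, not fitted values.
    """
    center = h_prev + d * insulin - e * meal - mu
    weights = [gaussian_pdf(h, center, sigma) for h in STATES]
    total = sum(weights)
    return [w / total for w in weights]


def pre_prandial_emission(glucose, h, means, sigmas):
    """Gaussian density of a pre-prandial glucose reading given health state h.

    `means` and `sigmas` map each state to its (hypothetical) mean and
    standard deviation of pre-prandial glucose.
    """
    return gaussian_pdf(glucose, means[h], sigmas[h])


# With no insulin or meal effect, the most likely next state is the previous one.
probs = transition_probs(h_prev=3, insulin=0.0, meal=0.0)
```

Renormalising over the eight states is one way to turn
a continuous Gaussian into a discrete transition row;
in practice the coefficients would be estimated
iteratively rather than fixed by hand.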
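$$\pi_k = P(h_0 = k) \tag{1}$$

$$\varphi(h_t \mid h_{t-1}, i_t, m_t) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(h_{t-1} + d \cdot i_t - e \cdot m_t - \mu)^{2}}{2\sigma^{2}}\right) \tag{2}$$

$$\varphi(g_t^{\mathrm{pre}} \mid h_t) = \frac{1}{\sqrt{2\pi\sigma_{h_t}^{2}}} \exp\left(-\frac{(g_t^{\mathrm{pre}} - \mu_{h_t})^{2}}{2\sigma_{h_t}^{2}}\right) \tag{3}$$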