lor MD, 1999) and patient education (Koelling et al.,
2005) can reduce the readmission rates considerably
and improve the health outcome of the patients. In
particular, studies have shown that targeted
interventions during the hospital stay, at discharge,
or post-discharge can reduce the likelihood of
readmission, especially in elderly patients, and
decrease overall medical costs (Naylor MD, 1999;
Rich et al., 1995).
However, to the best of our knowledge, no prior
work proposes an iterative analytical framework that
predicts 30-day readmission risk for CHF and
recommends appropriate personalized intervention
strategies to patients at different phases, which is
the primary focus of our work.
This paper has two primary objectives: 1) summarize
current research on predicting the 30-day readmission
risk score (or percentage) of CHF patients; and
2) outline the challenges and opportunities in
designing personalized intervention strategies that
assist decision making, such that a patient's
readmission risk is reduced by a certain percentage.
The former task is risk prediction, while the latter
is referred to as risk management.
Notably, the choice of appropriate interventions
depends closely on a patient's current phase. For
example, post-discharge interventions may be limited
to appropriate follow-ups or patient education,
whereas physicians could suggest different procedures
or surgery if the intervention is administered during
the patient's hospital stay. Existing research has
studied different clinical risk prediction problems
in silos. This paper is one of the first efforts to
study the risk prediction and risk management
problems jointly.
Our vision is to design the solution strategies using
statistical and data mining techniques. For example,
the risk prediction problem could be formulated as a
statistical classification or regression problem (Han
and Kamber, 2006), with the objective of learning a
mathematical function that outputs either the
probability of an individual's readmission within 30
days or the number of days until the individual's next
readmission, using the different factors that cause
CHF readmission. The risk management problem could
also be studied using statistical and data mining
techniques. Given a set of possible interventions, if
the risk prediction problem is formulated as a
regression problem, then to achieve a targeted (lower)
risk score, the risk management problem could be
solved by: 1) learning the reverse regression or
calibration (Johnson and Wichern, 1988) of the
intervention parameters that result in the intended
target risk score, and 2) performing sequence mining
(Han and Kamber, 2006) to suggest appropriate
interventions.
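As a minimal illustration of the calibration step in 1), assume a
single numeric intervention parameter (here, a hypothetical number of
follow-up contacts per month) and an approximately linear relation
between that parameter and the risk score. The sketch below fits the
line by ordinary least squares and then inverts it to obtain the
parameter value associated with a target risk. The data values, the
function names fit_line and calibrate, and the linearity assumption
are illustrative only and are not taken from the cited studies.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def calibrate(intercept, slope, target_risk):
    """Invert the fitted line: which x yields the target risk?"""
    if slope == 0:
        raise ValueError("risk does not depend on the intervention parameter")
    return (target_risk - intercept) / slope


# Hypothetical toy data (made up for illustration): follow-up contacts
# per month versus predicted 30-day readmission risk.
follow_ups = [0, 1, 2, 3, 4]
risk = [0.30, 0.26, 0.22, 0.18, 0.14]

intercept, slope = fit_line(follow_ups, risk)
needed = calibrate(intercept, slope, target_risk=0.20)
print(f"risk = {intercept:.2f} + ({slope:.2f}) * follow_ups")
print(f"follow-ups needed for a target risk of 0.20: {needed:.1f}")
```

With these toy values the fitted line is approximately
risk = 0.30 - 0.04 x, so a target risk of 0.20 corresponds to about
2.5 contacts per month. A realistic setting would involve several
intervention parameters and an uncertainty estimate around the
inverted value.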
2 RISK PREDICTIONS
In this section, we summarize current studies on
predicting the risk of hospital readmission and
discuss the limitations of the field.
2.1 Applying Extensive Data Preprocessing to Improve Quality of Prediction
The quality of data determines the quality of
predictions. This paper suggests incorporating a wide
variety of data preprocessing and predictive modeling
techniques, such as missing value imputation,
clustering, and classification (Han and Kamber, 2006),
to improve the prediction of 30-day readmission risk
for CHF patients. Real-world clinical data are noisy,
heterogeneous, and severely skewed, and contain
hundreds of pertinent factors. They include patients'
socio-demographic characteristics, such as marital
status and ethnicity; clinical data, such as diagnosis
and discharge information; comorbidity factors¹;
other cost-related factors pertaining to a particular
hospital admission; lab results; and procedures. The
proposed solution relies on data mining and predictive
analytics. Current research has investigated a wide
range of techniques to that end, from the simple Naive
Bayes classifier, Support Vector Machines (Zolfaghar
et al., 2013b; Zolfaghar et al., 2013a), and
regression models (Kansagara D, 2011) to an ensemble
of multilayer classifiers (Zolfaghar et al., 2013c).
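To make the preprocessing step concrete, the following sketch shows
one simple form of missing value imputation for categorical
attributes: each missing entry is replaced by the most frequent
observed value of its attribute. This is only one of many possible
imputation strategies and is not necessarily the one used in the
cited studies. The attribute names, the records, and the helper
impute_with_mode are hypothetical.

```python
from collections import Counter


def impute_with_mode(records, columns):
    """Return new records where None is replaced by the column's mode.

    The input records are left unchanged. A column with no observed
    values at all stays None.
    """
    modes = {}
    for col in columns:
        observed = [r[col] for r in records if r.get(col) is not None]
        if observed:
            modes[col] = Counter(observed).most_common(1)[0][0]
    return [
        {
            col: r.get(col) if r.get(col) is not None else modes.get(col)
            for col in columns
        }
        for r in records
    ]


# Hypothetical patient records (made up for illustration).
patients = [
    {"marital_status": "married", "ethnicity": "A"},
    {"marital_status": None, "ethnicity": "A"},
    {"marital_status": "married", "ethnicity": None},
    {"marital_status": "single", "ethnicity": "B"},
]

completed = impute_with_mode(patients, ["marital_status", "ethnicity"])
for row in completed:
    print(row)
```

Mode imputation is easy to apply but can reinforce the skew already
present in the data, so it is best read as a baseline rather than a
recommended choice.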
A sophisticated and effective predictive model of-
ten requires a large set of attribute values that may
not all be available (or known) at the time when a pa-
tient or a healthcare provider uses the risk assessment
tool. To transform the limited inputs to the complete
set of attribute values on which the predictive model
is trained, the first task of the risk prediction model is
to map the input values to a group (i.e., cluster) of pa-
tients who are most similar to the provided user pro-
file. The model pre-computes the clusters based on
different permutations of input attributes using the
k-mode algorithm (Han and Kamber, 2006). To
accommodate all possible scenarios, the model
constructs $k \cdot 2^{n}$ clusters, where $n$ is the
number of factors (i.e.,
¹ Comorbidities are specific patient conditions that are
secondary to the patient's principal diagnosis and that
require treatment during the stay.
HEALTHINF 2014 - International Conference on Health Informatics