4 RELATED WORK
Discretization methods determine “cut-points” (or
split-points) for continuous features, dividing a range
of continuous values into intervals of various lengths.
In clinical data analysis, discrete features are eas-
ier to interpret for both data scientists and clinicians.
Prior research also indicates that discretization makes
learning faster and more accurate (Liu et al., 2002).
Applying discretization methods to continuous features
such as age, blood pressure, and BMI provides insight
into the profiles of patients with different variation
properties, especially when used in conjunction with
interpretable predictive models such as decision trees.
Prior work (Chin et al., 2012) indicates that the choice
of cut-points can affect perceptions and understanding
of the data. That work observes a non-linear pattern
between age and high variation of blood sugar level
(patients in their 40s show a much greater probability
of high variation than younger and older patients),
which is obscured in a correlation analysis, where the
two factors are negatively correlated.
Most prior research has applied discretization as a
data preprocessing technique to enhance predictive
models. In contrast, this work proposes
discretization-based visualization to support data
exploration, which in turn can lead to improved
predictive modeling.
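To make the notion of cut-points concrete, the following minimal sketch splits a continuous feature into equal-width intervals and reports the outcome rate per interval. It is an illustration only: equal-width binning is just one simple unsupervised scheme and is not the method of the cited work or of this paper, the function names (`equal_width_cut_points`, `assign_bin`, `rate_per_bin`) are hypothetical, and the ages and labels are synthetic values invented for the example.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return n_bins - 1 interior cut-points that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def assign_bin(value, cut_points):
    """Index of the interval holding value; a cut-point belongs to the upper interval."""
    return bisect_right(cut_points, value)


def rate_per_bin(values, labels, cut_points):
    """Fraction of positive labels in each interval (None for an empty interval)."""
    totals = [0] * (len(cut_points) + 1)
    positives = [0] * (len(cut_points) + 1)
    for value, label in zip(values, labels):
        b = assign_bin(value, cut_points)
        totals[b] += 1
        positives[b] += label
    return [p / t if t else None for p, t in zip(positives, totals)]


# Synthetic data, invented for illustration (1 = outcome present).
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 80]
outcome = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

cuts = equal_width_cut_points(ages, 3)
rates = rate_per_bin(ages, outcome, cuts)
print(cuts, rates)
```

On this toy input the middle interval has the highest outcome rate, the kind of non-monotonic pattern that a single linear summary can hide and that a per-interval view makes visible. A different choice of cut-points would group the same patients differently, which is why cut-point selection matters.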
An increasing body of literature (Kansagara et al.,
2011) attempts to develop predictive models for
hospital readmission risk. Kansagara et al. (2011)
conduct a systematic review of 26 unique models in
terms of data types, data collection timing, prediction
variables, etc. However, none of the existing works
proposes discretization for improved prediction, nor
do they propose data exploration for the prediction
problem.
Compared to prior research on the problem of
predicting the risk of hospital readmission, our study
proposes a novel visualization approach at the data
exploration stage to provide interpretable knowledge
discovery to healthcare domain experts. In the
proposed framework, we illustrate how a domain expert
can be involved in the data mining process at different
stages.
5 CONCLUSIONS AND FUTURE WORK
We propose Divide-n-Discover, a principled
discretization-based visualization framework for data
analysis and exploration in healthcare analytics. We
demonstrate the effectiveness of this framework for
predicting the RoR for CHF patients. Our experimental
study suggests that the proposed framework can help
filter outliers in the data and identify unexpected
patterns.
The proposed framework can be extended to a
wide range of healthcare problems. Encouraged by
the preliminary findings, we aim to expand the scope
of the applications and investigate a wider range of
numeric attributes in the future. In addition, imple-
menting the proposed interactive user interface will
allow us to perform usability tests with healthcare
professionals. User studies may reveal the strengths
and weaknesses of the approach and help improve it.
Future work will also evaluate the proposed method on
larger datasets, identifying and solving potential
scalability issues in data exploration.
REFERENCES
Chin, S.-C., Street, W. N., and Teredesai, A. (2012). Dis-
covering meaningful cut-points to predict high HbA1c
variation. In Proc. 7th INFORMS Workshop on Data
Mining and Health Informatics.
Kansagara, D., Englander, H., et al. (2011). Risk
prediction models for hospital readmission: A
systematic review. JAMA, 306(15):1688–1698.
Kerber, R. (1992). ChiMerge: discretization of numeric at-
tributes. In Proceedings of the tenth national confer-
ence on Artificial intelligence, AAAI’92, pages 123–
128. AAAI Press.
Krumholz, H. M., Normand, S. L. T., Keenan, P. S., Lin,
Z. Q., Drye, E. E., Bhat, K. R., Wang, Y. F., Ross,
J. S., Schuur, J. D., and Stauffer, B. D. (2008). Hospi-
tal 30-day heart failure readmission measure method-
ology. Report prepared for the Centers for Medicare
& Medicaid Services.
Liu, H., Hussain, F., Tan, C. L., and Dash, M. (2002). Dis-
cretization: An enabling technique. Data Mining and
Knowledge Discovery, 6(4):393–423.
Szumilas, M. (2010). Explaining odds ratios. Journal of the
Canadian Academy of Child and Adolescent Psychia-
try, 19(3):227–229.
Zolfaghar, K., Agarwal, J., Sistla, D., Chin, S.-C., Roy,
S. B., and Verbiest, N. (2013a). Risk-o-meter: an in-
telligent clinical risk calculator. In KDD, pages 1518–
1521.
Zolfaghar, K., Meadem, N., Sistla, D., Chin, S.-C., Roy,
S. B., Verbiest, N., and Teredesai, A. (2013b). Explor-
ing preprocessing techniques for prediction of risk of
readmission for congestive heart failure patients. In
Data Mining and Healthcare Workshop.
Zolfaghar, K., Meadem, N., Teredesai, A., Roy, S. B., Chin,
S.-C., and Muckian, B. (2013c). Big data solutions
for predicting risk-of-readmission for congestive heart
failure patients. In IEEE Bigdata.