tasks, in this study, we selected L1-regularized
logistic regression in order to be able to examine EHR
feature weights alongside classification performance.
A few features positively weighted by the classifier
are not clearly related to CDI risk or likely to be
related to evolving symptomatology – for example,
service or admission location. In practice,
unexpectedly weighted characteristics also have the
potential to reflect phenomena of institutional or
clinical epidemiological interest, such as
unrecognized infection transmission routes or
previously undetected groups of patients at elevated
risk (Cohen et al., 2010; Shaughnessy et al., 2011).
Thus, in a machine learning
classification system, it is desirable to be able to
examine what features are being identified by the
system as predictive, even when such features may
not be validated as risk factors by previous
epidemiological studies.
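
As an illustration of how an L1 penalty supports this kind of inspection, the following is a minimal sketch, not the implementation used in this study. It fits an L1-regularized logistic regression on synthetic data and lists the learned weights by magnitude. The feature names, the synthetic data, the penalty strength, and the proximal-gradient solver are all assumptions made for the example.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a weight toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, l1_strength=0.1, step=0.5, n_iter=300):
    """Proximal gradient descent on mean log-loss plus an L1 penalty.

    The intercept is left unpenalized. All hyperparameter defaults are
    illustrative assumptions, not values taken from the study.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then soft-threshold each weight.
        w = [soft_threshold(wj - step * gj, step * l1_strength)
             for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Synthetic data: two informative features and three pure-noise features.
# The names are hypothetical placeholders, not EHR features from the study.
rng = random.Random(0)
feature_names = ["signal_a", "signal_b", "noise_a", "noise_b", "noise_c"]
X, y = [], []
for _sample in range(200):
    row = [rng.gauss(0.0, 1.0) for _name in feature_names]
    score = 2.0 * row[0] - 1.5 * row[1] + rng.gauss(0.0, 0.5)
    X.append(row)
    y.append(1 if score > 0 else 0)

weights, bias = fit_l1_logistic(X, y)

# Rank features by absolute weight, as one would when reviewing a fitted model.
for name, weight in sorted(zip(feature_names, weights),
                           key=lambda pair: -abs(pair[1])):
    print(f"{name:10s} {weight:+.3f}")
```

Because the penalty shrinks uninformative weights toward zero, the ranked list is short enough to review by hand, which is what allows an unexpectedly weighted feature to be noticed and followed up.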
A limitation of the current study is that we include
data from only one set of archived electronic patient
records for an intensive care unit patient population,
limiting the generalizability of our results. Further
investigations are needed to cross-validate this
system and compare the clinical performance of
CREST in different healthcare facilities and for
different patient groups. In addition, further
performance improvements may be achievable through
the use of alternative core machine learning methods
and optimized cross-validation approaches. It also
remains to be studied
whether changes in the risk score itself may be useful
as inputs to the system.
Given the overall relatively low prevalence of
CDI in the patient population, the sensitivity and
specificity of CREST would require improvement
before the system could be used as a diagnostic tool.
However, the ability of CREST to flag evolving high-
risk patients based on real-time clinical data makes
the system very useful for preventive interventions
and infection control epidemiology applications.
Facility-level prevention activities that present
minimal or no risk to individual patients, such as
precautionary patient isolation or increased
observation with a lowered threshold for ordering
diagnostic testing, might be considered for patients
whom the system identifies as potential CDI cases.
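
The effect of low prevalence on diagnostic usefulness can be made concrete with Bayes' rule. The sketch below computes the positive predictive value (PPV) at several prevalence levels. The sensitivity, specificity, and prevalence values are illustrative assumptions only and are not results from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a flagged patient is a true case (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point; only the prevalence is varied.
for prevalence in (0.01, 0.05, 0.20):
    ppv = positive_predictive_value(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

At a fixed sensitivity and specificity, the PPV falls sharply as prevalence falls, so most flagged patients are false positives when the condition is rare. This is consistent with reserving such flags for low-risk preventive measures rather than for diagnosis.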
5 CONCLUSIONS
We conclude from this study that machine learning
strategies can be productively applied to EHR data for
early identification of hospital-acquired CDI cases
and that dynamic feature variability provides
particularly strong predictive signals, beyond patient
information used for traditional clinical risk
assessments. Further investigations are needed to
cross-validate this system, to compare the
performance of this approach for different facilities
and patient groups, and to explore its ability to
discriminate among diagnoses.
ACKNOWLEDGEMENTS
Thomas Hartvigsen thanks the US Department of
Education for supporting his PhD studies via the grant
P200A150306 on “GAANN Fellowships to Support
Data-Driven Computing Research”. Cansu Sen
thanks WPI for granting her the Arvid Anderson
Fellowship (2015-2016) to pursue her PhD studies.
We also thank the DSRG and Data Science
Community at WPI for their support and feedback.
REFERENCES
‘Antibiotic Resistance Threats in the United States,’
Centers for Disease Control and Prevention, 2019.
https://www.cdc.gov/drugresistance/pdf/threats-report/2019-ar-threats-report-508.pdf
Lessa, F.C., Mu, Y., Bamberg, W.M., Beldavs, Z.G.,
Dumyati, G.K., Dunn, J.R., and others, 2015. Burden of
Clostridium difficile infection in the United States. N
Engl J Med, 372 (9): 825-834.
Cohen, S.H., Gerding, D.N., Johnson, S., Kelly, C.P., Loo,
V.G., McDonald, L.C., and others, 2010. Clinical
practice guidelines for Clostridium difficile infection:
2010 update by the society for healthcare epidemiology
of America (SHEA) and the infectious diseases society
of America (IDSA). Infect Control Hosp Epidemiol, 31
(5): 431-455.
Evans, C.T., Safdar, N., 2015. Current trends in the
epidemiology and outcomes of Clostridium difficile
infection. Clin Infect Dis, 60 (Suppl 2): S66-71.
Burnham, C.A., Carroll, K.C., 2013. Diagnosis of
Clostridium difficile infection: an ongoing conundrum
for clinicians and for clinical laboratories. Clin
Microbiol Rev, 26 (3): 604-630.
Dubberke, E.R., Olsen, M.A., 2012. Burden of Clostridium
difficile on the healthcare system. Clin Infect Dis, 55
(Suppl 2): S88-92.
Dubberke, E.R., Carling, P., Carrico, R., Donskey, C.J.,
Loo, V.G., McDonald, L.C., and others, 2014.
Strategies to prevent Clostridium difficile infections in
acute care hospitals: 2014 update. Infect Control Hosp
Epidemiol, 35 (6): 628-645.
Balsells, E., Filipescu, T., Kyaw, M.H., Wiuff, C.,
Campbell, H., Nair, H., 2016. Infection prevention and
control of Clostridium difficile: a global review of