Table 1: Impact of the binning method on the performance of the classifiers learned by Naïve Bayes (NB) and J4.8. Performance
is measured as accuracy (higher values are better) and SROCD (lower values are better). Accuracy is a poor choice for the task
at hand because it is biased by the number of labels and by the distribution of the data among the labels.
Binning method   # bins   Accuracy (in %)     SROCD (in min)
                          NB       J4.8       NB          J4.8
EWIB             50       21.23    41.11      1,488,230   1,120,729
K-Means          50       14.39    35.41      1,639,846   1,073,231
TUBE             3+96     82.12    88.01      4,099,064   4,111,336
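To make the binning step behind Table 1 more concrete, the following is a minimal sketch of equal-width interval binning of a numeric target into a fixed number of bins. It is an illustration only, not the implementation used in the study: the function name `equal_width_bins` and the sample durations are assumptions introduced here.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin holds everything.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        idx = int((v - lo) / width)
        # The maximum value would land on the upper edge; keep it in the last bin.
        labels.append(min(idx, n_bins - 1))
    return labels


# Hypothetical SRO times in minutes (illustrative values, not data from the study).
sro_minutes = [35, 42, 58, 61, 75, 90, 120, 180, 240, 480]
labels = equal_width_bins(sro_minutes, n_bins=50)
used_bins = sorted(set(labels))
print(f"{len(used_bins)} of 50 bins are non-empty: {used_bins}")
```

With skewed durations such as these, a few long surgeries stretch the value range, so most equal-width bins stay empty while the short surgeries crowd into the first few bins.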
prior probability of a miss (wrong label assignment)
is higher if there are 50 labels than if there are only
three. Accuracy is sensitive to the number of labels,
so it is a poor choice when classifiers are learned with
different numbers of labels or when the data are strongly
skewed towards only a few labels.
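The dependence of accuracy on the label set can be illustrated with two trivial baselines: uniform random guessing and always predicting the most frequent label. This is a minimal sketch; the function names and the skewed three-label distribution are assumptions for illustration and do not reproduce the label distribution of the study.

```python
from collections import Counter


def chance_accuracy(n_labels):
    """Expected accuracy of uniform random guessing over n_labels labels."""
    return 1.0 / n_labels


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Random guessing is far more likely to miss with 50 labels than with 3.
print(f"chance accuracy, 50 labels: {chance_accuracy(50):.3f}")
print(f"chance accuracy,  3 labels: {chance_accuracy(3):.3f}")

# Hypothetical skewed three-label distribution (made-up counts).
skewed = ["short"] * 80 + ["medium"] * 15 + ["long"] * 5
print(f"majority baseline, skewed 3 labels: {majority_baseline_accuracy(skewed):.2f}")
```

Under these assumed counts, a classifier that ignores its input already reaches 80% accuracy on the three-label task, so accuracy values obtained with different label sets cannot be compared directly.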
In summary, the instantiation of the prediction
task for the hospital resulted in predictors that
improved on the baseline. Among the lessons learned are
the impact of discretization on the learners and the
importance of selecting a proper evaluation measure.
6 CONCLUSIONS
We presented a high-level framework for knowledge
discovery from medical protocols, and its instantiation
in a German hospital for the prediction of surgical
room occupancy time (SRO time). Such data are
primarily recorded for medical purposes, but can also be
used to support planning decisions, provided they
are appropriately prepared and analyzed.
In the instantiation of our framework in a German
hospital we studied intensive care unit protocols and
anesthesia protocols. The instantiation on the former is
still in the data preparation phase, since the intensive care
units' data were in a format not yet appropriate for
data mining. Anesthesia protocols have been successfully
analyzed after a preprocessing task that involved
computation and discretization of the target variable
(SRO time). We reported on what steps should take
place during preprocessing and analysis, how different
algorithms can affect the predictive power of the
learned models, and how they should be compared.
Next steps include the refinement of our framework
towards specific activities for decision support
tasks, and instantiations for knowledge discovery
from other types of medical protocols, above all
from intensive care unit protocols.
HEALTHINF 2012 - International Conference on Health Informatics