Extracting Alarm Events from the MIMIC-III Clinical Database

Jonas Chromik

1 a

, Bjarne Pﬁtzner

1 b

, Nina Ihde

1 c

, Marius Michaelis

1 d

, Denise Schmidt

1 e

Sophie Anne Ines Klopfenstein

2 f

, Akira-Sebastian Poncette

2 g

, Felix Balzer

2 h

and Bert Arnrich

1 i

Hasso Plattner Institute, University of Potsdam, Germany

Charit

e – Universit

atsmedizin Berlin, Berlin, Germany

Keywords:

Patient Monitor Alarm, Medical Alarm, Intensive Care Unit, Vital Parameter, Data Cleaning, Data Extraction.

Abstract:

Lack of readily available data on ICU alarm events constitutes a major obstacle to alarm fatigue research.

There are ICU databases available that aim to give a holistic picture of everything happening at the respective

ICU. However, these databases do not contain data on alarm events. We utilise the vital parameters and alarm

thresholds recorded in the MIMIC-III database in order to artiﬁcially extract alarm events from this database.

Prior to that, we uncover, investigate, and mitigate inconsistencies we found in the data. The results of this

work are an approach and an algorithm for cleaning the alarm data available in MIMIC-III and extract concrete

alarm events from them. The data set generated by this algorithm is investigated in this work and can be used

for further research into the problem of alarm fatigue.

1 INTRODUCTION

Alarm fatigue is the desensitisation of medical staff

due to an excessive number of alarms, most of them

being false or irrelevant (McCartney, 2012). This

results in a lack of response to the alarm stimulus.

The problem of alarm fatigue has been widely inves-

tigated, both qualitatively (Cvach, 2012) and quanti-

tatively (Drew et al., 2014).

However, building technical solutions for alleviat-

ing alarm fatigue is hindered by a lack of readily avail-

able data on patient monitor alarms. Public medical

databases usually provide no data on patient monitor

alarms as is the case with eICU CRD (Pollard et al.,

2018) and HiRID (Hyland et al., 2020).

The MIMIC-III database (Johnson et al., 2016)

provides alarm data. However, there are only alarm

https://orcid.org/0000-0002-5709-4381

https://orcid.org/0000-0001-7824-8872

https://orcid.org/0000-0001-5776-3322

https://orcid.org/0000-0002-6437-7152

https://orcid.org/0000-0002-6299-0738

https://orcid.org/0000-0002-8470-2258

https://orcid.org/0000-0003-4627-7016

https://orcid.org/0000-0003-1575-2056

https://orcid.org/0000-0001-8380-7667

thresholds recorded in the database and not the alarm

events themselves. Alarm thresholds are lower and

upper boundaries for a certain vital parameter, such

as the heart rate (HR). For example, for HR the low

alarm threshold might be set to 60 bpm and the high

alarm threshold to 120 bpm. Whenever the measured

parameter, i.e. HR, drops below the low alarm thresh-

old or exceeds the high alarm threshold, an alarm

event goes off at the patient monitor and alerts the

medical staff.

The objective of this work is to extract these alarm

events from the MIMIC-III database by taking into

account alarm thresholds and the actual parameter

value. This is done for the following vital parameters:

HR: heart rate, as measured by an electrocardiogram

(ECG) and expressed in beats per minute (bpm)

NBP

: non-invasively measured systolic blood pres-

sure, as measured by a sphygmomanometer and

expressed in millimeters of mercury (mmHg)

: peripheral blood oxygen saturation, as mea-

sured by a pulse oximeter (usually on the patients

ﬁnger) and expressed in %

Furthermore, we uncover and rectify inconsisten-

cies in the recorded alarm thresholds such as unrealis-

tically high or low values or instances where the high

328

Chromik, J., Pﬁtzner, B., Ihde, N., Michaelis, M., Schmidt, D., Klopfenstein, S., Poncette, A., Balzer, F. and Arnrich, B.

Extracting Alarm Events from the MIMIC-III Clinical Database.

DOI: 10.5220/0010767200003123

In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 5: HEALTHINF, pages 328-335

ISBN: 978-989-758-552-4; ISSN: 2184-4305

alarm threshold is below the low alarm threshold.

The rest of this work is structured as follows: In

Section 2 we describe which parts of the MIMIC-III

database we are using, how we address data inconsis-

tencies through data cleaning, and ﬁnally how we ex-

tract alarm events. In Section 3 we describe the alarm

event data set that is produced as a result of this work.

Finally, in Section 4 we discuss our ﬁndings as well

as potential applications and limitations of this data

set.

2 MATERIALS & METHODS

The MIMIC-III database contains 26 tables provid-

ing a wide range of information on the events at the

intensive care units (ICUs) of Beth Israel Deaconess

Medical Center. For our use case, however, only

the CHARTEVENTS table is of interest. This table

contains, among others, measured values and alarm

thresholds of the vital parameters listed in Section 1.

The objective of this work is to extract alarm

events by comparing the measured parameter values

with the corresponding alarm thresholds. However,

before the alarm events can be extracted, we have to

deal with a number of data inconsistencies that we un-

covered. The inconsistencies and our corresponding

rectiﬁcation approaches are presented in the follow-

ing.

2.1 Data Cleaning

The CHARTEVENTS table contains a multitude of

information, such as routine vital signs, ventilator set-

tings, laboratory values, and mental status.

For ex-

tracting alarm events, however, only a small subset of

this information is relevant, i.e. measurements of the

three vital parameters we consider in this work (HR,

NBP

, and S

) as well as their respective high and

low alarm thresholds. Hence, the ﬁrst step in data

cleaning is removing all irrelevant information by re-

taining only a speciﬁc subset of ITEMIDs which are

listed in Table 1.

Invalid Value Removal. Besides removing irrele-

vant data items, we are also interested in validating

the correctness of the relevant data items. For exam-

ple, S

can not exceed 100% and HRs above 350

bpm are rare. For the considered vital parameters, we

found that their recorded values are not always within

clinically valid ranges. We assume that these extreme

values are either erroneous or bear a special but un-

https://mimic.mit.edu/iii/mimictables/chartevents/

Table 1: Complete list of ITEMIDs retained while ﬁlter-

ing the CHARTEVENTS table with their respective label

as recorded in the D ITEMS table.

ITEMID Label

220045 HR

220046 HR Alarm - High

220047 HR Alarm - Low

220179 NBP

223751 NBP

Alarm - High

223752 NBP

Alarm - Low

220277 S

223769 S

Alarm - High

223770 S

Alarm - Low

documented meaning. Therefore, we decided to re-

move values outside the clinically valid ranges listed

in Table 2.

Invalid value removal is done both for the param-

eter measurements themselves and the threshold set-

ting (both low and high) using the same ranges. The

ranges are intentionally chosen to be conservative in

order not to remove valid measurements or settings.

For measurements, this removal means that the value

at this point in time is missing afterwards but could

be reproduced by interpolating between the neigh-

bouring measurements. Concerning alarm thresholds,

only changes of these thresholds are recorded in the

MIMIC-III database. Hence, we do not have dedi-

cated threshold information for each parameter mea-

surement. Thus, for alarm threshold, invalid value re-

moval means that the threshold update is lost and the

previous threshold is carried over.

Table 2: Clinically valid ranges for the vital parameters con-

sidered in this work. Adapted from (Harutyunyan et al.,

2019).

Parameter Lower Limit Upper Limit

HR 0 350

NBP

0 375

0 100

To show how the cleaning steps affect the origi-

nal data set, we look at the distributions of thresholds

and measurements for NBP

. NBP

serves only as an

example here. The same cleaning steps were also per-

formed for HR and S

. Figure 1 shows that the

majority of threshold values and measurements are in

the valid range. However, there is still a wide range

of outliers with implausibly high values. These out-

liers are removed by the invalid value removal clean-

ing step. Figure 2 shows a rectiﬁed and more reason-

able value distribution after the outliers are removed.

Apart from values outside clinically relevant

ranges, we also found inconsistencies within the

alarm thresholds. These inconsistencies are essen-

Extracting Alarm Events from the MIMIC-III Clinical Database

329

Figure 1: Boxplots showing the distribution of high alarm

thresholds, low alarm thresholds and measurements before

cleaning. The distribution is vastly skewed with the valid

range barely visible at the far left corner and a wide range

of outliers.

tially periods of time, where the high alarm threshold

is below the low alarm threshold for a vital parameter.

We further distinguish between exact threshold swaps

and threshold overlaps which we both describe in the

following.

Exact Threshold Swaps. In MIMIC-III, changes

in corresponding high and low thresholds are al-

ways recorded simultaneously. Every newly recorded

high threshold is associated with a low threshold be-

ing recorded at the same time and vice versa. At

times, these thresholds are exactly swapped, i.e., the

high thresholds taking the value of the low threshold

and vice versa as shown in Fig. 2a. This, however,

would create an alarm with every further measure-

ment which is why we consider this to be an erroneous

recording that needs rectiﬁcation. Exact threshold

swaps are easily identiﬁed and corrected by swapping

the high and the low threshold as shown in Fig. 2b.

Threshold Overlaps. Besides the exact alarm

threshold swaps, there are also cases where high

and low alarm thresholds overlap but are not exactly

swapped and therefore generate an alarm too. For ex-

ample, a high alarm threshold might be set unreason-

ably low and falls below the corresponding low alarm

threshold as shown in Fig. 3a. At the same time, the

low alarm threshold continues to stay at a reasonable

value. Such an overlap is usually present for short pe-

riods only. These cases cannot be corrected by swap-

ping. Therefore, since high and low alarm thresh-

olds are always recorded pairwise in the MIMIC-III

database, both high and low alarm thresholds are re-

moved in the respective segment where they overlap.

The last clinically meaningful alarm thresholds prior

to the overlapping thresholds are chosen instead, as

shown in Fig. 3b.

2.2 Extracting Alarm Events

In Section 2.1 we described how we cleaned the

CHARTEVENTS table from out-of-range values and

inconsistencies. In this section, we show how we used

the actual measurements of the vital parameters and

their corresponding threshold setting in order to ﬁnd

actual alarm events in the data. As shown in Algo-

rithm 1, we ﬁrst isolated measurements and thresh-

olds for each ICU stay (single stay of a single patient

at the ICU) and each vital parameter. Then, we went

through each high threshold setting and low threshold

setting respectively and checked whether any mea-

surement within the relevant time frame exceeded

the high threshold or falls below the low thresholds.

Whenever this happened, we return either a high or

a low alarm event at the respective measurement’s

timestamp.

A shortcoming of this approach is that the num-

ber of alarms is subject to the sampling frequency of

the respective vital parameter. Higher sampling fre-

quencies produce more alarms because there are more

measurements in a period of time where the vital pa-

rameter is out of range. Figure 4 shows the differences

in the number of samples for the data items listed in

Table 1. Clearly, HR and S

are measured or at

least recorded more often than NBP

. This can result

in an over-representation of HR and S

alarms as

compared to NBP

alarms.

3 RESULTS

In Section 2 we describe methods and algorithms we

used to clean the CHARTEVENTS table and extract

concrete alarm events from it. This results in a data

set containing all patient monitor alarms as per the

MIMIC-III database in its respective observation pe-

riod. In this section, we show some descriptive statis-

tics that are made possible by the extracted data set of

alarm events.

Parameters and Alarm Types. The data set gives

insight into the relative counts of alarms produced

HEALTHINF 2022 - 15th International Conference on Health Informatics

330

(a) Exactly swapped low and high thresholds before cor-

rection. Every measurement in the time period where

the thresholds are swapped would theoretically produce an

alarm.

(b) A data cleaning step removes the exact threshold swap

thus rectifying the alarm threshold. No alarm events will be

recognised in the respective time period.

Figure 2: Example for an exact threshold swap correction.

(a) In this case, the thresholds overlap without being exactly

swapped. Here, the unreasonable low value for the high

threshold would result in all measurements in the respective

period of time triggering a high threshold alarm.

(b) Threshold overlap was corrected by removing the re-

sponsible alarm threshold settings. After correction, no

high alarms are triggered in the respective period of time.

Figure 3: Example for threshold overlap correction.

Figure 4: Comparison of the number of samples for mea-

surements and thresholds for HR, NBP

, and S

. The

number of measurements is much higher than the number

of thresholds in all cases and there are distinct differences

in the number of samples for the different vital parameter

measurements.

by the different vital parameters. Comparing alarm

counts among vital parameters might yield skewed re-

sults due to the differences in sampling frequencies,

as already discussed in Section 2.2. However, com-

paring the counts of high and low threshold alarms for

a single vital parameter yields interesting results. Fig-

ure 5 shows such a comparison. For HR and NBP

violations of the high threshold seem to occur more

often than violations of the low threshold. However,

for S

violations of the low threshold are a lot more

common than violations of the high threshold. This is

to be expected since a high blood oxygen saturation is

rarely a problem while too low blood oxygen satura-

tion is a harmful condition (Silverthorn, 2018).

We also want to emphasise the effect of the clean-

ing steps we performed on the alarm counts. Fig-

ure 6 shows the numerical reduction of alarms for

each alarm type we considered. The alarms that are

Extracting Alarm Events from the MIMIC-III Clinical Database

331

Data: MIMIC-III CHARTEVENTS

Result: List of Alarm Events

foreach ICUSTAY do

foreach Parameter do

msmts := measurements for Parameter and ICUSTAY;

highs := high threshold settings for Parameter and ICUSTAY;

lows := low threshold settings for Parameter and ICUSTAY;

foreach high in highs do

foreach msmt in msmts do

if time(high) <= time(msmt) < time(high+1) then

if value(msmt) > value(high) then

Return a high alarm event at msmt;

end

foreach low in lows do

foreach msmt in msmts do

if time(low) <= time(msmt) < time(low+1 then

if value(msmt) < value(low) then

Return a low alarm event at msmt;

end

Algorithm 1: Algorithm for extracting alarm events from measurements and thresholds.

Figure 5: Comparison of alarm counts by vital parameter

(i.e. HR, NBP

, and S

) and alarm type (i.e. whether a

high or a low threshold was violated).

not present as a result of the cleaning steps are suppos-

edly false alarms. Hence, this shows that the clean-

ing step actually improves the quality of the generated

data set, since the alarms not included after cleaning

are supposedly false alarms.

Alarm Distribution among ICU Stays. The gen-

erated data set shows that the distribution of alarm

events among the ICU stays seems to follow a Pareto

Figure 6: Alarm counts for low and high alarms regarding

HR, NBP

, and S

. Extracted from an uncleaned and

from a cleaned data set, respectively. This ﬁgure shows that

there is a reduction in alarm count due to cleaning for each

type of alarm.

distribution. The majority of patients produce only

a low number of alarms with the interquartile range

(IQR) spanning from 3 to 16 alarms per ICU stay.

However, there are few patients that are responsible

for an excessively high number of alarms as can be

seen in Fig. 7. We considered the 1% of ICU stays

with the highest number alarms to be outliers and

hence to not show them in the plot in an attempt to

HEALTHINF 2022 - 15th International Conference on Health Informatics

332

show the distribution of the remaining 99% per cent

more clearly.

Figure 7: The distribution of alarm counts (only 99%

shown) among the ICU stays follows a Pareto distribution

with few patients generating a large number of alarms and

many patients generating only few alarms.

Differences between Alarm Threshold and Mea-

surement. Patient monitor alarms do not differenti-

ate between strong and slight threshold violations. An

alarm goes off whenever the measurement exceeds a

high threshold or drops below a low threshold. For the

patient monitor, it does not matter whether the dif-

ference between measurement and threshold is high

or low. However, in clinical practice, the difference

is relevant since a parameter slightly out of range is

far less critical than a parameter that has by far left

a physiologically healthy range. Therefore, we in-

vestigated the difference between measurement and

threshold. Figure 8 shows this difference by the ex-

ample of the S

low threshold. Most of the alarms

are caused by only a slight drop of the measurement

below the threshold by a few per cent. On the other

hand, large drops of the S

parameter are rare. The

same pattern of many low differences between mea-

surement and threshold and few large differences are

also to be found when looking at the other parameters,

i.e. HR and NBP

4 DISCUSSION

The analyses we have shown in Section 3 – although

interesting – are only examples for the potential use

cases of the data set that is created by the approach

presented in this paper. Nevertheless, these results are

relevant ﬁndings that can guide further research into

alarm fatigue.

Structural Findings. On ﬁnding is that extensive

post-processing in terms of cleaning and alarm ex-

Figure 8: Histogram of the differences between alarm

threshold and actual measurement for the S

low thresh-

old. The majority of alarms are triggered by a slight drop of

the measurement below the threshold.

tracting is necessary to make sense from the alarm

data in MIMIC-III. This calls for guidelines prescrib-

ing on how to appropriately provide alarm data. Vital

parameters, alarm thresholds, and alarm events both

in terms of threshold alarm and in terms of other

alarms such as arrhythmia alarms need to be taken

into account. Furthermore, data inconsistencies as un-

covered in Section 2.1 need to be avoided. This can

either be avoided on a device level by designing the

interface of the patient monitor in a way that incon-

sistent thresholds are impossible to set. Or, a post-

processing step is required to rectify or remove these

inconsistencies.

Contentual Findings. Apart from ﬁndings related

to the structure and consistency of the data, we also

want to discuss the ﬁndings related to the content of

the generated data set. Figure 7 shows that the major-

ity of patients generate only a low number of alarms

while few patients generate a large number of alarms.

In order to alleviate alarm fatigue, it would be sensible

to conduct further research into what causes these pa-

tients to generate far greater numbers of alarms. Fur-

ther, we showed in Fig. 8 that the majority of alarms

are caused by minimal threshold violations. This ﬁnd-

ing can be used to guide further research. For ex-

ample, patient monitors could take the difference be-

tween measurement and threshold into account in or-

der to adapt the volume of the alarm, as shown in

(Greer et al., 2018). Another option would be to sup-

press or delay alarms caused by slight threshold viola-

tions in order to help the medical staff focus on more

severe emergencies as (Schmid et al., 2013) and (Win-

ters et al., 2018) ﬁnd that alarm delays are an effective

tool to reduce false alarms at the ICU.

Two design decisions are noteworthy in our ap-

proach to data cleaning. First, in the invalid value re-

moval step, we remove measurements and thresholds

Extracting Alarm Events from the MIMIC-III Clinical Database

333

if and only if their values are outside the correspond-

ing valid range. One result of this is that threshold up-

dates might be partially removed, i.e. a high threshold

update being removed while the corresponding low

threshold is retained or vice versa. This is notewor-

thy because thresholds update originally occur only

pairwise in the MIMIC-III database. We decided to

remove only the invalid part of the threshold update

in order to retain as much valid information as possi-

ble.

Second, when removing threshold overlaps, we

decided to always remove both parts (high and low)

of the threshold update because it is not always ob-

vious whether one threshold part remains in a sen-

sible range while the other part deviates or whether

both parts deviate. This can not be determined with-

out making strong assumptions about the nature of

threshold updates. Hence, we decided to always re-

move both parts thus reverting the effective threshold

to the last reasonable threshold update.

Limitations and Threats to Validity. The alarm

event data set we generated from the MIMIC-III

database provides some interesting insights into the

problem of alarm fatigue in medicine. However, there

are some limitations and threats to validity attached to

our approach. The data quality of the generated alarm

events data set is – apart from the cleaning steps we

performed – limited by the data quality of the data

set it is generated from. For example, the sampling

frequencies for the data in the MIMIC-III database

manifest an upper limit for the sampling frequencies

in the alarm events data set. Furthermore, all changes

in sampling frequency, missing data, etc. are also car-

ried over into the alarm events data set. For exam-

ple, higher sampling frequencies in the vital param-

eter measurements will result in a higher number of

alarms. Since the sampling frequencies vary among

vital parameters, as Fig. 4 shows, some alarm types

(e.g. HR) might be over-represented. This has to be

kept in mind when working with the data set.

Future Work. We already discussed the implica-

tions for alarm fatigue research of this work’s ﬁnd-

ings as well as its limitations. Further work needs

to be done in order to validate the ﬁnding from the

MIMIC-III database. Especially, more extensive ICU

databases are needed covering not only vital param-

eters, input and output events, laboratory ﬁndings,

and hospital logistics but also providing data on ICU

alarms.

Until such a database is created, the data set gener-

ated in this work can be used for a variety of purposes,

some of them are demonstrated in Section 3. Among

others, this data set enables quantitative analyses on

alarm events, alarm forecasting, and alarm threshold

recommendation which are to be covered in future re-

search.

5 CONCLUSION

The contribution of the paper is an approach and al-

gorithm to generate alarm events from the MIMIC-

III database. Publishing the generated data set it-

self would have been more convenient for researchers

interested in data on alarm events. However, by

publishing only the algorithm we ensure compliance

with the data protection guidelines of the MIMIC-

III database. Everyone with access to the MIMIC-

III database can apply the algorithm to the database

and thus create the alarm events data set themselves.

The algorithms for data cleaning and alarm extrac-

tion are published on GitHub, see https://github.com/

HPI-CH/mimic-alarms.

ACKNOWLEDGEMENTS

This work was partially carried out within the

INALO project. INALO is a cooperation project be-

tween AICURA medical GmbH, Charit

e – Univer-

sit

atsmedizin Berlin, idalab GmbH, and Hasso Plat-

tner Institute. INALO is funded by the German Fed-

eral Ministry of Education and Research under grant

16SV8559.

REFERENCES

Cvach, M. (2012). Monitor alarm fatigue: An integrative

review. Biomedical Instrumentation & Technology,

46(4):268–277.

Drew, B. J., Harris, P., Z

egre-Hemsey, J. K., Mammone,

T., Schindler, D., Salas-Boni, R., Bai, Y., Tinoco, A.,

Ding, Q., and Hu, X. (2014). Insights into the problem

of alarm fatigue with physiologic monitor devices: A

comprehensive observational study of consecutive in-

tensive care unit patients. PloS one, 9(10):e110274.

Greer, J. M., Burdick, K. J., Chowdhury, A. R., and

Schlesinger, J. J. (2018). Dynamic alarm systems for

hospitals (dash). Ergonomics in Design, 26(4):14–19.

Harutyunyan, H., Khachatrian, H., Kale, D. C., Ver Steeg,

G., and Galstyan, A. (2019). Multitask learning and

benchmarking with clinical time series data. Scientiﬁc

Data, 6(1):1–18.

Hyland, S. L., Faltys, M., H

user, M., Lyu, X., Gumbsch, T.,

Esteban, C., Bock, C., Horn, M., Moor, M., Rieck, B.,

HEALTHINF 2022 - 15th International Conference on Health Informatics

334

et al. (2020). Early prediction of circulatory failure in

the intensive care unit using machine learning. Nature

Medicine, 26(3):364–373.

Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L.-

W. H., Feng, M., Ghassemi, M., Moody, B., Szolovits,

P., Celi, L. A., and Mark, R. G. (2016). MIMIC-III,

a freely accessible critical care database. Scientiﬁc

Data, 3(1):1–9.

McCartney, P. R. (2012). Clinical alarm management.

MCN: The American Journal of Maternal/Child Nurs-

ing, 37(3):202.

Pollard, T. J., Johnson, A. E., Raffa, J. D., Celi, L. A., Mark,

R. G., and Badawi, O. (2018). The eICU collabora-

tive research database, a freely available multi-center

database for critical care research. Scientiﬁc Data,

5(1):1–13.

Schmid, F., Goepfert, M. S., and Reuter, D. A. (2013). Pa-

tient monitoring alarms in the ICU and in the operat-

ing room. Annual Update in Intensive Care and Emer-

gency Medicine 2013, pages 359–371.

Silverthorn, D. U. (2018). Human Physiology: An Inte-

grated Approach. Pearson, 8th edition.

Winters, B. D., Cvach, M. M., Bonaﬁde, C. P., Hu, X.,

Konkani, A., O’Connor, M. F., Rothschild, J. M.,

Selby, N. M., Pelter, M. M., McLean, B., and Kane-

Gill, S. (2018). Technological distractions (part 2):

A summary of approaches to manage clinical alarms

with intent to reduce alarm fatigue. Critical Care

Medicine, 46(1):130–137.

Extracting Alarm Events from the MIMIC-III Clinical Database

335