A Comparison of Multivariate SARIMA and SVM Models for Emergency Department Admission Prediction
Alexander Zlotnik (1, 2), Juan Manuel Montero Martínez (1) and Ascensión Gallardo-Antolín (3)
(1) Department of Electronic Engineering, Polytechnic University of Madrid, ETSI Telecomunicación, Ciudad Universitaria, 28040 Madrid, Spain
(2) Ramón y Cajal University Hospital, C/ de Colmenar Viejo, km 9, 100, 28031 Madrid, Spain
(3) Department of Signal Theory and Communications, Carlos III University, C/ Madrid, 126, 28903 Getafe, Spain
Keywords: Forecasting, Emergency Service, Emergency Department, Hospital, Operations Research, SVM, Time
Series Analysis, ARIMA, SARIMA.
Abstract: A multivariate SARIMA model was compared with a multivariate regression-based time series model built on a Support Vector Machine (SVM) for emergency department admission prediction. The same input variables were used in both models. Both models were trained with consecutive daily samples corresponding to the January 2009 – August 2012 period (n=1339). Performance was evaluated on the September 2012 test dataset (n=30). The Support Vector Machine was found to be more accurate, with a 46.53% RMSE improvement and a 48.89% MAE improvement on the test set. The experiment was repeated six times with varying time periods, and the SVM approach produced better results in all cases. Error measurements on the test set were compared with a paired t-test, and all differences were found to be statistically significant at the 5% level (p < 0.05).
1 INTRODUCTION
Specialized emergency care volume is, by its very
nature, hard to predict and requires a large amount
of healthcare resources in all developed countries. A
flexible and easily adaptable model for emergency
department (ED) admission prediction would be of
great use for healthcare managers.
Emergency care admission prediction has been studied extensively with several approaches, although the predominant trend has been the use of autoregressive time series models such as SARIMA (seasonal ARIMA). These models rely on the constant variance assumption, which does not hold for emergency ward arrivals and admissions over long periods of time (Monte et al., 2002). Therefore, although short-term predictions of total arrivals seem to be possible, long-term predictions have unacceptably high errors. ARIMA models are also limited by linearity assumptions. A systematic review of regression-based, exponential smoothing and ARIMA time series models for ED prediction (Wargon et al., 2009) suggests that a simple regression model called the “calendar method” (Batal et al., 2001) is preferable, since it is one of the simplest and has been found to be as accurate as more complex models. Also, ARIMA time series models have been found to be unreliable when hospital managers need them most, in times of high demand “bursts” (Jones et al., 2002).
However, some recent multivariate models have
been used to successfully predict short-term ED
crowding and short-term ED census (Schweigler et
al., 2009).
In order to improve predictive capabilities, research has been performed on environmental factors affecting emergency medical services demand for cardiovascular (Metzger et al., 2004) and respiratory pathologies (Stieb et al., 2009), as well as on the effect of heat waves (Schaffer et al., 2012). However, including these variables in predictive models is problematic, since weather forecasts have limited validity and often lack the exact variables used in these models. Also, environmental factors have not been found to be significant in models that try to predict overall admissions (Wargon et al., 2009; Jones et al., 2002; Sun et al., 2009).
Machine learning techniques have been used to predict bed demand, which is a similar but harder-to-model phenomenon. A hybrid ARIMA and neural network approach had promising results (Joy and Jones, 2005), although relatively little research has been performed in this direction.

Zlotnik A., Montero Martínez J. and Gallardo-Antolín A.
A Comparison of Multivariate SARIMA and SVM Models for Emergency Department Admission Prediction.
DOI: 10.5220/0004326102450249
In Proceedings of the International Conference on Health Informatics (HEALTHINF-2013), pages 245-249
ISBN: 978-989-8565-37-2
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
Our approach was to compare multivariate
ARIMA and SVM-based time series prediction
models.
2 MATERIALS AND METHODS
Our final goal was to build an admission prediction model that could serve as a decision support system for hospital management.
Although an SVM-based approach was preferred, its performance had to be compared with that of ARIMA models, which have been widely used for ED admission prediction.
The SPSS Time Series Expert Modeler had been used in previous research with reasonable accuracy for a similar problem (Sun et al., 2009); hence, we decided to replicate that approach. Weka was chosen for the SVM time series modeling approach due to its ubiquity and flexibility. The same independent variables were used in both models.
2.1 Study Setting
2.1.1 Data Selection and Analysis
The Ramón y Cajal University Hospital is a 1100-bed tertiary care referral center with all medical specialties except obstetrics. Its emergency department (ED) provides urgent care 24 hours a day in three shifts.
Notably, fewer than 13% of ED admissions resulted in hospitalization in the January 2009 – September 2011 period. This is a relatively common pattern in Spanish hospitals (Palanca-Sánchez et al., 2010) and is a proxy for inappropriate ED usage, i.e. most patients could have received care in primary or secondary care. However, from a predictive point of view, a high influx of low-severity admissions is likely to be more seasonal and to exhibit stronger memory (autocorrelation), and hence might be easier to predict based on seasonality factors.
ED admissions data was obtained from the Central Hospital Information System (HIS), which, in the case of the Ramón y Cajal University Hospital, is the HP-HIS software. This HIS is a database-centric application running on the Solaris 8 operating system, with an Informix database backend and developed with the MULTIBASE tool. Although based on old technologies by current standards, this software is still widely used in many Spanish hospitals.
A total of 1369 daily samples were obtained, each containing the total number of admissions for that day, and independent variables were computed for each sample. The 1339 samples from the January 2009 – August 2012 period were used as the training set, and the remaining 30 samples (September 2012) were used as the test set.
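The split sizes follow directly from the calendar. The minimal sketch below, assuming exactly one sample per calendar day with no gaps, reproduces the counts stated above; the helper name `daily_split_sizes` is hypothetical.

```python
from datetime import date

def daily_split_sizes(start, split, end):
    """Count daily samples in a chronological train/test split.

    Training set: days in [start, split); test set: days in [split, end).
    Assumes exactly one sample per calendar day (no missing days).
    """
    n_train = (split - start).days
    n_test = (end - split).days
    return n_train, n_test

# January 2009 - August 2012 for training, September 2012 for testing.
n_train, n_test = daily_split_sizes(date(2009, 1, 1), date(2012, 9, 1), date(2012, 10, 1))
print(n_train, n_test, n_train + n_test)  # 1339 30 1369
```

The split is chronological rather than random, so the test month always lies strictly after the training period, as a real forecasting setting requires.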
Descriptive statistics were obtained with SPSS
version 15.
2.1.2 Dependent Variable Analysis
ED admissions follow an approximately normal distribution (figure 1), although the Shapiro-Wilk test yielded a significant result, i.e. strict normality was rejected.
Figure 1: ED admissions distribution.
However, the admission time series exhibits
seasonal trends and periods of high volatility (figure
2).
Figure 2: ED admissions time series.
Moreover, admissions cluster in periods of high and low volatility, so the variance is not constant across the time series.
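One simple way to make non-constant variance visible is a rolling standard deviation. The sketch below is an illustration under our own assumptions (a 7-day window and synthetic values); it is not the analysis behind figure 2, and `rolling_std` is a hypothetical helper name.

```python
import statistics

def rolling_std(series, window):
    """Sample standard deviation over each full sliding window of `window` days.

    A roughly flat output suggests constant variance; large swings indicate
    volatility clustering (heteroscedasticity).
    """
    if window < 2:
        raise ValueError("window must be >= 2 for a sample standard deviation")
    return [statistics.stdev(series[i:i + window])
            for i in range(len(series) - window + 1)]

# Synthetic illustration only (not the hospital data): a calm week, then a volatile week.
calm = [300, 302, 299, 301, 300, 298, 301]
volatile = [280, 340, 260, 350, 270, 330, 290]
stds = rolling_std(calm + volatile, window=7)
print([round(s, 1) for s in stds])
```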
2.1.3 Independent Variable Selection
Environmental data was found to be of little use in previous research; however, holidays were found to be statistically significant (Jones et al., 2002; Sun et al., 2009; McCarthy et al., 2008; Abraham et al., 2009), hence they were included in the model.
HEALTHINF2013-InternationalConferenceonHealthInformatics
246
2.2 Model Adjustment
2.2.1 SARIMA Model Adjustment
Week number (52 weeks a year) and weekday (7 days a week) seasonality levels were defined in SPSS. Automatic outlier detection was enabled and holidays were introduced as an independent variable. An ARIMA(2,5,1)(1,0,1) model was fit to the data. Examination of the residual ACF and PACF (figure 3) indicates that the model fit is adequate.
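For reference, an ARIMA(p,d,q)(P,D,Q)_s model with an exogenous regressor is commonly written in the regression-with-ARIMA-errors form below, where B is the backshift operator (B y_t = y_{t-1}), s is the seasonal period, x_t is the holiday indicator and ε_t is white noise. This is the textbook Box-Jenkins form, shown as an assumption: the exact parameterization used internally by the SPSS Expert Modeler is not stated here, and sign conventions for the MA terms vary between software packages.

```latex
y_t = \beta x_t + \eta_t, \qquad
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,\eta_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t,
\\[4pt]
\phi_p(B) = 1 - \sum_{i=1}^{p} \phi_i B^i, \quad
\Phi_P(B^s) = 1 - \sum_{i=1}^{P} \Phi_i B^{is}, \quad
\theta_q(B) = 1 + \sum_{j=1}^{q} \theta_j B^j, \quad
\Theta_Q(B^s) = 1 + \sum_{j=1}^{Q} \Theta_j B^{js}.
```

In this notation, the fitted model above has p = 2, d = 5, q = 1 in the non-seasonal part and P = 1, D = 0, Q = 1 in the seasonal part.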
Figure 3: SARIMA PACF and ACF residuals.
Stationary R-squared on test data was 0.697. Error statistics for the test set were computed in an Excel 2002 spreadsheet, because SPSS does not allow an automatic split between training and test sets.
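The paper does not spell out the error formulas; the sketch below uses the standard definitions of MAE, RMSE and MAPE, which we assume are the ones applied in the spreadsheet. The toy values are illustrative only.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Toy daily admissions and forecasts, for illustration only.
actual = [300, 320, 280, 310]
predicted = [290, 330, 285, 300]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

RMSE squares each error before averaging, so it penalizes large misses more heavily than MAE; MAPE expresses the error relative to the daily volume.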
2.2.2 SVM Model Adjustment
Weka uses a regression-based time series approach, which some authors consider more flexible (Darlington, 1990). Regression-based time series models easily allow the inclusion of cyclical factors, and the use of SVMs allows non-linear trends to be modelled better (Mukherjee et al., 1997).
Year, week number, month and day-of-week variables were added to the model in order to encode seasonality.
Holiday and weekend independent variables were added to the model, as in the SPSS model.
The number of lag terms included in the regression was set to 60, as higher values were found to cause overfitting and hence worse results on the test set.
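Weka builds the lagged design matrix internally; the sketch below only illustrates the idea in plain Python. The function name `build_features`, the use of ISO week numbers (which can reach 53) and the holiday set are assumptions for illustration, not Weka's implementation.

```python
from datetime import date, timedelta

def build_features(start, counts, holidays, n_lags):
    """Turn a daily count series into (features, target) regression rows.

    start    -- date of counts[0]; counts must cover consecutive days
    holidays -- set of dates treated as holidays (hypothetical input)
    n_lags   -- number of lagged counts per row (the paper used 60)
    """
    rows = []
    for t in range(n_lags, len(counts)):
        day = start + timedelta(days=t)
        _, iso_week, iso_weekday = day.isocalendar()
        features = {
            "year": day.year,
            "week": iso_week,            # ISO week number, an assumption
            "month": day.month,
            "weekday": iso_weekday,      # 1 = Monday ... 7 = Sunday
            "holiday": int(day in holidays),
            "weekend": int(iso_weekday >= 6),
        }
        # lag_k holds the admissions count k days before the target day.
        for k in range(1, n_lags + 1):
            features[f"lag_{k}"] = counts[t - k]
        rows.append((features, counts[t]))
    return rows

rows = build_features(date(2012, 1, 1), [300, 310, 305, 320, 315], {date(2012, 1, 5)}, n_lags=2)
print(rows[0])
```

Each additional lag adds one input dimension, which is one reason why too many lags can lead to overfitting on roughly 1300 training rows.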
A radial basis function (RBF) kernel (Smola and
Schölkopf, 2004) was selected for the SVM
(Shevade et al., 2000). The software was configured
to produce predictions with a 95% confidence
interval.
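The RBF kernel measures similarity between two feature vectors as a function of their squared Euclidean distance. The paper does not report the kernel width, so `gamma` below is a hypothetical value; because the kernel depends on raw distances, features are usually scaled to comparable ranges before training.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

x = [1.0, 2.0, 3.0]
z = [1.0, 2.5, 2.0]
print(rbf_kernel(x, x, gamma=0.1))  # 1.0
print(rbf_kernel(x, z, gamma=0.1))
```

Identical inputs always have similarity 1, and similarity decays toward 0 as inputs move apart; larger `gamma` makes the decay faster and the fitted function more local.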
Error statistics were computed for the training and test sets (Tables 1 and 2). A visual inspection of the series fit confirmed the adequacy of the approximation (Figure 4).
Figure 4: SVM model test set fit.
2.2.3 SARIMA and SVM Accuracy Comparison
The SVM model was more accurate than the SARIMA model on both the training and test sets, and the differences were larger on the test set.
Table 1: Train set model comparison.
Metric     SARIMA    SVM       Improvement
MAE        20.405    16.253    20.35%
RMSE       26.242    23.312    11.17%
MAPE (%)   5.440     4.540     16.54%
Table 2: Test set model comparison.
Metric     SARIMA    SVM       Improvement
MAE        31.800    16.253    48.89%
RMSE       38.453    20.560    46.53%
MAPE (%)   9.805     4.749     51.57%
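The improvement columns in Tables 1 and 2 are consistent with the relative error reduction 100 × (SARIMA − SVM) / SARIMA. The short check below recomputes them from the tabulated values; `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(sarima, svm):
    """Percentage reduction in error of the SVM model relative to SARIMA."""
    return 100.0 * (sarima - svm) / sarima

# Values copied from Tables 1 and 2: metric -> (SARIMA, SVM).
train = {"MAE": (20.405, 16.253), "RMSE": (26.242, 23.312), "MAPE": (5.440, 4.540)}
test = {"MAE": (31.800, 16.253), "RMSE": (38.453, 20.560), "MAPE": (9.805, 4.749)}
for name, table in (("train", train), ("test", test)):
    for metric, (sarima, svm) in table.items():
        print(name, metric, f"{relative_improvement(sarima, svm):.2f}%")
```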
Figure 5: SVM model test set prediction.
AComparisonofMultivariateSARIMAandSVMModelsforEmergencyDepartmentAdmissionPrediction
247
To assess the differences between both approaches with varying time windows, the experiment was replicated six times, calculating error indicators in all cases. In each repetition, the latest month was removed and a new split between training and test sets was made, using the latest remaining month as the test set. In all cases, models were recalculated with identical independent variables and input parameters in both SPSS and Weka, and MAE, RMSE and MAPE were calculated on the test set for both approaches. The SVM approach produced better results in all cases.
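A minimal sketch of this rolling-origin scheme follows. It assumes that the six comparisons include the original September 2012 split, which is consistent with Table 3, where each standard error equals the standard deviation divided by √6. The function names are hypothetical.

```python
from datetime import date

def next_month(d):
    """First day of the month after d's month."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)

def previous_month(d):
    """First day of the month before d's month."""
    return date(d.year - 1, 12, 1) if d.month == 1 else date(d.year, d.month - 1, 1)

def rolling_origin_splits(first_day, last_test_month, n_splits):
    """Return (train_start, test_start, test_end) tuples, newest split first.

    Training covers [train_start, test_start); testing covers [test_start, test_end).
    Each repetition drops the latest month and tests on the latest remaining month.
    """
    splits = []
    test_start = last_test_month
    for _ in range(n_splits):
        splits.append((first_day, test_start, next_month(test_start)))
        test_start = previous_month(test_start)
    return splits

for train_start, test_start, test_end in rolling_origin_splits(date(2009, 1, 1), date(2012, 9, 1), 6):
    print(f"train {train_start}..{test_start} ({(test_start - train_start).days} days), "
          f"test {test_start}..{test_end} ({(test_end - test_start).days} days)")
```

Because the training window always ends where the test month begins, no future observations leak into model fitting.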
Table 3: Model comparison (paired differences in test-set error, SARIMA minus SVM, over the six repetitions).
Metric   Mean     St.Dev.   St.Err.   p
MAE      9.829    8.389     3.425     0.0349
RMSE     10.759   9.055     3.697     0.0339
MAPE     2.882    2.667     1.089     0.0456
A paired t-test was performed to compare the errors of the SARIMA and SVM approaches. All differences were found to be statistically significant at the 5% level (p < 0.05).
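Table 3 is consistent with a two-sided paired t-test on six paired differences (df = 5): each standard error equals St.Dev./√6, and t = Mean/St.Err. reproduces the reported p-values up to rounding. The sketch below recomputes them from the summary statistics with only the Python standard library, integrating Student's t density with Simpson's rule instead of calling a statistics package. Treat it as a consistency check under those assumptions, not as the authors' procedure.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|), via Simpson's rule for P(0 <= T <= |t|); steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return 1.0 - 2.0 * half_area

def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t-test from the mean and standard deviation of n paired differences."""
    se = sd_diff / math.sqrt(n)
    t = mean_diff / se
    return se, t, two_sided_p(t, n - 1)

# Summary statistics copied from Table 3 (n = 6 paired differences assumed).
for metric, mean_diff, sd_diff in (("MAE", 9.829, 8.389), ("RMSE", 10.759, 9.055), ("MAPE", 2.882, 2.667)):
    se, t, p = paired_t_from_summary(mean_diff, sd_diff, 6)
    print(f"{metric}: St.Err.={se:.3f} t={t:.3f} p={p:.4f}")
```

With only six repetitions the test has five degrees of freedom, so the critical t value at the 5% level (about 2.57) is considerably larger than the normal-approximation value of 1.96.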
3 RESULTS
An evaluation of SARIMA and regression-based
SVM prediction models for ED arrivals has been
performed.
The SARIMA approach produced reasonably low errors on the training set; however, its errors on the test set were considerably higher than those of the SVM approach. To generalize these findings, testing on datasets from other hospitals is necessary; nevertheless, our empirical evidence is promising.
Further development will lead to the construction of an automated ED admission prediction system based on the SVM approach. Due to the violation of stationarity conditions, ARIMA ED admission predictive models have to be regenerated regularly in order to be useful (Sun et al., 2009). This may be due to frequent changes in the factors that influence ED arrivals. Changing emergency care patterns, notably a higher percentage of medium- and high-severity cases, are likely to lower time series accuracy. This variability would also affect the longer-term predictions of the SVM model. Hence, this system will automatically recalculate the SVM model frequently and produce daily forecasts, which is easily achievable with the Weka software package.
4 CONCLUSIONS
Monte, Roca and Vilardell have shown that, for certain datasets, emergency ward arrivals do not follow a Poisson distribution, are self-similar and have a fractal nature over long periods of time (Monte et al., 2002). Constant variance assumptions do not apply, and therefore the process cannot be assumed to be stationary. Furthermore, queue and Markov chain models, which are widely used in ED computer simulations, are likely to yield inadequate results when compared with actual ED patient flow.
Although reasonable accuracy has been achieved for short-term predictions in our dataset, the practical applicability of both the ARIMA and SVM-based time series models presented in this paper is nevertheless problematic, since neither is likely to successfully predict “bursts” or periods of high demand, when predictions are most needed (Jones et al., 2002). However, the SVM approach is still more likely to yield better results, since a “burst” mode could be included through extra independent variables and non-linear modeling.
Models able to stratify admission predictions by severity level would be more useful for healthcare management. An hourly model would also allow for better crowding management and prediction. Also, the SVM approach should be compared with more sophisticated time series models that can be fit to high-volatility periods, such as GARCH and its variations. Further research will try to address these issues.
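As an illustration of the kind of model meant here, the standard GARCH(1,1) specification lets the conditional variance of the residuals ε_t of a mean model (for example, a SARIMA model) change over time. This is the textbook form, given for reference only; no GARCH model was fitted in this study.

```latex
\varepsilon_t = \sigma_t z_t, \quad z_t \ \text{i.i.d.},\ \mathbb{E}[z_t] = 0,\ \operatorname{Var}(z_t) = 1,
\qquad
\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,
\quad \omega > 0,\ \alpha, \beta \ge 0,\ \alpha + \beta < 1.
```

A large shock on day t-1 raises the variance on day t, and β controls how long that elevated variance persists. This is how the model represents the clustering of high- and low-volatility periods described in Section 2.1.2.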
REFERENCES
Monte, E., Roca, J. and Vilardell, L. On the self-similar distribution of the emergency ward arrivals time series. Fractals 10, 413-428 (2002).
Wargon, M., Guidet, B., Hoang, T. D. and Hejblum, G. A systematic review of models for forecasting the number of emergency department visits. Emerg Med J 26, 395-399 (2009).
Batal, H., Tench, J., McMillan, S., Adams, J. and Mehler,
P. S. Predicting patient visits to an urgent care clinic
using calendar variables. Academic Emergency
Medicine 8, 48-53 (2001).
Jones, S. A., Joy, M. P. and Pearson, J. Forecasting
demand of emergency care. Health Care Management
Science 5, 297-305 (2002).
Schweigler, L. M. et al. Forecasting models of emergency department crowding. Acad Emerg Med 16, 301-308 (2009).
Metzger, K. B. et al. Ambient air pollution and cardiovascular emergency department visits. Epidemiology 15, 46 (2004).
Stieb, D. M., Szyszkowicz, M., Rowe, B. H. and Leech, J.
A. Air pollution and emergency department visits for
cardiac and respiratory conditions: a multi-city time-
series analysis. Environ Health 8, 25 (2009).
Schaffer, A., Muscatello, D., Broome, R., Corbett, S. and
Smith, W. Emergency department visits, ambulance
calls, and mortality associated with an exceptional heat
wave in Sydney, Australia, 2011: a time-series
analysis. Environmental Health 11, 3 (2012).
Sun, Y., Heng, B. H., Seow, Y. T. and Seow, E. Forecasting daily attendances at an emergency department to aid resource planning. BMC Emergency Medicine 9, 1 (2009).
Joy, M. P. and Jones, S. In ESANN'2005 Proceedings, 13th European Symposium on Artificial Neural Networks (2005).
Palanca-Sánchez, I., Elola-Somoza, J. and Mejía-Estebaranz, F. Unidad de urgencias hospitalarias: Estándares y recomendaciones (Hospital emergency unit: standards and recommendations). Informes, estudios e investigación. Madrid: Ministerio de Sanidad y Política Social (2010).
McCarthy, M. L. et al. The challenge of predicting
demand for emergency department services. Academic
Emergency Medicine 15, 337-346 (2008).
Abraham, G., Byrnes, G. B. and Bain, C. A. Short-term forecasting of emergency inpatient flow. IEEE Transactions on Information Technology in Biomedicine 13 (2009).
Darlington, R. B. Regression and linear models. (McGraw-Hill, New York, 1990).
Mukherjee, S., Osuna, E. and Girosi, F. In Neural Networks for Signal Processing VII: Proceedings of the 1997 IEEE Workshop, 511-520 (IEEE, 1997).
Smola, A. J. and Schölkopf, B. A tutorial on support vector regression. Statistics and Computing 14, 199-222 (2004).
Shevade, S. K., Keerthi, S., Bhattacharyya, C. and Murthy, K. R. K. Improvements to the SMO algorithm for SVM regression. IEEE Transactions on Neural Networks 11, 1188-1193 (2000).
AComparisonofMultivariateSARIMAandSVMModelsforEmergencyDepartmentAdmissionPrediction
249