Automated Input Data Management for Discrete Event Simulation Models of Emergency Departments
Juan David Mogollon¹, Virginie Goepp², Oscar Avila¹ and Roland de Guio²
¹Department of Systems and Computing Engineering, School of Engineering, Universidad de Los Andes, Bogota, Colombia
²I-Cube, INSA Strasbourg, France
Keywords: Emergency Department, Health Information System, Input Data Management, IDM, Discrete Event Simulation.
Abstract: Emergency Department (ED) overcrowding, i.e., congestion due to a high number of patients, has a negative effect on patient waiting times. The analysis of the flow of patients through Discrete Event Simulation (DES), which models the operation of a system as a sequence of events, is a relevant approach to address this problem. This technique relies on high-quality input data, which needs to be prepared beforehand in a comprehensive process known as Input Data Management (IDM). The objective of this research is to propose a tool to automate the IDM process required for DES models of EDs. To do so, we used a case with real data in order to contextualize the problem, evaluate the required statistical methods, and gather specific requirements. Based on these results, a prototype was designed and developed using web and cloud application development tools.
1 INTRODUCTION
One of the main problems in Emergency Departments (ED) is overcrowding, which is, according to Duguay and Chetouane (2007), "the situation in which ED
function is impeded primarily because of the
excessive number of patients waiting to be seen,
undergoing assessment and treatment, or waiting for
departure comparing to the physical or staffing
capacity of the ED". Overcrowding is thus recognized
as a global problem, which has reached crisis
proportions in some countries. It has direct
implications for the well-being of patients and staff,
mainly due to waiting times derived from process
deficiencies, the inappropriate placement of physical
and human resources, and budget restrictions. In
addition, it can affect an institution's financial
performance and reputation (Komashie & Mousavi,
2005).
One of the strategies to mitigate the adverse
effects of overcrowding is using Discrete Event
Simulation (DES) to provide analytical methods to
assess and redesign processes and support data-driven
decision-making. In the ED context, DES models aim
to reproduce the flow of patients and their
relationship with the different areas, personnel, and
resources available to solve specific problems.
The success of DES applications depends on the
prior preparation of high-quality input data. Some of
the event data required in DES are represented by probability distributions. The parameters describing
the underlying distributions are a key input for the
simulation. The process that involves transforming
raw data into a quality-assured representation of all
parameters appropriate for simulation is known as
Input Data Management (IDM) (Skoogh &
Johansson, 2008).
Input data preparation is one of a DES project's
most crucial and time-consuming tasks (Robertson &
Perera, 2002). According to Skoogh et al. (2012), the input data management process consumes about 10-40% of the total time of a DES project. In most cases,
practitioners manually transform raw data from
different sources into appropriate simulation input
(Robertson & Perera, 2002) and separately from the
software used for the simulation. Automating the data
preparation phase can potentially increase efficiency
in DES projects by integrating data resources
(Skoogh et al., 2012). The IDM required to address the overcrowding problem in EDs must consider automating the estimation of the probability distributions required to simulate patient flow. These
statistics can be grouped into three categories: Arrival
Patterns, Routing Probabilities, and Processing Times
(Ghanes et al., 2015).
While reviewing the literature, we found no
commercial IDM automation tool. We identified only
three open-source tools with such capabilities,
namely, GDM-Tool (Skoogh et al., 2011), DESI
(Rodriguez, 2015), and KE Tool (Barlas & Heavey,
2016). Although these tools have features for fitting some statistical distributions, they do not cover all the models required for ED operation simulation, such as Markov chain modeling. Other gaps we found when reviewing them include the lack of features for sharing data and results, which limits the opportunities for collaboration and for other researchers to replicate the process and obtain comparable outputs. Moreover, the reviewed tools
do not have features for managing projects,
generating data quality reports, handling large
datasets, or running intensive workloads; only
examples with small volumes of data on personal
computers are presented.
To identify potential current challenges in the data
preprocessing tasks in the case of a DES project
studying ED crowding, we analyzed a sample of the
patient flow data of the ED at the Hautepierre
Hospital located in the city of Strasbourg, France.
This analysis had two main objectives: (i) identify the
statistical methods to generate the required inputs in
a simulation of the patient pathway within an ED and
(ii) determine the preparation and validation
requirements that guarantee data quality. As a result,
we identified limitations regarding the automation of
the required processing.
In this context, the research problem addressed in this work is: how to automate the IDM process for DES models to address the overcrowding problem in EDs? To deal with this question, this article presents
the architecture of a cloud-based web application for
IDM, in which we provide the statistical methods
required to estimate the parameters needed to
describe the patient flow in ED and the management
features to handle the IDM tasks. Our approach
focuses on the definition and development of the IDM
software and not on the integration of enterprise data
or the simulation itself, as illustrated in Figure 1.
Figure 1: IDM Approach.
The article is organized as follows: section 2 presents
related work in the domain. Section 3 presents the
IDM requirements and evaluation of required
statistical methods from the analysis of the real case.
Section 4 introduces the IDM solution's architecture.
Finally, section 5 presents conclusions and
recommendations for future work.
2 RELATED WORK
Skoogh and Johansson (2008) defined Input Data Management (IDM) as "the entire process of
preparing quality assured, and simulation adapted,
representations of all relevant input data parameters
for simulation models. This includes identifying
relevant input parameters, collecting all information
required to represent the parameters as appropriate
simulation input, converting raw data to a quality
assured representation, and documenting data for
future reference and re-use”.
Data collection has multiple inherent difficulties (Bokrantz et al., 2018). First, organizations can have multiple data sources and systems from which to collect the data. Second, accuracy, reliability, and validity are the analyst's responsibility when extracting and preparing the data for the simulation; these procedures are, in most cases, performed manually, which makes them error-prone. A survey presented in (Robertson & Perera, 2001) inquired about the most frequent issues in simulation projects, in particular data collection: 60% of respondents indicated they manually input the data into the model, while 40% reported using connectors to an external system such as spreadsheets, text files, or databases.
In summary, as described in (Furian et al., 2018),
the main challenges in this process are, in the first
place, manual data collection and data entry, which
increases the likelihood of data entry errors arising
from human manipulation of data. The inherent
difficulties of the manual process compromise the
quality and integrity of the data. In addition, handling multiple files manually to maintain and process data makes it difficult to track errors and reproduce procedures. Finally, data preparation requires specific knowledge of techniques and algorithms, and a lack of it can lead to misuse of statistical packages and unexpected outcomes.
We identified only three open-source tools specialized in IDM for DES: GDM-Tool (Skoogh et al., 2011), DESI (Rodriguez, 2015), and KE tool (Barlas & Heavey, 2016). To the best of our knowledge, no survey or study compares them or uses them in the ED context. We analyze these tools in the
following section by using criteria extracted from the
requirements in the case of Hautepierre Hospital.
3 IDM FOR ED
This section focuses on analyzing the IDM tasks
enabling to prepare statistical representations of the
patient flow data of the ED of the Hautepierre
Hospital in Strasbourg (France). In addition, we
evaluate IDM tools that could be used for the
analysis.
3.1 Emergency Department Description
To analyze the overcrowding problem, it is necessary to know in advance the configuration of the ED, which depends on the needs, staff, capacities, and areas of the health institution. We thus sketch the main steps
of the patient flow of the Hautepierre Hospital ED in
Figure 2. In the diagram, the patient flow is
represented linearly as patients perform each of the
activities consecutively. However, it is worth
mentioning that there are iterations between the
stages, as patients may require a procedure to be
repeated or an exam to be performed multiple times.
In addition, patients may follow many different paths and do not necessarily go through all the steps of the process. The number and types of diagnostic tests (blood analysis, RX, or CT scan) depend on the consultation and are not known at the outset, which is why we model the pathway using routing probabilities.
The data were provided in comma-separated
values (CSV) files extracted from the ED databases.
The files contained anonymized records of patients
and the events during their stay in the ED. The
collected data contains records from June 22nd, 2020,
to June 28th, 2020, of the ED flow of 795 patients.
The records include information on the following
events of the patient flow: arrival, triage, blood
analysis (BA) (Coagulation, Hematology, Biochemistry), computed tomography (CT) scan, and X-rays (RX). The average throughput is 5.52 patients/h. We consulted the physicians to check the
consistency of data, as we had to deal with multiple
files from different data sources.
The ED uses a severity index for the assignment
of degrees of emergency to decide the priority and the
order of procedures. It considers three levels of
severity identified by colors: red for immediate care,
orange for cases that can be taken care of within the hour, and green for non-urgent care. According to each
severity level, the patients are assigned to one of three
zones in the ED.
Figure 2: Hautepierre Hospital ED BPM.
Figure 3: Estimated hourly arrival rate.
3.2 Statistical Analysis
3.2.1 Analysis Description
Different types of metrics are required to simulate an
ED, which can be grouped into three categories:
Arrival Patterns, Routing Probabilities, and
Processing Times (Ghanes et al., 2015).
Arrival patterns. The arrival patterns refer to the
measurements made to the patients at the time of
entering the emergency room. The metric used in this
case is the arrival rate per hour/day.
For the modeling, we consider $N_t$, the number of patients arriving at the emergency room at a particular time $t$. It is assumed that patients arrive randomly and independently. In that case, it is possible to model the patient count as a Poisson process with parameter $\lambda$. However, when the temporal dependence of the counts is considered, it can be modeled as a non-homogeneous Poisson process with rate $\lambda_t$. For the estimation, the rate is assumed to be piecewise constant on a set of predefined time intervals. Given that $N_t$ is a Poisson process with rate $\lambda_t$, the interarrival times within an interval follow an exponential distribution with parameter $\lambda_t$.
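For illustration, the following Python sketch estimates such a piecewise-constant hourly rate from arrival timestamps; the pandas DataFrame layout and the `arrival_ts` column name are assumptions for the example, not the hospital's actual schema:

```python
import pandas as pd

def hourly_arrival_rates(df: pd.DataFrame, ts_col: str = "arrival_ts") -> pd.Series:
    """Piecewise-constant hourly rate lambda_t of a non-homogeneous Poisson process.

    Counts arrivals per (weekday, hour) and divides by the number of distinct
    dates observed for that weekday, yielding patients per hour.
    """
    ts = pd.to_datetime(df[ts_col])
    frame = pd.DataFrame({"weekday": ts.dt.day_name(),
                          "hour": ts.dt.hour,
                          "date": ts.dt.date})
    counts = frame.groupby(["weekday", "hour"]).size()
    n_days = frame.groupby("weekday")["date"].nunique()
    # Broadcast the per-weekday day count across the hour level.
    return counts.div(n_days, level="weekday").rename("lambda_per_hour")

# Hypothetical usage: rates = hourly_arrival_rates(pd.read_csv("ed_arrivals.csv"))
# Interarrival times within an interval are then exponential with mean 1 / lambda_t.
```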
Routing probabilities. For the estimation of
routing probabilities, we consider the sequence of
events observed in the data as a Markov Process
(Baum & Petrie, 1966), in which each state represents
one event in the process, such as triage, or blood test,
among others. The transition probabilities associated
with the Markov Chain are in consequence, the
routing probabilities. For the verification of this
model, the following hypothesis tests on thproperties
of the chain are considered: Markov property, order,
and stationarity of the transition probabilities, and
sample size (Anderson & Goodman, 1957).
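As a minimal sketch of this estimation step (the event labels and the list-of-pathways input format are illustrative assumptions, not the hospital data layout), the maximum-likelihood transition probabilities can be obtained by counting consecutive event pairs per patient:

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

import pandas as pd

def transition_matrix(pathways: list[list[str]]) -> pd.DataFrame:
    """Maximum-likelihood transition probabilities of a first-order Markov chain.

    `pathways` holds one event sequence per patient, e.g.
    ["arrival", "triage", "blood_test", "exit"].
    """
    counts = Counter()
    for path in pathways:
        counts.update(pairwise(path))          # count consecutive (state_i, state_j) pairs
    states = sorted({s for path in pathways for s in path})
    matrix = pd.DataFrame(0.0, index=states, columns=states)
    for (i, j), n in counts.items():
        matrix.loc[i, j] = n
    # Normalize each row; rows without outgoing transitions (absorbing states) stay at zero.
    return matrix.div(matrix.sum(axis=1).replace(0, 1), axis=0)
```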
Processing times. In this case, three elements can be distinguished in the processing times: the waiting time from the prescription of an exam to the moment it is performed, the time it takes to complete the exam, and the additional waiting time to get the results. Once the data were adequately arranged, we iterated over a set of continuous distributions to identify the one with the best fit for each variable. To test the goodness of fit, we used the Kolmogorov-Smirnov test (Massey, 1951), in which the null hypothesis states that the data follow a specific distribution.
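The fitting loop can be sketched with SciPy as follows; the candidate distribution families listed are assumptions for illustration, not necessarily the exact set used in the prototype:

```python
import numpy as np
from scipy import stats

CANDIDATES = ("expon", "gamma", "lognorm", "weibull_min")  # assumed candidate families

def best_fit(durations: np.ndarray, alpha: float = 0.05) -> dict:
    """Fit each candidate by maximum likelihood and keep the smallest KS statistic.

    Note: KS p-values are only approximate when the parameters are estimated
    from the same sample being tested.
    """
    results = []
    for name in CANDIDATES:
        params = getattr(stats, name).fit(durations)
        ks_stat, p_value = stats.kstest(durations, name, args=params)
        results.append({"distribution": name, "params": params,
                        "ks_stat": ks_stat, "p_value": p_value,
                        "rejected_at_alpha": p_value < alpha})
    return min(results, key=lambda r: r["ks_stat"])
```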
3.2.2 Results Analysis
Arrival Patterns. Firstly, the non-homogeneous Poisson process was estimated and the parameter $\lambda_t$ was determined for all the one-hour intervals. Figure 3 shows the behavior of the parameter for all the days of the week, revealing the bands of greater congestion and the peaks of patient arrivals during the day. The x-axis indicates the hour of the day, and the y-axis the number of patients. The curves represent the behavior of the intensity parameter for each day of the week. From these arrival patterns, it is possible to construct the distributions of arrivals and inter-arrival times per hour, following the deduction mentioned above.
Routing Probabilities. The chain states are represented as the nodes of Figure 4, which describes the transition matrix indicating the probability of moving from one state to another. We can see, for example, that after triage the probability that a patient undergoes a blood test is 0.48, while the probability of not going through any further stage is 0.44. Patients do not usually go directly from triage to RX or CT scan; generally, a blood test is performed beforehand, after which 70% are referred to one of these two tests. For the Markov property, the Q statistic is 4535.63 with a p-value of 1.0; taking alpha = 0.05, there is no statistical evidence to reject H_0, i.e., the hypothesis that the process satisfies the Markov property.
Figure 4: General routing probabilities.
Processing Time. After triage, the most frequent subsequent examination is a blood test. The collected samples are used for three evaluations: Biochemistry, Hematology, and Coagulation. For the Biochemistry blood analysis, we consider the distinction by severity index and plot the histogram and the fitted distributions, as shown in Figure 5.
Figure 5: Biochemistry BT duration.
3.3 Requirements
We gathered requirements in terms of user stories
through manual process analysis and interviews with
an expert doctor and two DES researchers.
Functional:
(1) Manage Input Data: As a user, I want to manage my datasets so I can analyze them on the platform. Acceptance criteria: Upload resource,
encrypt and version files.
(2) Check Data Quality: As a user, I want to check
the quality of my dataset so I can make sure my
dataset is appropriate for simulation. Acceptance
criteria: Perform data quality checks, generate data
quality reports.
(3) Process Data: As a practitioner user, I want to
process my input data and obtain statistical
representations of my variables in a compatible
format for the simulation software. Acceptance
criteria: Display variables, fit distribution, display
results, export results.
(4) Reproducibility: As a user, I want to replicate
the analysis and results of my projects so I can make
research reproducible. Acceptance criteria: Public
repository, share data, share results.
Non-functional:
(5) Availability: As a user, I want the site to be available 99.9 percent of the time I try to access it so that I can trust the application and do not need another app.
(6) Scalability: As a user, I want the application to
support traffic spikes and several simultaneous
processing requests so that no delays in the results
should happen.
(7) Security: As a user, I want the application to ensure encryption for all my data so that there is no risk associated with the exposure of confidential information.
3.3.1 Existing IDM Tools
The analysis of existing IDM tools is made through criteria extracted from the requirements identified above. We identified only three tools: GDM-Tool (Skoogh et al., 2011), KE tool (Barlas & Heavey, 2016), and DESI (Rodriguez, 2015). The specific gaps between the tools' characteristics and the requirements are presented in Table 1.
Table 1: Evaluation of IDM tools.

Category             Criteria              GDM   KE    DESI
Manage input data    Manage projects       No    Yes   Yes
                     Load data             Yes   No    No
                     Encrypt files         No    No    No
                     Version files         No    No    No
                     Data storage          No    No    Yes
                     Data collection       Yes   No    Yes
Check data quality   Check data quality    No    Yes   No
                     Reports               No    No    No
Process data         User interface        Yes   No    Yes
                     Visualization         No    Yes   Yes
                     Fit distributions     Yes   Yes   Yes
                     Fit Markov chain      No    No    No
                     Hypothesis testing    No    No    No
                     Display results       Yes   Yes   Yes
                     Export results        Yes   Yes   Yes
Reproducibility      Public repository     No    Yes   No
                     Cloud available       No    No    No
                     Share data            No    No    No
                     Share results         No    No    No
Non-functional       Availability          No    No    No
                     Scalability           No    No    No
                     Security              No    No    No
Manage Input Data: All the tools have data loading features; GDM-Tool (Skoogh et al., 2011) and DESI (Rodriguez, 2015) have features for data collection and use a database for storage. None of the tools has encryption or versioning features.
Check Data Quality: The comparison of the tools
in this criterion showed that only KE Tool (Barlas &
Heavey, 2016) has methods for evaluating the input
data. None of the tools has features to generate reports
on the quality of the input data.
Process Data: It was found that all the tools have
features for exporting data, displaying results, and
adjusting statistical distributions. GDM-Tool
(Skoogh et al., 2011), and DESI (Rodriguez, 2015)
have a user interface. KE Tool (Barlas & Heavey,
2016) and DESI (Rodriguez, 2015) show graphs of
the obtained distributions. Although the KE Tool
(Barlas & Heavey, 2016) does not have a user
interface, it is possible to generate graphs from the
code in the development environment. None of the
tools adjust specific distributions such as Markov
Chain or evaluate the hypothesis of their properties.
Reproducibility: KE Tool (Barlas & Heavey,
2016), is available in a public repository. However,
none of the tools is available in the cloud, and they do
not have features for sharing data and results
obtained.
Non-Functional: The evaluated applications are desktop apps, for which availability, security, and scalability are not a concern.
4 ARCHITECTURE
We present our architectural decisions through
different diagrams (reference, context, deployment,
application layers, deployment in AWS) to provide a top-level view of the software's structure, representing the principal design decisions and our understanding of the problem. We propose adopting cloud infrastructure
instead of on-premises infrastructure by considering
three characteristics of the former model:
manageability, scalability, and cost. In the first place, in an on-premises setting there is an intrinsic responsibility for managing the entire hardware and software stack, which implies the ownership and
administration of servers, databases, networks, and
containers, among others. The cloud services take
away the burden of managing the required
infrastructure, the provisioning, and maintenance of
software (Narasayya & Chaudhuri, 2021). In a cloud
environment, provisioning instances to meet the
desired response time is easily configurable. In an on-
premises setup, a scalable architecture is possible but is limited by the available resources and budget. In addition, in a cloud setup, there can be a reduction in
the total cost of ownership (TCO) mainly regarding
capital expenditure (Capex) (Qian et al., 2009).
The software architecture described in the
following subsections will adopt a fully decoupled
microservice pattern to cover two main non-functional concerns: availability and scalability. In this architecture, each element can be scaled independently in a cost- and time-effective manner, which is essential when handling several simultaneous processing requests and large datasets. In addition, each service is also easier to maintain due to its relatively small size.
4.1 Software Architecture
Figure 6: Software architecture.
This section describes each area of the software
architecture that we propose in Figure 6, namely,
IDM, analytics, reports, projects, and information.
Input Data Management: Data management: The system should provide a means to ingest high volumes of data, persist it, and store it securely. Data quality: The service must outline data quality issues and provide visualizations and reports to the user. Data processing: The platform must process all the data according to the user configuration and apply convenient transformations for analytical purposes.
Analytics: Dashboards: Dashboards allow users to quickly gain insights into the critical metrics and information relevant to them. They also provide means for identifying potential issues that require immediate action. Statistical analysis: It provides summary statistics of variables, fits statistical distributions, estimates parameters, and tests goodness-of-fit hypotheses. Data visualization: Visualization techniques provide the user with a clear representation of information to get quick data insights.
Reports: Export report: It enables the user to have a portable version of the results of the data quality inspection, data processing, and statistical analysis in HTML format. Generate data: The platform has to provide the user with a mechanism to generate synthetic data that mimics the system's original data according to the statistical distributions of the processes.
Figure 8: Deployment diagram.
Project: Project management: Projects allow the user to organize and centralize resources and arrange data, analyses, and reports. Version control: It keeps track of versions of the projects and their resources in an organized manner. Authorization: It allows the administrator to manage roles and permissions within the project's context and provides a way to secure files. Share and search: It provides mechanisms for indexing and cataloging data sources and analysis objects in order to facilitate the searching and sharing of files.
Information: Users: The database dedicated to centralizing the users' data, roles, and authorizations. Projects data: A dedicated database for project data management. Processing: All the data allocated in memory during processing. File storage: The expected contents of the system are the original data sources, transformed data, metadata, parameters, results, and reports.
4.2 Context of the Application
Figure 7: Context diagram.
The intended users are researchers and practitioners involved in the DES field. The application considers three types of users: administrators, practitioners, and guest users, as shown in Figure 7. Administrators and practitioners can log in, create projects, manage input data, version resources, share resources, analyse data, generate reports, and generate data quality reports. Practitioners require an invitation from an administrator to create an account. Guests do not need an account but require an invitation from an administrator to see a project and generate reports.
4.3 Deployment of the Application
Figure 8 shows the system structure and the current understanding of the artifacts and how they will be deployed. First, the user accesses the application through a web client hosted in the cloud, specifically in a front-end component of the application. A load balancer acts as a firewall and manages application loads; behind it there is a back-end component based on three microservices: Users, Projects, and Processing (which also covers analytics and reports). Each of them has a dedicated database, and the last one also has object storage.
4.4 Prototype
The application prototype was built using two lightweight software development frameworks, FastAPI (back end) and React.js (front end). These are appropriate for this type of application, which is small and has simple logic.
Microservices. The application has three microservices containerized using Docker, each hosting a REST API created with the FastAPI Python framework that exposes the service methods. Each of these microservices can access the storage services, in particular the Postgres database hosted in AWS RDS.
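A minimal sketch of one such service is shown below; the endpoint name, payload fields, and the single candidate distribution are illustrative placeholders, not the prototype's actual API:

```python
# Hypothetical processing endpoint; names and fields are illustrative.
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel
from scipy import stats

app = FastAPI(title="processing-service")

class FitRequest(BaseModel):
    project_id: str
    activity: str            # e.g. "biochemistry_blood_test"
    durations: list[float]   # minutes, already extracted from the uploaded file

class FitResponse(BaseModel):
    distribution: str
    params: list[float]
    ks_stat: float

@app.post("/fit", response_model=FitResponse)
def fit(req: FitRequest) -> FitResponse:
    data = np.asarray(req.durations)
    params = stats.expon.fit(data)                      # single candidate kept for brevity
    ks_stat, _ = stats.kstest(data, "expon", args=params)
    return FitResponse(distribution="expon", params=list(params), ks_stat=float(ks_stat))

# Run locally with: uvicorn processing_service:app --reload
```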
Elements of the User Interface (UI). It includes
basic elements, such as input controls, navigation and
information components and containers, to offer a
simple interactive experience in which IDM is
presented through a series of steps that do not require
further configuration by the user to obtain the data
reports (see Figure 9).
Figure 9: Data processing steps.
The first step is the creation of a project, in which the user must enter a name, a description, and a reference label. Then the user can load the files to be processed, following the template provided for this purpose. Once a file is loaded, if it conforms to the required structure, the corresponding validations on the data are executed. The next step is the data processing, where the statistics are estimated. The last step corresponds to the menu where the user can share private links to the resources or download the reports. Regarding the data processing functionality, the estimated parameters of the arrival process, the inter-arrival times, the routing probabilities, and the waiting and processing time distributions are generated online on demand. The results are presented in a dashboard containing three pages, one for each analysis category.
Figure 10 shows an example of the dashboard with the properties of the Markov chain. It shows the results of evaluating the properties of the Markov chain and the transition matrix associated with the process. Two buttons allow the user to navigate to the other sections of the dashboard. Clicking on the routing probabilities option displays the dashboard with three main elements. First, information cards show the results of the hypothesis tests that verify the properties of the Markov chain, namely the Markov property, order, and stationarity. The processing microservice performs the calculations, and the results are presented to the user in cards, as shown in Figure 10. The third page of the dashboard presents a drop-down list from which it is possible to choose the activity used as the starting point of a process or waiting time. A graph is then displayed with the histogram of the data, the kernel density, and the fitted density.
4.5 Deployment in AWS
The web application proposed in this article uses
some of the tools available in Amazon Web Services,
which follow the usual deployment patterns for the
proposed architecture. Route53 is used to register the
website domain and to redirect traffic. In addition, we
use Elastic Load Balancing (ELB), the service
managed by Amazon, for load balancing between
applications, in order to distribute the application
traffic, determine the scaling of resources on demand,
and keep hidden the IPs of the microservices where
the user information is hosted. To manage the
containers where the microservices of the application
are hosted, Elastic Container Registry is used to
register and store the images of the containers, which
facilitates the deployment using Fargate. The
processing microservice has access to the data hosted in an RDS database instance, which can scale horizontally on demand. For front-end content
delivery we use CloudFront, the low-latency, highly
available and secure content delivery network service
that does not depend on a particular region. The
Simple Storage Service (S3) is used to store the files
that users import into the application, as well as the
reports that are generated on data quality and statistics
processed by the application.
Figure 10: Dashboard elements: Markov chain properties and routing probabilities.
Figure 11: Deployment diagram in AWS.
Several security measures were considered to secure and protect user data. First of all, the application uses a Virtual Private Cloud (VPC) to group the computing resources and database of the project securely. Outbound Internet traffic from the VPC resources passes only through the Internet Gateway. Also, the S3 object storage encrypts all original user data. On the other hand, the load balancer acts as a firewall so that the APIs that expose the microservices cannot be accessed directly. In addition, the proposed architecture considers the use of public and private subnets, so that the response data of the microservices does not go directly to the Internet but through a NAT Gateway that routes the response. Last but not least, the credentials, API keys, and database access data are stored encrypted in AWS Secrets Manager. Furthermore, to promote service availability, the application is deployed in two availability zones: us-east-1a and us-east-1b. Moreover, to ensure that the application responds to intensive workloads, an autoscaling group was defined to increase the number of processing instances if needed.
Regarding the client-side design, all requests made from the client side are handled by the FastAPI back end. Within the application design, React handles two types of users: administrators, who are in charge of creating projects, and guests. For the verification and authorization of administrators, both client-side and server-side, an access-token system based on JWT was used. React is responsible for the user interface and for making the corresponding HTTP requests to the application server.
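The sketch below illustrates how such a JWT check can be wired into a FastAPI route; the secret handling, claim names, and route are placeholders, not the prototype's actual implementation (in the deployed system, secrets are read from AWS Secrets Manager):

```python
# Illustrative JWT check for a FastAPI route; the secret and claims are placeholders.
import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()
SECRET_KEY = "replace-with-a-secret-from-aws-secrets-manager"

def current_user(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    try:
        return jwt.decode(creds.credentials, SECRET_KEY, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")

@app.get("/projects")
def list_projects(user: dict = Depends(current_user)) -> dict:
    # Only requests carrying a valid token reach this point.
    return {"owner": user.get("sub"), "projects": []}
```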
For the server-side design, the model presented in Figure 11 shows all the elements involved on the server side. The boxes group together the AWS cloud, represented by the AWS cloud logo; the Virtual Private Cloud (VPC); and the availability zones identified by the labels "us-east-1a" and "us-east-1b". Green and blue backgrounds distinguish the public and private subnets, and finally, the Elastic Container Service (ECS) cluster is identified by the ECS logo.
Users access the application via the Internet through the website whose domain is registered with Route53. Once accessed, the site content is delivered by CloudFront, which queries the static front-end files stored in S3. In AWS, the application has a VPC to securely isolate the compute resources, storage, and subnets. The point of contact of the VPC with the Internet is the Internet Gateway. Requests received through this point are directed to the load balancer, which is responsible for redirecting the request to the containers hosted in Fargate. The containers host the application's microservices and access the RDS database, S3 object storage, and SES mail delivery services.
The microservices are hosted on a private subnet, which means they cannot be accessed directly and cannot deliver request responses directly to the Internet; they must use the NAT Gateway available on the public subnet. The proposed architecture includes several elements to secure servers and user data, such as the VPC, private and public subnets, a load balancer that serves as a firewall, and encryption of the data in S3 and in RDS. Likewise, the multi-AZ deployment favors the
availability of the application. In terms of scalability,
the ECS cluster configuration has an auto-scaling
policy to increase the number of instances required
for processing in case of traffic spikes.
4.6 Validation
The application’s validation consisted of verifying
requirements with end-users. Table 2 shows the result
of reviewing the requirements, the microservices, and
the AWS components used to satisfy each
requirement. Validation involved expert input from
two simulation specialists and hands-on involvement
from two students working with real data. End-user
feedback was gathered through periodic validation
sessions during prototype development. As we can
see in Table 2, all requirements were satisfied through
the application. In addition, the end-users transferred
knowledge about ED processes and data, proposed
and evaluated data analysis strategies, and requested
features for visual components, navigation, and data
results.
Table 2: Mapping of requirements and architecture's areas and components.

Feature          Container    AWS Service
Registry         Users        Fargate
Login            Users        Cognito, SES
Delete account   Users        Fargate
Create project   Projects     Fargate
Invite User      Projects     Fargate
Search project   Projects     Fargate, RDS
Delete project   Projects     Fargate
Manage data      Processing   Fargate
Download Data    Processing   Fargate, S3
Data quality     Processing   Fargate
Dashboard        Processing   Fargate, CloudFront
Process Data     Processing   Fargate, RDS, S3
Security         -            Subnets, Load Balancer, Autoscaling Group
Scalability      -            Load Balancer, Autoscaling
Availability     -            Multi AZ
Functional Test. Benchmark against KE Tool. One of the tests performed to validate the reliability of the results was to process the data and obtain the probability distributions using another IDM tool. The test consisted only of estimating the probability distributions of the data, as the other characteristics of the applications are not directly comparable. We compared the statistical distributions generated by our application against those generated using the KE tool, the only tool we found in a public repository. The comparison was performed in terms of the KL divergence; the observed values were close to zero, so we can affirm that there is negligible divergence, meaning that the distributions found by both programs are similar.
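The comparison can be sketched as follows: both fitted densities are evaluated on a common grid and the discretized KL divergence is computed. The distribution family and parameters below are placeholders, not the reported results:

```python
import numpy as np
from scipy import stats
from scipy.special import rel_entr

def kl_divergence(dist_p, dist_q, grid: np.ndarray) -> float:
    """Discretized KL(P || Q) between two fitted continuous distributions."""
    p = dist_p.pdf(grid)
    q = dist_q.pdf(grid)
    p, q = p / p.sum(), q / q.sum()        # normalize the discretized densities
    return float(rel_entr(p, q).sum())      # 0 means the discretized densities coincide

# Hypothetical example: the same family fitted by the two tools.
grid = np.linspace(0.1, 300, 1000)          # minutes
ours = stats.gamma(a=2.1, scale=22.0)
ke_tool = stats.gamma(a=2.0, scale=23.0)
print(kl_divergence(ours, ke_tool, grid))   # values close to zero indicate agreement
```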
Non-Functional Requirements: As mentioned
earlier, we considered non-functional scalability,
availability, and security requirements. The proposed
architecture facilitates storing and processing files
and scaling on demand. The application achieves this through the use and configuration of Amazon's processing and storage services appropriate for this application, such as the object storage service and the configuration of the auto-scaling groups. On the other hand, optimized libraries were used for distributed data processing, thus reducing processing times. The availability and security of the application are achieved by implementing AWS services such as RDS multi-AZ, availability zones, VPC, subnets, and the load balancer. Availability is assured by using two availability zones, and the security requirements are covered with object storage encryption.
Load Test. To assess the non-functional scalability requirement, we tested the processing microservice's ability to handle a certain number of HTTP requests per minute. We consider the system completely degraded when a failure rate greater than 99% occurs. The web service https://loader.io/ was used to send the requests to the necessary endpoints. The auto-scaling configuration enabled up to 5 ECS tasks. We observed the system completely degrade when receiving 1500 requests in 15 seconds, and also when more than 6000 requests were received in two seconds. We found evidence that the maximum number of enabled instances was used, thus satisfying the requirement.
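A rough local equivalent of this kind of load test can be scripted by firing concurrent requests and measuring the failure rate; the URL and request volumes below are placeholders, not the deployed endpoints:

```python
# Rough local load test; the URL and request volumes are placeholders.
import concurrent.futures as futures

import requests

def hit(url: str) -> bool:
    try:
        return requests.get(url, timeout=5).status_code == 200
    except requests.RequestException:
        return False

def failure_rate(url: str, n_requests: int = 1500, workers: int = 100) -> float:
    with futures.ThreadPoolExecutor(max_workers=workers) as pool:
        ok = list(pool.map(hit, [url] * n_requests))
    return 1 - sum(ok) / len(ok)

# print(failure_rate("https://example-api.invalid/fit"))
```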
5 DISCUSSION
This section discusses the advances that the proposed
application brings concerning the previous
propositions in the literature as well as our
contributions to meeting the needs identified from the analyzed data.
Qualitative features comparison. The proposed solution has storage services such as databases for the microservices and uses S3 object storage to store all user information. The results obtained can be reproduced on demand, and the original files can be shared, unlike DESI (Rodriguez, 2015), which has only temporary storage of the processed records. Additionally, the prototype integrates a feature for checking data quality that performs unit tests on the input data, unlike the KE Tool (Barlas & Heavey, 2016), which performs unit tests only on the methods of the classes that perform the processing.
Unlike the KE Tool (Barlas & Heavey, 2016) and DESI (Rodriguez, 2015), the prototype has a more robust user interface with navigation, information, and visualization elements that facilitate the visualization of distributions, particularly the Markov chain analysis results. Decisions are presented using the usual agile architecture diagrams, such as context, deployment, components, classes, and so on, unlike the KE Tool (Barlas & Heavey, 2016), which illustrates a single high-level diagram of the system elements.
The way in which the components fulfill the
requirements is described below.
Manage Projects: The project microservice has
methods for creating projects. Once a project has
been created, the user can invite other users to view
the content of the dashboards through a private link,
which is generated in the project microservice. When
a user wants to invite another user to the application,
a record of the guest’s email address is saved in the
database, then an email invitation to the project is
sent to the user.
Fit Distribution: In the dashboard, in the section
of the adjustment of probability distributions, once
the user selects a particular activity, the microservice
is responsible for identifying the best distribution.
Check Data Quality: Rules were generated to validate the data at a stage before processing. It is worth mentioning that these rules do not prevent the user from continuing with the statistics generation process; they serve to alert the user so that errors in the data do not compromise the results of the estimations. A minimal sketch of such rules is shown below.
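For illustration, a few rules of this kind can be expressed as pandas checks; the column names and messages are assumptions based on a generic event-log template, not the prototype's actual rule set:

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> list[str]:
    """Return human-readable warnings; processing continues even when rules fail."""
    warnings = []
    for col in ("patient_id", "event", "start_ts", "end_ts"):   # assumed template columns
        if col not in df.columns:
            warnings.append(f"missing expected column: {col}")
    if {"start_ts", "end_ts"} <= set(df.columns):
        start = pd.to_datetime(df["start_ts"], errors="coerce")
        end = pd.to_datetime(df["end_ts"], errors="coerce")
        warnings.append(f"{int(start.isna().sum() + end.isna().sum())} unparseable timestamps")
        warnings.append(f"{int((end < start).sum())} events ending before they start")
    if "patient_id" in df.columns:
        warnings.append(f"{int(df['patient_id'].isna().sum())} records without a patient id")
    return warnings
```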
User Interface: The processing microservice has
a dashboard developed in Dash and Plotly that has
three pages where all the statistics generated are
displayed. This microservice is a Python module in
which each page of the dashboard is an independent
module, which facilitates the maintenance and
editing of the visual components. The data is
presented in graphs and tables, making it easy for the
user to quickly learn about the data distributions.
Markov Chains Validation: We implemented
features for fitting Markov Chains and performing
hypothesis testing to verify Markov chain properties.
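One simple instance of such a test, checking that the next event actually depends on the current state (order 1 rather than order 0), can be run directly on the transition count matrix; this is a simplified stand-in for the Anderson and Goodman (1957) likelihood-ratio tests rather than the prototype's exact implementation:

```python
import numpy as np
from scipy.stats import chi2_contingency

def order_zero_vs_one(transition_counts: np.ndarray, alpha: float = 0.05) -> dict:
    """Chi-square test of independence on the transition count matrix.

    Rejecting H0 (next event independent of current state) supports modeling
    the pathway as at least a first-order Markov chain.
    """
    counts = transition_counts[transition_counts.sum(axis=1) > 0]   # drop empty/absorbing rows
    counts = counts[:, counts.sum(axis=0) > 0]                      # drop never-reached columns
    stat, p_value, dof, _ = chi2_contingency(counts)
    return {"statistic": float(stat), "p_value": float(p_value), "dof": int(dof),
            "reject_independence": p_value < alpha}
```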
6 CONCLUSION
By reviewing the literature and examining the real
data, we defined the basic requirements that an IDM
solution for DES should fulfil, such as managing the input data, verifying the quality of the data, and processing and presenting process statistics in dashboards. We also analyzed the probability distributions to be implemented in such an application by using a real case. The proposed solution therefore introduces a cloud architecture that satisfies the requirements, based on a microservices pattern that enables high performance, availability, scalability, and security.
The novelty of this paper is the integration of
Markov chain modeling to IDM, the proposed cloud
architecture, the design, development and testing of
the software, and the implementation with real data in
the context of ED. The built application has elements
that had not been previously used in similar tools,
such as cloud computing services, containers, unit
testing on data and interactive visualization.
Additionally, the application implements
straightforward and intuitive navigation tools in order
to benefit user experience.
As future work, the results obtained in the evaluation of the properties of the Markov chains raise the question of how to approach the preparation of data for simulation models that consider routing probabilities for the problem of overcrowding in EDs. Last, it would be desirable to adjust the code so that the processing is generic for data from similar sources where IDM is required, such as manufacturing.
REFERENCES
Anderson, T. W., & Goodman, L. A. (1957). Statistical
inference about Markov chains. The Annals of Mathematical Statistics, 89-110.
Baum, L. E., & Petrie, T. (1966). Statistical inference for
probabilistic functions of finite state Markov
chains. The Annals of Mathematical Statistics, 37(6),
1554–1563.
Barlas, P., & Heavey, C. (2016). KE tool: an open source
software for automated input data in discrete event
simulation projects. In 2016 Winter Simulation
Conference (WSC) (pp. 472-483). IEEE.
Bokrantz, J., Skoogh, A., Lämkull, D., Hanna, A., & Perera,
T. (2018). Data quality problems in discrete event
simulation of manufacturing operations. Simulation,
94(11), 1009-1025.
Duguay, C., & Chetouane, F. (2007). Modeling and
improving emergency department systems using
discrete event simulation. Simulation, 83(4), 311-320.
Furian, N., Neubacher, D., O’Sullivan, M., Walker, C., &
Pizzera, C. (2018). GEDMod–Towards a generic
toolkit for emergency department modeling. Simulation
Modelling Practice and Theory, 87, 239-273.
Ghanes, K., Jouini, O., Jemai, Z., Wargon, M., Hellmann,
R., Thomas, V., & Koole, G. (2015). A comprehensive
simulation modeling of an emergency department: A
case study for simulation optimization of staffing
levels. In Proceedings of the Winter Simulation
Conference 2015 (pp. 1421-1432). IEEE.
Komashie, A., & Mousavi, A. (2005). Modeling emergency
departments using discrete event simulation techniques.
In Proceedings of the Winter Simulation Conference,
2005. (pp. 5-pp). IEEE.
Massey, F. J. (1951). The Kolmogorov-Smirnov Test for
Goodness of Fit. Journal of the American Statistical
Association, 46(253), 68–78.
Narasayya, V., & Chaudhuri, S. (2021). Cloud Data
Services: Workloads, Architectures and Multi-
Tenancy. Foundations and Trends in Databases 10(1),
pp 1-107.
Qian, L., Luo, Z., Du, Y., & Guo, L. (2009). Cloud
Computing: An Overview. Cloud Computing, First
International Conference, CloudCom. 626-631.
Robertson, N. H., & Perera, T. (2001). Feasibility for
automatic data collection. In Proceeding of the 2001
Winter Simulation Conference (Cat. No. 01CH37304)
(Vol. 2, pp. 984-990). IEEE.
Robertson, N., & Perera, T. (2002). Automated data
collection for simulation?. Simulation Practice and
Theory, 9(6-8), 349-364.
Rodriguez, C. (2015). Evaluation of the DESI interface for
discrete event simulation input data management
automation. International Journal of Modelling and
Simulation, 35(1), 14-19.
Skoogh, A., & Johansson, B. (2008). A methodology for
input data management in discrete event simulation
projects. In 2008 Winter Simulation Conference (pp.
1727-1735). IEEE.
Skoogh, A., Michaloski, J., & Bengtsson, N. (2011).
Towards continuously updated simulation models:
combining automated raw data collection and
automated data processing. In Proceedings of the 2010
Winter Simulation Conference (pp. 1678-1689). IEEE.
Skoogh, A., Johansson, B., & Stahre, J. (2012). Automated
input data management: evaluation of a concept for
reduced time consumption in discrete event simulation.
Simulation, 88(11), 1279-1293.