readers can check out our GitHub repository, which provides links to all of our artifacts as well as easy-to-set-up environments for trying out our sample scenarios.
The remainder of the paper is organized as follows:
Section 2 describes the system simulated in SimFaaS
in detail. Section 3 outlines the design of SimFaaS
with the most important design choices and character-
istics. Section 4 lists some of the possible use cases for
SimFaaS. In Section 5, we present the experimental
evaluation of SimFaaS, validating the accuracy of the
simulator. Section 6 gives a summary of the related
work. Finally, Section 7 concludes the paper.
2 SYSTEM DESCRIPTION
In this section, we introduce the management system of serverless computing platforms, which is fully captured by the serverless simulator presented in this paper.
Function Instance States: According to recent studies (Mahmoudi and Khazaei, 2020a; Mahmoudi and Khazaei, 2020b; Wang et al., 2018; Figiela et al., 2018; Mahmoudi et al., 2019), we identify three states for each function instance: initializing, running, and idle.
The initializing state happens when the infrastructure is spinning up new instances, which might include setting up new virtual machines, unikernels, or containers to handle the excess workload. The instance remains in the initializing state until it is able to handle incoming requests. We also consider application initialization, the time the user's code spends on one-time startup tasks such as creating database connections, importing libraries, or loading a machine learning model from an S3 bucket, as part of the initializing state; these tasks need to happen only once for each new instance. Note that the instance cannot accept incoming requests before performing all initialization tasks. It is worth noting that most providers bill the application initialization portion of this state, while the rest of the initializing state is not billed. When a request is submitted to the instance, the instance goes into the running state. In this state, the request is parsed and processed. The time spent in the running state is also billed by the serverless provider. After the processing of a request is over, the serverless platform keeps the instance warm for some time to be able to handle later spikes in the workload; during this period, we consider the instance to be in the idle state. The application developer is not charged for an instance in the idle state.
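To make this lifecycle concrete, the following Python sketch models the three states and which portions of an instance's lifetime are billed. The class, attribute, and method names are our own illustration, not SimFaaS's actual API.

```python
from enum import Enum, auto


class InstanceState(Enum):
    INITIALIZING = auto()  # spinning up + application initialization
    RUNNING = auto()       # processing a request (billed)
    IDLE = auto()          # kept warm for later requests (not billed)


class FunctionInstance:
    def __init__(self, platform_init_time: float, app_init_time: float):
        # Application initialization happens once per instance and is
        # billed by most providers; the rest of initialization is not.
        self.state = InstanceState.INITIALIZING
        self.init_time = platform_init_time + app_init_time
        self.billed_time = app_init_time

    def start_request(self, service_time: float) -> None:
        # An instance accepts requests only after all initialization
        # tasks are done (or when it is sitting idle, i.e., warm).
        assert self.state in (InstanceState.INITIALIZING, InstanceState.IDLE)
        self.state = InstanceState.RUNNING
        self.billed_time += service_time  # running time is billed

    def finish_request(self) -> None:
        # After processing, the platform keeps the instance warm; idle
        # time is not charged to the application developer.
        self.state = InstanceState.IDLE
```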
Cold/Warm Start: As defined in previous work (Lloyd et al., 2018; Wang et al., 2018; Figiela et al., 2018), we refer to a request as a cold
start request when it goes through the process of
launching a new function instance. For the platform,
this could include launching a new virtual machine,
deploying a new function, or creating a new instance
on an existing virtual machine, which introduces
an overhead to the response time experienced by
users. In case the platform has an instance in the idle
state when a new request arrives, it will reuse the
existing function instance instead of spinning up a
new one. This is commonly known as a warm start
request. Cold starts could be orders of magnitude
longer than warm starts for some applications. Thus,
too many cold starts could impact the application’s
responsiveness and user experience (Wang et al.,
2018). For this reason, much of the research in the field of serverless computing has focused on mitigating cold starts (Lin and Glikson, 2019; Bermbach et al., 2020; Manner et al., 2018).
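The cold/warm distinction amounts to a branch in the request path: reuse an idle instance if one exists, otherwise pay the instance-launch overhead. A minimal sketch of this, with `cold_start_overhead` and `service_time` as hypothetical parameters (neither is part of SimFaaS's interface):

```python
def respond(idle_pool: list, cold_start_overhead: float,
            service_time: float) -> float:
    """Response time for one request under the cold/warm start
    definitions above (illustrative, not SimFaaS's interface)."""
    if idle_pool:
        # Warm start: reuse an existing idle function instance.
        idle_pool.pop()
        return service_time
    # Cold start: launching a new instance (VM, container, function
    # deployment) adds overhead to the user-visible response time.
    return cold_start_overhead + service_time


# A cold start can be orders of magnitude longer than a warm start:
print(respond([], cold_start_overhead=2.0, service_time=0.05))          # cold
print(respond(["inst-1"], cold_start_overhead=2.0, service_time=0.05))  # warm
```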
Autoscaling: We have identified three main autoscaling patterns among the mainstream serverless computing platforms: 1) scale-per-request; 2) concurrency
value scaling; 3) metrics-based scaling. In scale-per-
request Function-as-a-Service (FaaS) platforms, when
a request comes in, it will be serviced by one of the
available idle instances (warm start), or the platform
will spin up a new instance for that request (cold start).
Thus, there is no queuing involved in the system, and
each cold start causes the creation of a new instance,
which acts as a tiny server for subsequent requests. As the load decreases, the platform also needs to scale the number of instances down. In the scale-per-request pattern, as long as successive requests to an instance arrive less than the expiration threshold apart, the instance is kept warm. In other words, if an instance has not received a request in the last expiration threshold units of time, it expires and is terminated by the platform, and its resources are released.
To enable simplified billing, most well-known pub-
lic serverless computing platforms use this scaling
pattern, e.g., AWS Lambda, Google Cloud Functions,
IBM Cloud Functions, Apache OpenWhisk, and Azure
Functions (Wang et al., 2018; Van Eyk et al., 2018).
As scale-per-request is the dominant scaling technique
used by major providers, in this paper, we strive to
simulate the performance of this type of serverless
platform.
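The scale-per-request mechanics described above can be captured in a few lines. The toy sketch below (our own illustration, not SimFaaS's interface) tracks one last-request timestamp per live instance and, for simplicity, assumes each request completes before the next one arrives, so every live instance is idle at arrival time.

```python
def simulate_scale_per_request(arrival_times, expiration_threshold):
    """Count cold starts under scale-per-request scaling: an instance
    expires once it has gone an expiration threshold without requests."""
    instances = []   # last-request timestamp of each live instance
    cold_starts = 0
    for t in arrival_times:
        # Expire instances idle for at least the expiration threshold.
        instances = [last for last in instances
                     if t - last < expiration_threshold]
        if instances:
            # Warm start: reuse the instance idle the longest
            # (an illustrative reuse policy; real platforms may differ).
            instances[instances.index(min(instances))] = t
        else:
            # Cold start: each cold start creates a new instance.
            cold_starts += 1
            instances.append(t)
    return cold_starts, len(instances)


# Requests at t = 0, 1, 2, 20 with a 10-unit expiration threshold yield
# two cold starts: the instance created at t = 0 expires before t = 20.
print(simulate_scale_per_request([0, 1, 2, 20], expiration_threshold=10))
```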
In the concurrency value scaling pattern (Google
Cloud Platform Inc., 2020), function instances can re-
ceive multiple requests at the same time. The number of requests that can be made concurrently to the same instance can be set via the concurrency value. Figure 1 shows the effect of the concurrency value on autoscaling.
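As a rough sketch of the idea, one can approximate the instance count under this pattern as the number of in-flight requests divided by the concurrency value, rounded up. This formula is an approximation we introduce for illustration, not a claim about any specific platform.

```python
import math


def instances_needed(in_flight: int, concurrency: int) -> int:
    """Approximate instance count under concurrency value scaling:
    each instance may serve up to `concurrency` requests at once."""
    return math.ceil(in_flight / concurrency)


# With concurrency = 1 this degenerates to one instance per in-flight
# request, similar to scale-per-request; higher values pack requests.
print(instances_needed(in_flight=10, concurrency=4))  # -> 3
```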