These limitations are especially relevant when building an on-demand video surveillance system (VSS) with computer vision algorithms that rely on the deployment of Deep Neural Networks (DNNs), since the complexity of DNNs is particularly demanding for hardware platforms with limited computational resources (Bianco et al., 2018). Consequently, considerable effort has recently been devoted to optimizing DNNs, including new methodologies (Elordi et al., 2018) (Frankle and Carbin, 2019), new microprocessor classes (e.g., Intel's VPUs (Intel, 2019) and Google's TPUs (Google, 2019)) and new software tools for DNN model optimization included in deep learning frameworks (e.g., TensorFlow (Google, 2020), PyTorch (Facebook, 2019) and OpenVINO (Intel, 2020)). However, since both DNN optimization techniques and serverless architectures are still at an early stage, few works have focused on the optimal deployment of DNNs on serverless platforms while simultaneously addressing the characteristics of both components.
Our main motivation is to help build cost-effective on-demand VSSs by leveraging (1) the latest advances in DNN optimization techniques for inference, along with (2) tailored deployment strategies that make the most of current FaaS architectures. Although this paper focuses on optimal DNN deployment in serverless environments, our approach also considers the security and privacy measures needed to protect biometric data in VSS environments (Biometrics Institute, 2020).
This work represents a step forward in distributed
computational VSS infrastructures and the Video-
Surveillance-as-a-Service (VSaaS) paradigm (Limna
and Tandayya, 2016). We have taken AWS Lambda (Baird et al., 2017) as the baseline for the design of our methodology.
2 SERVERLESS VSS PLATFORM
FaaS platforms materialize computations as function instances, whose execution comprises two stages (Baird et al., 2017). The first stage begins when the FaaS function is invoked for the first time, creating an isolated runtime environment with the necessary resources. This process takes additional time to complete and, consequently, this stage is called the cold start stage. Once container initialization has finished, subsequent function instances are executed concurrently in the already-initialized environment. This second stage is called the warm stage.
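The following minimal sketch illustrates the two stages for a Python FaaS function in the style of AWS Lambda. The model path and loading routine are illustrative assumptions, not part of any specific API:

```python
import time

MODEL_PATH = "/opt/model/optimized_model.bin"  # illustrative path

def load_dnn_model(path):
    # Placeholder for a heavyweight initialization step (e.g., reading
    # DNN weights from disk and building the inference graph).
    time.sleep(1.0)  # simulate the expensive part of the cold start
    return lambda image: {"label": "person", "score": 0.97}  # dummy model

# Module-level code runs once per runtime environment: the cold start stage.
model = load_dnn_model(MODEL_PATH)

def handler(event, context):
    # The handler body runs on every invocation; once the container is
    # initialized, these calls belong to the warm stage and reuse `model`.
    return {"result": model(event["image"])}
```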
Poor management of resources during the initialization process and across concurrent instances can drastically increase both the cold start time and the serverless execution (warm stage) time. Therefore, the key is to identify strategies that minimize processing time in both stages so as to provide a good quality of service. In the following, we summarize the performance strategies currently presented in the literature (Baird et al., 2017) (Bardsley et al., 2018).
• Concise function logic: if third-party dependencies are required, avoid using general-purpose open-source packages. Owing to their general-purpose nature and third-party interdependencies, such packages include more functionality than required and can thus significantly slow down the cold start and increase processing time.
• Third-party dependencies: limit the size and use of third-party libraries so that they fit within the serverless function storage limitations.
• Resource management: avoid reinitializing variables on every invocation. Instead, use global/static variables or singleton patterns to handle application-scope state (see the sketch after this list).
• Allocated function memory: finding the trade-off between the allocated computing resources and the execution cost is key to an optimal serverless execution.
• Language choice: interpreted programming languages achieve faster initial invocation times, while compiled languages perform best in the warm stage.
• Keep the container in the warm state: make preconfigured periodic calls to the serverless functions to prevent them from falling back to the cold stage.
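To illustrate the resource management and keep-warm strategies, the following minimal sketch lazily loads a DNN model into an application-scope variable so that it is created at most once per container, and answers periodic keep-warm pings without running inference. All names (load_dnn_model, the model path, the keep_warm event field) are illustrative assumptions:

```python
_model = None  # application-scope variable, shared across warm invocations

def load_dnn_model(path):
    # Placeholder for an expensive model-loading step.
    return lambda image: {"label": "person", "score": 0.97}

def get_model():
    # Lazy singleton: the model is loaded at most once per container.
    global _model
    if _model is None:
        _model = load_dnn_model("/opt/model/optimized_model.bin")
    return _model

def handler(event, context):
    # A preconfigured periodic "ping" keeps the container warm without
    # paying the inference cost.
    if event.get("keep_warm"):
        return {"status": "warm"}
    return {"result": get_model()(event["image"])}
```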
Although these strategies apply to general serverless architectures, the complexity of DNN models (Bianco et al., 2018) requires a deeper analysis of DNN model deployment to cope with serverless platform limitations. To that end, we present a FaaS architecture with tailored DNN optimization strategies that maximize inference efficiency.
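As one concrete example of such an optimization step (a sketch of post-training quantization with TensorFlow Lite, assuming a TensorFlow SavedModel; not necessarily the exact tool chain used in this work), a DNN can be shrunk to fit serverless storage limits and to speed up CPU-only inference:

```python
import tensorflow as tf

# Convert a SavedModel into a quantized TensorFlow Lite model to reduce
# its size and CPU inference latency (paths are illustrative).
converter = tf.lite.TFLiteConverter.from_saved_model("models/detector")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

with open("models/detector_quant.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting quantized model can then be bundled with the function package, within the platform's storage limitations.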
The proposed serverless architecture is illustrated
in Figure 1, together with the lifecycle of the
processing pipeline, where each processing task is
numbered from 1 to 11. This pipeline contains two
main components: the initialization process (steps 1 to 7) and the on-demand invocation task (steps 8 to 11). The event controller shown in the
architecture represents the event-triggering design of
FaaS platforms (see Figure 1). In this context, each input-image source triggers an event that invokes the FaaS function. In terms of security, the images are stored