FaaS Benchmarking over Orama Framework’s Distributed Architecture
Leonardo Rebouças de Carvalho (https://orcid.org/0000-0001-7459-281X), Bruno Kamienski (https://orcid.org/0000-0001-5817-8694) and Aleteia Araujo (https://orcid.org/0000-0003-4645-6700)
Computer Science Department, University of Brasilia, Campus Darcy Ribeiro, Brasilia, Brazil
Keywords:
Distributed Benchmark, Function-as-a-Service, Orama Framework.
Abstract:
As the adoption of Function-as-a-Service-oriented solutions grows, interest in tools that enable the execution
of benchmarks on these environments increases. The Orama framework stands out in this context by offering
a highly configurable and scalable solution to aid in provisioning, running benchmarks, and performing comparative
and statistical analysis of results. In this work, a distributed architecture for the Orama framework is presented,
through which it is possible to run benchmarks with high levels of concurrency, as well as with bursts of
geographically dispersed requests, just as in real environments. The results of the experiment showed that the
proposed architecture was able to divide the load between the distributed workers and to consolidate the
returned results properly. In addition, it was possible to observe specific characteristics of the
providers involved in the experiment, such as the excellent performance of Alibaba Function, whose average
execution time was the lowest of the tests and which was free of failures. Google Cloud Function and AWS Lambda
recorded intermediate results for average execution time and registered some failures. Finally, Azure Function had
the worst results in average execution time and cold start.
1 INTRODUCTION
Worldwide end-user spending on public cloud ser-
vices is forecast to grow 20.7% to total $591.8 bil-
lion in 2023, up from $490.3 billion in 2022 (Gartner,
2022). All this perspective of growth shows the so-
lidity of this area of computing. Although the main
cloud models, such as Infrastructure-as-a-Service
(IaaS) (MELL and Grance, 2011) and Platform-as-a-
Service (PaaS) still lead the focus of investment, other
models that integrate the archetype of Everything-as-
a-Service (XaaS) (DUAN et al., 2015) also indicate
growth. In this context, serverless-based cloud com-
puting models such as Function-as-a-Service (FaaS)
(Schleier-Smith et al., 2021) have been predicted as
the main programming paradigms of the next genera-
tion of the cloud.
In this scenario, there has been a growing in-
terest in evaluations and benchmark tools that ana-
lyze the services delivered by providers, such as Sebs (Copik
et al., 2021), PanOpticon (Somu et al., 2020), FaaS-
Dom (Maissen et al., 2020), BeFaaS (Grambow et al.,
2021) and Orama framework (Carvalho and Araujo,
2022). However, given the wide range of possibili-
ties for adoptable strategies by providers, as well as
the different regions in which these services can be
implemented, this type of study needs to be increas-
ingly dynamic and adjustable to real-world situations.
Therefore, this work presents a distributed architec-
ture for the Orama framework (Carvalho and Araujo,
2022) in which it is possible to run benchmarks in
FaaS environments, exploring a wider range of sce-
narios than was possible in the standalone version.
Through the master/workers architecture, it is possi-
ble to divide workloads between several instances, in-
cluding geographically dispersed ones, allowing both
the expansion of the assessable levels of concurrency
and a more faithful representation of distributed
real-world demand.
In experiments, it was verified that the distributed
architecture allows higher levels of concurrent requests
than the standalone version. In addition, it was
found that the scenario whose workers were in the
same region and the scenario in which the workers
were geographically spread produced equivalent
performance results, demonstrating that the separation
of workers, in the proposed architecture, does not
affect the analysis of the results. It was also possible to observe be-
haviors intrinsic to the services, such as the superior
performance demonstrated by Alibaba Cloud Func-
tions. AWS Lambda and Google Cloud Function both
had intermediate results. Azure services, on the other
hand, obtained the worst results, presenting high av-
erage execution times and cold start.
This article is divided into six parts, the first be-
ing this introduction. Section 2 presents the theoreti-
cal foundation that supports the work. Section 3 de-
scribes the distributed implementation of the Orama
framework. Section 4 presents the related works. Sec-
tion 5 shows the results obtained and finally, Section
6 presents the conclusions and future work.
2 BACKGROUND
After NIST (MELL and Grance, 2011) defined
the traditional cloud models in 2011 as IaaS,
PaaS and SaaS, the main public cloud providers began
to name their services with another set of acronyms,
which culminated in the emergence of the term
“XaaS” to define Everything-as-a-Service (DUAN
et al., 2015). In 2014 AWS launched Lambda, which
then inaugurated the Function-as-a-Service (FaaS)
model (Motta. et al., 2022). In the FaaS paradigm,
the customer delivers a piece of code of interest to
the provider, generally written in some language sup-
ported by the provider. In addition, the client must
configure a trigger to activate the function and this
trigger can happen from other services that make up
the provider’s platform or through a REST API.
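As an illustration of this programming model, the sketch below shows a minimal Python handler in the AWS Lambda style for a simple calculator function; the event layout and field names are illustrative assumptions and do not come from any specific provider or from the Orama framework.

```python
# Minimal sketch of a FaaS handler in the AWS Lambda style (Python runtime).
# The payload fields ("a", "b", "op") are illustrative only.
import json

def handler(event, context):
    # When triggered through a REST API, the request body arrives as a string.
    body = json.loads(event.get("body") or "{}")
    a, b, op = body.get("a", 0), body.get("b", 0), body.get("op", "+")
    result = {"+": a + b, "-": a - b, "*": a * b}.get(op)
    return {
        "statusCode": 200,
        "body": json.dumps({"result": result}),
    }
```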
Since FaaS is designed to run state-independently,
some restrictions are imposed on customers, such as
limits on allocable vCPUs, RAM and maximum exe-
cution time. Another aspect that significantly distin-
guishes FaaS from other cloud service models is the
billing method. Instead of being charged based on the
execution time of instances, as in the IaaS model, in
FaaS the customer will only pay based on the activa-
tion of the service and its respective duration.
Major public cloud providers have FaaS offerings.
In addition to AWS Lambda, which is the forerunner
of this concept, it is possible to find solutions from
Azure, GCP, Alibaba Cloud, among others. Azure
Function (AZF) is Azure’s entry into this slice of the
cloud market. Google Cloud Function (GCF) is the
name of GCP’s FaaS. Alibaba Cloud offers Alibaba
Function Compute (AFC) as its FaaS. Other providers
such as Oracle and IBM also offer FaaS options.
Faced with so many FaaS options, it is essential
to understand how the strategies adopted by providers
deliver environments. Considering the large number
of regions in which the main providers are installed,
it is possible that implementation differences between
providers or between regions meet different needs or
even make certain workloads impractical, unfeasible,
or expensive. Since these strategies are providers'
trade secrets, the best way to elicit them is by run-
ning benchmarks. As these environments are highly
dynamic and maintain constant evolution, it is impor-
tant that the solutions that promote benchmarks on
these platforms are configurable to the point of bet-
ter representing the problems found in real environ-
ments. The development of solutions aimed at the
current multicloud context should favor the incorpo-
ration of new providers and their services as they en-
ter this market. Because of this, it is essential to use a
cloud orchestration solution.
2.1 Infrastructure Tools
Terraform (HashiCorp, 2021) is a cloud orchestrator
that helps solutions integrate with different virtualiza-
tion and automation platforms, especially in cloud en-
vironments. Through Terraform it is possible to cre-
ate lightweight and portable infrastructure definition
artifacts that can be easily incorporated into other so-
lutions. Other cloud orchestration solutions, such as
Heat (Gupta et al., 2014) and CloudFormation (Wit-
tig and Wittig, 2018), propose approaches similar to
Terraform's; however, as evaluated in (Carvalho and
Araujo, 2020), Terraform presents better results.
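To make this kind of integration concrete, the following sketch shows how an orchestration layer could drive the Terraform CLI from Python to provision and de-provision a use case defined by .tf artifacts. It is only an illustration of the approach, not the Orama framework's implementation, and the directory and variable names are hypothetical.

```python
# Sketch of driving Terraform from an orchestration layer: each use case is a
# directory containing Terraform infrastructure definition artifacts (.tf files).
import subprocess

def provision(workdir: str, variables: dict) -> None:
    var_args = [f"-var={k}={v}" for k, v in variables.items()]
    subprocess.run(["terraform", "init"], cwd=workdir, check=True)
    subprocess.run(["terraform", "apply", "-auto-approve", *var_args],
                   cwd=workdir, check=True)

def deprovision(workdir: str, variables: dict) -> None:
    var_args = [f"-var={k}={v}" for k, v in variables.items()]
    subprocess.run(["terraform", "destroy", "-auto-approve", *var_args],
                   cwd=workdir, check=True)

# Example (hypothetical paths and variable names): provision the calculator
# use case in a chosen region with a chosen memory size.
# provision("./usecases/calculator-aws", {"region": "us-east-1", "memory_mb": 128})
```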
Performance testing (Erinle, 2013) is a type of
testing intended to determine the responsiveness, re-
liability, throughput, interoperability, and scalability
of a system and/or application under a given work-
load. It could also be defined as a process of deter-
mining the speed or effectiveness of a computer, net-
work, software application, or device. Testing can
be conducted on software applications, system re-
sources, targeted application components, databases,
and a whole lot more. It normally involves an auto-
mated test suite, such as JMeter (Erinle, 2013), as this
allows for easy, repeatable simulations of a variety of
normal, peak, and exceptional load conditions. Such
forms of testing help verify whether a system or ap-
plication meets the specifications claimed by its ven-
dor. Technology solutions currently need to deal with
large and fast flows of information and any instability
in the service can lead to loss of information. Because
of this, it is important to use queuing mechanisms in
order to guarantee the correct processing of requests.
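The essence of the performance testing described above can be illustrated with the sketch below, which fires a burst of concurrent HTTP requests against a function endpoint and records per-request latency and failures. It is a simplified stand-in for what a tool such as JMeter or a benchmarker component does, written with only the Python standard library; the endpoint URL is hypothetical.

```python
# Sketch of a benchmark burst: fire N concurrent HTTP requests at a function
# URL and record per-request latency and success/failure.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def invoke(url: str, timeout: float = 60.0) -> dict:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 300
    except Exception:
        ok = False
    return {"latency_ms": (time.perf_counter() - start) * 1000.0, "ok": ok}

def run_burst(url: str, concurrency: int) -> list:
    # One thread per concurrent request for this simple sketch.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: invoke(url), range(concurrency)))

# results = run_burst("https://example.com/calculator", 512)  # hypothetical URL
```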
Apache Kafka (Sax, 2018) is a scalable, fault-
tolerant, and highly available distributed streaming
platform that can be used to store and process data
streams. The Kafka cluster stores data streams, which
are sequences of messages/events continuously pro-
duced by applications and sequentially and incremen-
tally consumed by other applications. The Connect
API is used to ingest data into Kafka and export data
streams to external systems such as distributed file
systems, databases, and others. For data stream pro-
cessing, the Streams API allows developers to specify
sophisticated stream processing pipelines that read in-
put streams from the Kafka cluster and write results
back to Kafka. Apache Kafka is a solution that ad-
heres to the microservices paradigm, that is, the de-
velopment model in which the solution’s complexity
blocks are segmented into smaller processes. In or-
der to manage this large amount of services, without
the need to deal with several virtual machines, it is
possible to use container-oriented environments.
Docker (Ibrahim et al., 2021) enables the con-
tainerization of a software package along with asso-
ciated configuration and setup details. Such contain-
ers can be easily and rapidly deployed while avoid-
ing compatibility issues. In fact, a recent study re-
ports that Docker can speed up the deployment of
software components by 10 to 15 times. Many of to-
day's applications are multi-component (i.e., multi-
container) applications. For instance, a simple web
application would require a web server and a database
component. Docker Compose (Ibrahim et al., 2021),
a natural progression of Docker, enables practitioners
to compose such complex applications. Applications
transcribe such compositions in a Docker Compose
file, where components are specified by describing
their Docker image and associated configuration as
well as the relations between components. The var-
ious services of a solution supported by a container
environment can generate large volumes of data that
require processing and analysis techniques to make sense of them.
2.2 Statistical Analysis Tools
Statistical analysis of benchmark results allows the
observation of several phenomena. The factorial de-
sign (Jain, 1991), for example, helps to identify the ef-
fect of mapped factors on the results. With the facto-
rial design, it is possible to identify whether the vari-
ation of any factor in a given scenario causes (or does
not) any statistically significant impact on the results.
It is also possible to identify the existence of factors
that were not initially mapped and that may have a
significant influence on the results. Another impor-
tant statistical analysis tool for understanding the re-
sults is the paired t-test. In this test, the difference
between two results is evaluated in order to determine
its statistical significance, as well as the respective de-
gree of confidence. This test allows the user to estab-
lish whether the difference between two results can
be considered relevant or insignificant; in the latter
case, the results can be considered statistically equal.
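A minimal sketch of such a paired t-test is shown below, assuming the two result sets are paired per repetition and using SciPy for the test statistic. It illustrates the technique only and is not the Orama framework's code; the 99.95% confidence threshold mirrors the level used later in this work.

```python
# Sketch of a paired t-test over two benchmark result sets (not Orama's code).
from scipy import stats

def paired_comparison(times_a, times_b, alpha=0.0005):
    # times_a and times_b are paired average execution times (e.g., per repetition).
    t_stat, p_value = stats.ttest_rel(times_a, times_b)
    significant = p_value < alpha  # alpha = 0.0005 -> 99.95% confidence
    return t_stat, p_value, significant

# Example with made-up numbers:
# t, p, sig = paired_comparison([581, 595, 663], [643, 612, 715])
```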
With the purpose of implementing a specific
benchmark solution for a FaaS environment that ad-
heres to different scenarios as close to real ones as
possible, this work uses tools such as Terraform, JMe-
ter, Apache Kafka, Docker, factorial design, and t-test
to propose a distributed architecture for the Orama frame-
work, whose detailed description will be presented in
the next section.
3 DISTRIBUTED ORAMA
FRAMEWORK
The Orama framework (Carvalho and Araujo, 2022)
is a tool developed to conduct benchmarks on FaaS
environments. Although it is possible to run bench-
marks on other types of environment, its focus is di-
rected towards the evaluation of cloud environments
oriented to the FaaS paradigm. Through the Orama
framework, it is possible to provision environments
in an automated way, thanks to its integration with
Terraform, which makes the incorporation of new
providers as simple as building a Terraform infras-
tructure definition artifact. Once provisioned, the
environment can also be de-provisioned using Ter-
raform automations through the Orama framework.
In addition, the entire process of running the bench-
marks is conducted by the framework based on the
settings entered in the system. It is possible to create
several test scenarios varying the level of concurrency,
the number of repetitions of a scenario, the establish-
ment of intervals between tests and the execution of
warm-up requests, whose objective is to observe the
cold start phenomenon, which is very common and
impactful in FaaS environments.
The Orama framework comes with some pre-
configured use cases that can be deployed to major
providers without any user intervention. These use
cases range from a simple calculator, whose objective
is only to validate the implementation and correct op-
eration of the service, to use cases that integrate with
other provider services, such as object storage
and databases. The use cases accept parameteriza-
tions that allow the provisioning of different config-
urations for the use cases, from the amount of alloca-
ble memory to the region of the provider where the
use case must be implemented.
The standalone version of the framework pre-
sented at (Carvalho and Araujo, 2022) has all of its
components implemented in a container environment,
including the “benchmarker” which is the component
responsible for triggering bursts of requests that sim-
ulate real demands on the environments. It turns out
that using this approach, the level of concurrency that
the Orama framework can simulate on the environ-
ment is limited to the amount of resources available
on the machine where it is installed.

Figure 1: Distributed Orama framework architecture.

Furthermore, test
requests compete with framework management traf-
fic, as framework components also communicate via
HTTP.
In this work, a distributed architecture for the
Orama framework is presented, as shown in Figure
1. In this architecture, it is observed that the bench-
marker component is deployed outside the main envi-
ronment and therefore its installation can occur in an-
other instance. Furthermore, the number of instances
of the benchmarker is variable and can be adjusted
to the characteristics of the test to be performed. It
is possible, for example, to concentrate the bench-
marker’s workers in the same region or to distribute
them among several regions. It is also possible to
include a number of workers such that the distribu-
tion of requests prevents the saturation of resources
on any single machine, avoiding failures attributable
to the test execution worker itself.
Communication between the remote benchmark-
ers (workers) and the main Orama framework en-
vironment (master) composed of frontend, backend,
database, and orchestrator is done through Kafka, as
can be seen in Figure 1. There are three types of “top-
ics” that are managed by Kafka. The “Health check”
topic allows remote workers to inform the master that
they are able to receive triggers to run benchmarks.
Once health is up to date, this worker will be consid-
ered in the distribution of loads of a respective bench-
mark and then a topic with the “uuid” of the respective
worker will be included in Kafka by the backend of
the Orama framework, to be consumed by the respec-
tive worker and executed, as can be seen in Figure 2.

Figure 2: Kafka workflow in Orama framework.

After running the benchmark, the remote worker
records the partial result and inserts it into a third
Apache Kafka topic, for consumption by the master’s
backend. It is worth noting that the Orama framework
implements a balanced split of the requests that each
worker is asked to execute on the target environment.
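The worker side of this workflow can be illustrated with the following sketch, which uses the kafka-python client. The topic names, message fields, and broker address are assumptions made for illustration and do not reflect the actual Orama implementation; the run_burst helper refers to the load-burst sketch shown in Section 2.1.

```python
# Sketch of a remote worker's Kafka loop: report health, consume benchmark
# triggers on a per-worker topic, and publish partial results (illustrative only).
import json
import uuid
from kafka import KafkaConsumer, KafkaProducer

WORKER_ID = str(uuid.uuid4())
producer = KafkaProducer(bootstrap_servers="master:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

# 1. Report health (repeated periodically in practice) so the master
#    considers this worker in the load distribution.
producer.send("health-check", {"worker": WORKER_ID, "status": "up"})

# 2. Consume benchmark triggers published on this worker's own topic.
consumer = KafkaConsumer(WORKER_ID, bootstrap_servers="master:9092",
                         value_deserializer=lambda v: json.loads(v.decode()))
for message in consumer:
    job = message.value                               # e.g. {"url": ..., "requests": 682}
    partial = run_burst(job["url"], job["requests"])  # see the earlier burst sketch
    # 3. Publish the partial result for consolidation by the master's backend.
    producer.send("results", {"worker": WORKER_ID, "data": partial})
    producer.flush()
```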
Figure 2 shows a scenario with three workers and
a request to run a benchmark with 2048 concurrent
requests. It is possible to notice that all workers are
asked for 682 requests, which adds up to 2046 re-
quests. This leaves 2 requests that are assigned to
one of the workers, in the case shown in Figure 2 the
“Remote Benchmarker Worker 01”. As soon as the
backend perceives the receipt of all partial results, it
promotes the consolidation of the results. If these re-
sults do not appear within a configured timeout, then
the respective partial result is assigned a failure result.
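The load balance described above amounts to an integer division of the total load, with the remainder assigned to one of the workers, as in the following sketch (worker names are illustrative).

```python
# Split a total request load across the available workers; the remainder
# goes to the first worker, e.g. 2048 over 3 workers -> 684, 682, 682.
def split_load(total_requests: int, workers: list) -> dict:
    base, remainder = divmod(total_requests, len(workers))
    shares = {w: base for w in workers}
    shares[workers[0]] += remainder
    return shares

print(split_load(2048, ["worker-01", "worker-02", "worker-03"]))
# {'worker-01': 684, 'worker-02': 682, 'worker-03': 682}
```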
Once the results are consolidated, it is possible to
create factorial designs combining two benchmarks
and two levels of concurrence. If the benchmarks in-
volve two different providers, it will be possible to ob-
serve the influence of the providers on the results, as
well as the influence of the concurrence level. Using the
factorial design as established by the Orama frame-
work, it is possible to identify whether the strate-
gies adopted by the providers impact the results and
whether the level of concurrence is the factor whose
influence prevails. It is still possible to identify the ex-
istence of some other factors apart from the provider
and the level of concurrence influencing the results;
in this case, the portion of the influence related to the
statistical error will be significantly high.
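A sketch of the underlying 2^2 factorial computation (Jain, 1991) is shown below, with two factors (provider and concurrence level) at two levels each and r repetitions. It illustrates how the fractions of variation attributed to each factor, their interaction, and the statistical error can be obtained; the commented example values are made up.

```python
# Sketch of a 2^2 factorial design with r repetitions (Jain, 1991).
# y[(a, b)] is the list of r observations for provider level a and
# concurrence level b, with a, b in {-1, +1}.
def factorial_2x2(y):
    means = {k: sum(v) / len(v) for k, v in y.items()}
    r = len(next(iter(y.values())))
    # Effects computed from the cell means using the sign table of the design.
    q_prov = (-means[(-1, -1)] + means[(+1, -1)] - means[(-1, +1)] + means[(+1, +1)]) / 4
    q_conc = (-means[(-1, -1)] - means[(+1, -1)] + means[(-1, +1)] + means[(+1, +1)]) / 4
    q_int  = (+means[(-1, -1)] - means[(+1, -1)] - means[(-1, +1)] + means[(+1, +1)]) / 4
    ss = {"provider": 4 * r * q_prov ** 2,
          "concurrence": 4 * r * q_conc ** 2,
          "interaction": 4 * r * q_int ** 2,
          # Residual (statistical error): spread of observations around cell means.
          "error": sum((obs - means[k]) ** 2 for k, v in y.items() for obs in v)}
    total = sum(ss.values())
    return {k: v / total for k, v in ss.items()}  # fraction of variation explained

# Example (hypothetical times in ms):
# variation = factorial_2x2({(-1, -1): [100, 110], (+1, -1): [600, 620],
#                            (-1, +1): [300, 320], (+1, +1): [6000, 6100]})
```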
The distributed architecture of the Orama frame-
work presented in this work opens up a wide range
of possibilities for running benchmarks on FaaS en-
vironments. To leverage this approach, it is essential
to have a tool available that allows the evaluation of
the environments delivered by providers under high
levels of concurrency and with distributed clients,
characteristics that are closer to the critical real-world
situations experienced by solutions supported by FaaS
environments.
In the next section, the results of an experiment us-
ing the aforementioned architecture will be presented.
The capabilities of the Orama framework using a dis-
tributed approach and the insights obtained from the
results of the experiment will be illustrated.
4 RELATED WORKS
This article presents a distributed architecture for ex-
ecuting benchmarks in FaaS environments over the
Orama framework. The related work is discussed
from the perspective of benchmarking FaaS platforms
in general as well as work on serverless benchmark
frameworks. A comparison between related works is
presented in Table 1.
In the paper (Back and Andrikopoulos, 2018) the
authors used a microbenchmark in order to investigate
two aspects of the FaaS: the differences in observable
behavior with respect to the compute/memory rela-
tion of each FaaS implementation by the providers,
and the complex pricing models currently in use.
They used AWS, IBM, GCP, Azure, and OpenWhisk
in their evaluation. However, the authors did not
present an evaluation of the performance of their mi-
crobenchmark in different regions of the providers,
nor with different levels of concurrence, as presented
in this work.
The quality impacts of operational tasks in FaaS
platforms as a foundation for a new generation of
emerging serverless big data processing frameworks
and platforms are evaluated in (Kuhlenkamp et al.,
2019). The authors presented SIEM, a new evaluation
method to understand and mitigate the quality impacts
of operational tasks. They instantiated SIEM to eval-
uate deployment package and function configuration
changes for four major FaaS providers (AWS, IBM,
GCP, and Azure), but only in European regions for the
same level of concurrence. In this work, on the other
hand, several levels of concurrence are evaluated us-
ing two different worker approaches (concentrated and
distributed).
PanOpticon (Somu et al., 2020) provides a com-
prehensive set of features to deploy end-user business
logic across platforms at different resource configu-
rations for fast evaluation of their performance. The
authors conducted a set of experiments testing sep-
arate features in isolation. An experiment compris-
ing a chat server application was conducted to test
the effectiveness of the tool in complex logic scenar-
ios in AWS and GCP. Furthermore, in this work, the
range of tests that the Orama framework can evalu-
ate was extended beyond the execution of benchmarks
on AWS and GCP, to include the execution of bench-
marks on Azure and Alibaba, which are two other im-
portant players in this market.
FaaS-dom (Maissen et al., 2020) is a modular set
of benchmarks for measuring serverless computing
that covers the major FaaS providers and contains
FaaS workloads (AWS, Azure, Google, and IBM). A
strong element of FaaS-dom’s functions is that they
were created in a variety of languages and for a va-
riety of providers. However, the operations that the
FaaS-dom functions carry out can be viewed as basic,
and they lack Orama’s approach to integration with
other cloud services.
BeFaaS (Grambow et al., 2021) offers a bench-
mark methodology for FaaS settings that is applica-
tion centric and focuses on evaluating FaaS apps us-
ing real-world and prevalent use cases. It offers en-
hanced result analysis and federated benchmark test-
ing, where the benchmark application is split across
several providers. It does not, however, provide a su-
perior approach to statistical analysis, such as the fac-
torial design or t-test that are covered by this study.
Table 1: Related works.

| Paper | Providers | Configurable | Factorial Design | T-test | Distributed |
|---|---|---|---|---|---|
| (Back and Andrikopoulos, 2018) | AWS, IBM, GCP, Azure, and OpenWhisk | No | No | No | No |
| (Kuhlenkamp et al., 2019) | AWS, IBM, GCP, and Azure | No | No | No | No |
| (Somu et al., 2020) | AWS and GCP | No | No | No | No |
| (Grambow et al., 2021) | AWS, GCP, Azure, TinyFaaS, OpenFaaS, and OpenWhisk | Yes | No | No | No |
| (Barcelona-Pons and García-López, 2021) | AWS, IBM, GCP, and Azure | No | No | No | No |
| (Copik et al., 2021) | AWS, Azure, and GCP | No | No | No | No |
| (Wen et al., 2020) | AWS, Azure, GCP, and Alibaba | No | No | No | No |
| (Carvalho and Araujo, 2022) | AWS and GCP | Yes | Yes | Yes | No |
| This paper | AWS, Azure, GCP, and Alibaba | Yes | Yes | Yes | Yes |

In paper (Barcelona-Pons and García-López,
2021) the authors analyzed the architectures of four
major FaaS platforms: AWS Lambda, AZF, GCP,
and IBM Cloud Functions. The research focused on
the capabilities and limitations the services offer for
highly parallel computations. The design of the plat-
forms revealed two important traits influencing their
performance: virtualization technology and schedul-
ing approach. This work, on the other hand, focuses
on investigating the differences in performance of the
main providers with different levels of concurrence.
In the Serverless Benchmark Suite (Sebs) (Copik
et al., 2021), typical workloads are chosen, the im-
plementation is tracked, and the infrastructure is as-
sessed. The benchmark’s applicability to several com-
mercial vendors, including AWS, Azure, and Google
Cloud, is guaranteed by the abstract concept of a FaaS
implementation. Based on the executed test cases,
this work assesses variables including time, CPU,
memory, I/O, code size, and cost. However, unlike
the Orama framework used in this work, their solu-
tion cannot operate in distributed mode.
In (Wen et al., 2020), the authors run a test flow
employing micro benchmarks (CPU, memory, I/O,
and network) and macro benchmarks (multimedia,
map-reduce and machine learning) to evaluate FaaS
services from AWS, Azure, GCP, and Alibaba in detail.
The tests made use of specific Java, Node.js, and
Python methods that investigated the benchmarking
attributes to gauge resource usage efficiency and ini-
tialization delay. However, they do not present eval-
uations in different levels of concurrence and also do
not perform a statistical analysis using factorial de-
sign and t-test as is done in this work.
Although the aforementioned works present
ad-hoc evaluations of the providers, the scalability and
manageability of these evaluations are limited to the
parameters used in each work. On the other
hand, this article uses a fully manageable solution pre-
pared for the incorporation of new providers and use
cases.
5 RESULTS
The Orama framework makes it possible to run
benchmarks on a FaaS environment in different sce-
narios, especially with the introduction of the dis-
tributed architecture presented in this work. In order
to explore some possibilities of running benchmarks,
an experiment was designed to allow the behavior of
the architecture to be observed, as well as obtaining
some insights into FaaS environments. Details of how
the experiment was conducted will be presented in
Section 5.1, while an analysis of the results is pro-
vided in Section 5.2.
5.1 Methodology
Considering that the main objective of this experiment
is to analyze the behavior of the distributed architec-
ture of the Orama framework, a use case was selected
which had the sole objective of guaranteeing the cor-
rect execution of FaaS in the providers, without inte-
gration with other services, such as Database or Ob-
ject Storage, which could introduce specific features
of these services and divert the focus from the archi-
tecture itself. Therefore, the use case that deploys a
simple math calculator in each of the four main FaaS
providers supported by the Orama framework was se-
lected. The providers are AWS, Azure, Google and
Alibaba.

Table 2: Average execution times (in milliseconds). (C) concentrated workers, (D) distributed workers.

| Concurrence level | AWS Lambda (C) | AWS Lambda (D) | GCF (C) | GCF (D) | AZF (C) | AZF (D) | AFC (C) | AFC (D) |
|---|---|---|---|---|---|---|---|---|
| 1 | 581 | 793 | 592 | 684 | 643 | 954 | 102 | 281 |
| 2 | 595 | 749 | 605 | 757 | 612 | 910 | 93 | 360 |
| 4 | 663 | 776 | 699 | 769 | 715 | 1004 | 91 | 302 |
| 8 | 592 | 859 | 595 | 832 | 622 | 1149 | 133 | 511 |
| 16 | 637 | 897 | 638 | 817 | 695 | 1145 | 100 | 358 |
| 32 | 709 | 983 | 707 | 918 | 890 | 1414 | 100 | 367 |
| 64 | 911 | 1158 | 924 | 1097 | 1303 | 1732 | 109 | 378 |
| 128 | 1328 | 1569 | 1367 | 1548 | 2264 | 2390 | 128 | 393 |
| 256 | 2274 | 2417 | 2386 | 2466 | 4130 | 3749 | 233 | 565 |
| 512 | 4068 | 4418 | 4447 | 4489 | 7506 | 6648 | 168 | 426 |
| 1024 | 6852 | 7334 | 5181 | 5787 | 11514 | 11884 | 116 | 367 |
| 2048 | 6139 | 6072 | 3596 | 3270 | 18378 | 17234 | 67 | 338 |

Figure 3: Concentrated workers scenario.
The distributed architecture of the Orama frame-
work allows workers in different regions to be de-
ployed. Therefore, in this experiment two scenarios
were elaborated. In the first scenario, shown in Figure
3, the workers were all deployed in the same region
where the target FaaS environments were also provi-
sioned (concentrated workers scenarios). In the sec-
ond scenario, workers were distributed in geographi-
cally distant regions, as shown in Figure 4. With these
two scenarios, it was expected that the influence of
latency on FaaS would be demonstrated; however, the
behavior in both should be equivalent and reveal the
same strategy applied by the providers.

Figure 4: Distributed workers scenario.
Both scenarios were subjected to concurrent loads
of requests from just one simultaneous request to up
to 2048 requests with exponential growth. Thus, the
FaaS of each provider were submitted individually to
loads of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024,
and 2048. It is noteworthy that the tests performed
with the Orama framework's standalone architecture
were limited to up to 512 simultaneous requests, due
to the limitations on opening connections with the
FaaS services imposed by the instance on which the
Orama framework was installed. As 2048 was the maximum
level of this experiment, a quantity of 4 benchmarker
workers was defined so that each one was responsi-
ble for carrying out a maximum of 512 concurrent
requests. Furthermore, each battery of tests with dif-
ferent levels of concurrency was repeated 10 times in
order to establish a statistical data mass for further
analysis.
In the first scenario (concentrated workers), the
Orama framework was deployed on an instance of
GCP’s Compute Engine in the US-East region with
16GB of RAM and 4 vCPUs, and each of the four
workers was also deployed in the same GCP region,
on servers with 4GB of RAM and 2 vCPUs each. For
the second scenario, the master installation of Orama
framework was kept on the same instance in the US-
East GCP region as used in the first scenario. How-
ever, the workers were spread out in other GCP re-
gions such as: asia-northeast1 (Tokyo, Japan), us-
west1 (The Dalles, Oregon, United States), europe-
west2 (London, England), and southamerica-east1
(Osasco, Brazil). All instances of GCP used in this
experiment ran Debian 11 as the operating system.
The FaaS that were the target of this experiment in
both scenarios were configured to be deployed in the
US East regions of each of the respective
providers, namely AWS Lambda, AZF, GCF
and AFC.
At the end of the execution of the repetitions of
both test scenarios, factorial designs and t-tests were
created in order to analyze the effects of the difference
between the providers and the difference between the
lowest level of concurrency (only one request) and
the highest level, with 2048 concurrent requests. The
analysis of the obtained results is presented in the next
section.
5.2 Outputs Analysis
Table 2 presents the consolidated result of all eight
test scenarios conducted in the experiment. It is pos-
sible to observe that the provider whose scenarios had
the lowest average execution times was AFC. Next,
in a very close range, are the average execution times
for AWS Lambda and GCF. Fi-
nally, the average execution times for AZF were the
highest in this experiment. In addition, in Table 2, it
is also possible to observe that the scenarios with dis-
tributed workers have generally higher averages than
the averages found in the scenario with workers con-
centrated in the same region. This is the expected re-
sult, since the greater geographic distance between
workers and the target FaaS should raise averages by
introducing higher network latency.
Figure 5a presents a comparative result between
the scenarios involving AWS Lambda. It may be seen
that the average execution time of both scenarios in
AWS Lambda follows the growth of the concurrency
level up to the level preceding the maximum (2048
concurrent requests), when there is a small drop in the
average. This decrease in the average execution time af-
ter 1024 indicates that, when faced with a growing de-
mand, the provider reinforced the service infrastruc-
ture in order to maintain its quality in terms of execu-
tion time. Despite this effort by the provider, failures
occurred from 256 concurrent requests, as shown in
Figure 5b, which shows the percentage failure rates
that occurred at each concurrency level. To corrobo-
rate the indication that the provider scaled up the
infrastructure before 2048 requests, the failure rate,
like the average time, also showed a reduction.
Figures 5c and 5d show the average execution
times of the scenarios involving the GCF and their
respective failure rates at each concurrency level, re-
spectively. It is possible to observe that the graph of
the average execution time of the GCF is similar to
the same graph for AWS Lambda, in which the aver-
age time accompanies the growth of the concurrency
level until the intervention of the provider causes a
decrease in the average time. However, GCF’s av-
erage time threshold is lower than AWS Lambda’s.
While in AWS Lambda the peak point is 7.3 seconds
(at 1024 in the distributed scenario), in GCF the peak
is recorded at 5.7 seconds (at 1024 in the distributed
scenario). This proximity between AWS Lambda and
GCF average times demonstrates a strategic equiva-
lence between providers. Despite this, with regard
to the failure rate, the occurrence of failures in GCF
started only after 1024 requests, while AWS Lambda
already had failures at 256. However, the level of fail-
ures recorded by GCF was higher compared to AWS
Lambda, because while in AWS Lambda the fail-
ures peaked at 8% of requests, in GCF these failures
reached around 20%. Another difference in the AWS
Lambda and GCF results is the continuous growth
of failures demonstrated in the GCF distributed sce-
nario, which indicates that the provider’s monitoring
layer had greater difficulty in dealing with requests
from different regions than those whose origin was
the same.
Unlike AWS Lambda and GCF, which showed
a point of reduction in the average execution time,
AZF, as shown in Figure 5e, maintained a continu-
ous growth in the average execution time, registering
the highest average times in the experiment. This in-
dicates that in the AZF environment there was no re-
inforcement of the infrastructure as demand grew, al-
though this is a FaaS premise. Despite the high aver-
age time, AZF recorded a low failure rate, with fail-
ures occurring in only 3% of the 2048 concurrent re-
quests in the distributed scenario, as can be seen in
Figure 5f.
Figure 5g shows the average AFC execution
times. It is possible to observe three points where the
provider seems to have reinforced the infrastructure,
such as from 2 to 4 requests, from 8 to 16, and from
256 to 512. This meant that the provider maintained
the lowest average times of the experiment and did
not present an increasing curve of the average time, as
occurred with the other providers. In addition, there
was no occurrence of failure during the ten repetitions
of the test batteries. The AFC result also shows the
expected result of the experiment, in which it is pos-
sible to observe the same behavior both in the sce-
nario with workers concentrated in the same region
and with distributed workers, differing only in the av-
erage level.

Figure 5: Comparative results. (a) AWS Lambda execution times; (b) AWS Lambda failure rates; (c) GCF execution times; (d) GCF failure rates; (e) AZF execution times; (f) AZF failure rates; (g) AFC execution times.
Another interesting aspect to be analyzed in this
experiment is the cold start phenomenon, that is, the
time it takes the provider to put its FaaS into operation
for the first time or after a certain period of inactivity.

Figure 6: Warm-up times. (C) concentrated workers, (D) distributed workers.
Figure 6 provides the warm-up times for all eight sce-
narios, corresponding to the first request made before
the start of each battery of tests. The warm-up times
showed a large difference between AZF and the other providers.
While AWS Lambda, GCF and AFC recorded warm-
up times around 1s, AZF took 89 seconds in the dis-
tributed scenario and 103 seconds in the concentrated
scenario. This difference shows that the AZF deploy-
ment strategy is considerably different from the oth-
ers and this can significantly impact the performance
of applications supported by this service, since after
some downtime, AZF FaaS will have a much higher
response time than usual and this will certainly nega-
tively impact the user experience.
In order to understand the impact of the provider
and concurrence factors on the results, six factorial
designs were elaborated comparing the four providers
with each other and the minimum and maximum con-
currence levels (1 and 2048), as shown in Figure 7.
Figure 7: Factorial design results. (a) AWS Lambda/GCF; (b) AWS Lambda/AZF; (c) AWS Lambda/AFC; (d) GCF/AZF; (e) GCF/AFC; (f) AZF/AFC.

In Figures 7a, 7b and 7d, it is possible to observe the
predominance of the concurrence factor, to the detri-
ment of the provider factor, which indicates that in
these benchmark results, the providers’ strategies in-
fluenced the result less than the differences between
the concurrences. In Figures 7c, 7e and 7f, there is a
predominance of the provider factor and this corrob-
orates the results shown previously for average exe-
cution time and failure rate, since AFC participates in
all three of these factorial designs.
The result of the t-tests for the six comparisons
between the providers is shown in Table 3. It is pos-
sible to notice that all the differences between the
average results were considered statistically relevant
with a 99.95% confidence level. From this, it may
be stated that these differences are not negligible and
may be considered decisive for the result of the bench-
marks.
Table 3: T-test results.

| Scenario | Difference (ms) | Standard deviation | Confidence level (%) |
|---|---|---|---|
| AWS x GCF | 1,317.03 | 281.82 | 99.95 |
| AWS x AZF | 7,792.25 | 928.90 | 99.95 |
| AWS x AFC | 5,454.88 | 259.53 | 99.95 |
| GCF x AZF | 9,109.28 | 898.69 | 99.95 |
| GCF x AFC | 4,137.84 | 110.14 | 99.95 |
| AZF x AFC | 13,247.13 | 891.95 | 99.95 |
The results of this work can serve as input in
decision-making processes according to the charac-
teristics and requirements of the real use case. For ex-
ample, if the use case requires high reliability of the
FaaS service, AFC would be the best option among
the evaluated providers, since it did not present fail-
ures. On the other hand, if the use case is very sen-
sitive to cold start, then the Azure provider should be
avoided, as it presented high values in this regard.
6 CONCLUSION
In this work, the distributed architecture of the Orama
framework was presented, with which it is possible to
perform benchmarks with high levels of concurrency
on a FaaS environment, as well as configure the load
to be triggered from different locations on the globe,
considerably expanding the range of possibilities for
benchmarks against state-of-the-art tools.
The Orama framework allows the visualization of
different scenarios for the same use case, especially
for concurrency levels above 512 concurrent requests,
which is the amount observed as the safe maximum
that an instance of intermediate configuration can
manage. In addition, it was evidenced that the Orama
framework in its distributed architecture is capable of
revealing the same behavior of a FaaS provider
when subjected to either a concentrated or a distributed
approach to bursts of requests, as well as eventual
differences.
In the experiments, it was also evident that the
AFC FaaS delivers greater consistency in terms of av-
erage execution time and occurrence of failures, fol-
lowed by AWS Lambda and CGF, which registered
close results, and finally the AZF results with high
average execution times and cold start.
In future work, other providers will be integrated
into the Orama framework, such as IBM and Ora-
cle, in order to expand the coverage of the analy-
ses presented in this work. Furthermore, even higher
levels of concurrency can be evaluated by design-
ing experiments that include a larger number of dis-
tributed workers. In addition, evolutions in the
Orama framework will allow the analysis of bench-
mark results using percentiles.
REFERENCES
Back, T. and Andrikopoulos, V. (2018). Using a mi-
crobenchmark to compare function as a service solu-
tions. In ECSOCC, pages 146–160. Springer.
Barcelona-Pons, D. and García-López, P. (2021). Bench-
marking parallelism in faas platforms. Future Gener-
ation Computer Systems, 124:268–284.
Carvalho, L. and Araujo, A. (2022). Orama: A bench-
mark framework for function-as-a-service. In Pro-
ceedings of the 12th CLOSER, pages 313–322. IN-
STICC, SciTePress.
Carvalho, L. R. and Araujo, A. P. F. (2020). Performance
comparison of terraform and cloudify as multicloud
orchestrators. In 2020 20th IEEE/ACM CCGRID,
pages 380–389.
Copik, M., Kwasniewski, G., Besta, M., Podstawski, M.,
and Hoefler, T. (2021). Sebs: A serverless benchmark
suite for function-as-a-service computing.
DUAN, Y., Fu, G., Zhou, N., Sun, X., Narendra, N. C., and
Hu, B. (2015). Everything as a Service (XaaS) on
the Cloud: Origins, Current and Future Trends. vol-
ume 00, pages 621–628.
Erinle, B. (2013). Performance testing with JMeter 2.9.
Packt Publishing Ltd. ISBN: 9781782165842.
Gartner (2022). Gartner forecasts worldwide pub-
lic cloud end-user spending to reach nearly $600
billion in 2023. [online; 01-Jan-2023; url:
https://tinyurl.com/3z72zebw].
Grambow, M., Pfandzelter, T., Burchard, L., Schubert, C.,
Zhao, M., and Bermbach, D. (2021). Befaas: An
application-centric benchmarking framework for faas
platforms.
Gupta, P. R., Taneja, S., and Datt, A. (2014). Using heat
and ceilometer for providing autoscaling in openstack.
JIMS8I-International Journal of Information Commu-
nication and Computing Technology, 2(2):84–89.
HashiCorp (2021). Terraform: Write, plan, apply. [online;
11-Aug-2021; url: https://www.terraform.io ].
Ibrahim, M. H., Sayagh, M., and Hassan, A. E. (2021).
A study of how docker compose is used to compose
multi-component systems. Empirical Software Engi-
neering, 26(6):128.
Jain, R. (1991). The art of computer systems performance
analysis: Techniques for experimental design, measurement,
simulation, and modeling. Wiley. ISBN-13: 978-0471503361.
Kuhlenkamp, J., Werner, S., Borges, M. C., El Tal, K., and
Tai, S. (2019). An evaluation of faas platforms as a
foundation for serverless big data processing. In Pro-
ceedings of the 12th IEEE/ACM, UCC’19, page 1–9,
NY, USA. ACM.
Maissen, P., Felber, P., Kropf, P., and Schiavoni, V. (2020).
Faasdom. Proceedings of the 14th ACM International
Conference on Distributed and Event-based Systems.
MELL, P. and Grance, T. (2011). The NIST definition of
cloud computing. National Institute of Standards and
Tecnology.
Motta, M. A. D. C., Rebouças De Carvalho, L., Rosa, M.
J. F., and Favacho De Araujo, A. P. (2022). Compar-
ison of faas platform performance in private clouds.
In Proceedings of the 12th CLOSER, pages 109–120.
INSTICC, SciTePress.
Sax, M. J. (2018). Apache Kafka, pages 1–8. Springer Inter-
national Publishing, Cham. ISBN: 978-3-319-63962-
8.
Schleier-Smith, J., Sreekanti, V., Khandelwal, A., Carreira,
J., Yadwadkar, N. J., Popa, R. A., Gonzalez, J. E., Sto-
ica, I., and Patterson, D. A. (2021). What serverless
computing is and should become: The next phase of
cloud computing. Communications of the ACM, 64(5):76–84.
Somu, N., Daw, N., Bellur, U., and Kulkarni, P. (2020).
Panopticon: A comprehensive benchmarking tool for
serverless applications. In 2020 COMSNETS, pages
144–151.
Wen, J., Liu, Y., Chen, Z., Chen, J., and Ma, Y. (2020).
Characterizing commodity serverless computing plat-
forms. DOI:10.48550/ARXIV.2012.00992.
Wittig, M. and Wittig, A. (2018). Amazon web services in
action. Simon and Schuster. ISBN: 978-1617295119.