Performance and Usability Implications of Multiplatform and

WebAssembly Containers

Sangeeta Kakati

and Mats Brorsson

Interdisciplinary Center for Security, Reliability, and Trust (SnT),

University of Luxembourg, Luxembourg

{sangeeta.kakati, mats.brorsson}@uni.lu

Keywords:

WebAssembly, Heterogeneity, Cloud-Edge, Containers, Multi-Platform, Runtimes.

Abstract:

Docker and WebAssembly (Wasm) are two pivotal technologies in modern software development, each of-

fering unique strengths in portability and performance. The rise of Wasm, particularly in conjunction with

container runtimes, highlights its potential to enhance efﬁciency in diverse application stacks. However, a

notable gap remains in understanding how Wasm containers perform relative to traditional multi-platform

containers across multiple architectures and workloads, especially when optimizations are employed. In this

paper, we aim to empirically assess the performance and usability implications of native multi-platform con-

tainers versus Wasm containers under optimized conﬁgurations. We focus on critical metrics including startup

time, pull time (both fresh and cached), and image sizes for three distinct workloads and architectures- AMD64

(AWS bare metal) and two embedded boards: Nvidia Jetson Nano with ARM64 and Starﬁve VisionFive2 with

RISCV64. To address these objectives, we conducted a series of experiments using docker and containerd in

multi-platform built images for native containers and Wasmtime as the WebAssembly runtime within contain-

erd/docker’s ecosystem. Our ﬁndings show that while native containers achieve slightly faster startup times,

Wasm containers excel in agility and maintain image sizes of approximately 27.0% of their native counterparts

and a signiﬁcant reduction in pull times across all three architectures of up to 25% using containerd. With con-

tinued optimizations, Wasm has the potential to emerge as a viable choice in environments that demand both

reduced image size and cross-platform portability. It will not replace the current container paradigm soon;

rather, it will be integrated into this framework and complement containers instead of replacing them.

1 INTRODUCTION

Containers are essential for native cloud computing,

offering efﬁcient and scalable deployment choices for

distributed systems. WebAssembly (Wasm) is a com-

pact binary instruction format and portable compila-

tion target that enables seamless execution of code

written in multiple languages, delivering high perfor-

mance across various platforms.

Containerization and WebAssembly represent a

convergence of technologies that enhance applica-

tion deployment and execution across diverse envi-

ronments. Containerization encapsulates applications

and their dependencies in isolated units, facilitating

consistent execution regardless of the underlying in-

frastructure.

WebAssembly, on the other hand, allows applica-

tions to run in a secure sandbox environment, allow-

https://orcid.org/0000-0002-4795-7489

https://orcid.org/0000-0002-9637-2065

ing near-native performance inside and outside the

Web. The WebAssembly System Interface (WASI)

has extended Wasm execution beyond web environ-

ments, making it increasingly popular in cloud com-

puting. The ability to use the same binary executable

on multiple architectures is particularly compelling.

A few years ago, Docker introduced support for dif-

ferent runtimes which opened up for choosing a We-

bAssembly runtime and running Wasm binaries pack-

aged in a container format.

In this paper, we explore the performance implica-

tions and usability beneﬁts/issues of WebAssembly in

container runtimes using Wasmtime, comparing it to

traditional Linux containers. The context is a future

cloud-edge compute continuum where we foresee the

use of multiple hardware architectures. Indeed, in

regular cloud servers, we can already observe both

X86-based- and ARM-based architectures of different

points in the design space. This proliferation of archi-

tectures will only increase with time. Any software

Kakati, S. and Brorsson, M.

Performance and Usability Implications of Multiplatform and WebAssembly Containers.

DOI: 10.5220/0013203200003950

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 15th International Conference on Cloud Computing and Services Science (CLOSER 2025), pages 15-25

ISBN: 978-989-758-747-4; ISSN: 2184-5042

developed should be capable of running on multiple

architectures. Currently, the options to create cloud-

native applications for this scenario is to either build

native docker container images that can run on any

architecture or to use WebAssembly which is archi-

tecture agnostic.

We present a comprehensive performance eval-

uation and contribution in the ﬁeld of WebAssem-

bly and multi-architecture native containers across

three workloads and three architectures that show-

case cloud server and edge devices, using a state-

of-the-art runtime: Wasmtime

, which is a project

within the Bytecode Alliance

. WebAssembly and

multi-architecture containers offer signiﬁcant advan-

tages, but standardization and consistent performance

remain challenges. Resolving these issues is vital for

widespread adoption in IoT settings.

Figure 1 illustrates the workﬂow for developing

and executing cross-architecture containers. In the

case of native execution, we must build the software

for each architecture and support or build a multi-

architecture docker container image. The Docker en-

gine can execute this image natively through its lay-

ers of runtimes, most notably containerd and runc.

The ﬂow for WebAssembly is almost the same, ex-

cept that the developer only has to build one image.

The docker runtime stack is instead containerd and

WasmTime in which the latter will compile the We-

bAssembly bytecode for native execution.

Cloudnative applications require fast startup times

and low resource overhead, especially in serverless

and edge computing environments. Native contain-

ers offer high performance, but at the cost of portabil-

ity, and as we will see, larger container image sizes,

while Wasm containers offer portability which might

come with performance trade-offs. We aim to analyze

these trade-offs across different architectures. Multi-

architecture containers enhance security but add man-

agement complexity and potential performance trade-

offs, which must be carefully evaluated in deployment

strategies.

Although most of the related work concentrates

on the viability of WebAssembly as an alternative to

Docker containers for IoT applications (Pham et al.,

2023), it has not yet emerged as a fully functioning

alternative to Docker. Furthermore, previous work

focused on WebAssembly as a mechanism to reduce

cold start latency compared to Docker-based run-

times, but did not use optimizations and best practices

for building multi-architecture containers (Gackstat-

ter et al., 2022).

Hence, to assess the performance characteristics

https://github.com/bytecodealliance/wasmtime

https://bytecodealliance.org/

of Wasm containers, our experiments emphasize col-

lecting key metrics associated with their utilization

and management. These metrics encompass image

size, startup duration, pull time, and the distinc-

tions between Wasm-based and native Docker’s op-

timized, multi-platform built containers. Addition-

ally, we explore the use of containerd as an alter-

native to Docker for image management, providing

a more direct and efﬁcient means of handling multi-

architecture containers.

The ﬁndings of our experiments indicate that the

Wasm containers are on average 85% smaller than

the native containers. In particular, Wasm containers

can reduce image pull times across the amd64, arm64,

and riscv64 platforms up to 25% compared to native

containers using containerd directly. In particular,

the main contributions of this paper are as follows.

1. This study is the ﬁrst to extensively explore

Wasm’s integration with Docker alternative run-

times, like Wasmtime, on multiple architectures.

2. We provide a thorough performance compari-

son between multi-architecture optimized native

and Wasm containers in three distinct workloads

across key metrics such as image size, pull time,

and startup duration. In particular, our exper-

iments on arm64, amd64, and riscv64 archi-

tectures using containerd (ctr) demonstrate

clear performance beneﬁts in pull times.

3. We investigate best practices not commonly em-

ployed by related works (e.g., non-optimized im-

ages) and use of ctr, focusing on the usage of

docker and ctr for both native and wasm work-

loads across architectures.

2 BACKGROUND

Docker containers and WebAssembly are two foun-

dational technologies that have greatly impacted the

domain of software development. The Docker con-

tainer technology is a way to package software with

its dependencies in a standardized format, making it

easy to deploy and execute in any runtime environ-

ment of the same architecture for which the software

was compiled. WebAssembly is a compact binary for-

mat that serves as a versatile compilation target for a

variety of programming languages, such as Rust, C,

C++, JavaScript, C# and Go.

Docker containers are typically distributed to their

runtime environment from a container registry, such

as Docker hub

. The container runtime is given a

https://hub.docker.com/

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

Figure 1: Cross-architecture container execution.

container image speciﬁcation and will pull the image

from the registry if it is not found locally.

A WebAssembly runtime performs the bytecode

execution in three semantic steps: decoding, valida-

tion, and execution, ensuring that code executes at

near-native speeds within a secure, sandboxed envi-

ronment to safeguard against harmful software. Ex-

ecution can be done through interpretation or after

a compilation step done by the runtime. Some We-

bAssembly runtimes also perform tiered compilation

where ”hot” functions are further optimized during

execution. Wasmtime, which is the runtime used in

our study, compiles the WebAssembly bytecode in

one step prior to execution.

Dockers’ integration with WebAssembly allows

developers to use the performance and security ad-

vantages of WebAssembly while beneﬁting from

Docker’s efﬁcient image management and distribu-

tion capabilities. The motivation for containerizing

Wasm modules is rooted in utilizing existing con-

tainer orchestration ecosystems. While Wasm pro-

vides portability and sandboxing, Docker’s image dis-

tribution and management capabilities offer signiﬁ-

cant operational beneﬁts, such as seamless integra-

tion with CI/CD pipelines, image versioning, and reg-

istry services. Docker containerizes WebAssembly

runtime environments, ensuring uniformity across di-

verse development and production landscapes. More-

over, packaging Wasm applications streamlines de-

ployment and scaling across different infrastructures.

By merging both, developers can navigate the com-

plexities of establishing cross-platform Wasm envi-

ronments while enjoying the beneﬁts of Docker’s es-

tablished workﬂows and comprehensive ecosystem.

3 RELATED WORK

The cloud landscape is evolving, and we increasingly

see a development that includes data center servers,

regional or on-premise servers, mobile edge stations,

and even end-user devices connecting to the edge. To-

gether, we call this the cloud-edge continuum (Gko-

nis et al., 2023). Quite a lot of heterogeneity charac-

terizes this evolving infrastructure. In the context of

serverless computing, (Kjorveziroski et al., 2022) dis-

cuss the impact of serverless computing on cloud and

edge networks, highlighting the startup latency issues

associated with current platforms that use containers

and microvirtual machines. It proposes WebAssem-

bly as a potential solution to these cold start prob-

lems, enabling faster execution of server-side appli-

cations and serverless functions. The authors in (Fujii

et al., 2024) explored the migration of stateful virtual

machines among various runtimes. They aimed to ad-

dress the complexities of migrating VMs across dif-

ferent Wasm implementations, which lack standard-

ized internal designs.

Previous works have examined WebAssembly for

web applications and explored its use in edge com-

puting, however, limited research has focused on

comparing Wasm with native containers in multi-

architecture scenarios, particularly for cloud-native

applications. Several works have explored the use of

WebAssembly in the cloud due to its cross-platform

and security features. (Ferrari et al., 2021) studied

Wasm performance on edge devices, while (Fiedler

et al., 2020) analyzed Wasm startup latency in the

context of serverless computing. (Felter et al., 2015)

focused on docker container performance on x86 ar-

Performance and Usability Implications of Multiplatform and WebAssembly Containers

chitectures but did not explore Wasm.

WebAssembly’s ability to address constraints re-

lated to quality of service, hardware availability, and

networking conﬁgurations has been a subject of in-

vestigation. The research work of (Hilbig et al.,

2021) has explored an update on the usage of We-

bAssembly in the real world, highlighting vulnerabil-

ities in insecure source languages and a growing di-

verse ecosystem. Another work by (Kakati and Brors-

son, 2024b) presented the performance of the wasm-

time runtime in the Sightglass benchmark suite focus-

ing on architecture-speciﬁc outcomes.

Another study by (Chadha et al., 2023) explores

the utilization of WebAssembly as a distribution for-

mat for MPI-based HPC applications, introducing

MPIWasm to facilitate high-performance execution of

Wasm code. The study demonstrates competitive na-

tive application performance, coupled with a signif-

icant reduction in binary size compared to statically

linked binaries for standardized benchmarks.

(Bosshard, 2020) explore the execution of Wasm

in various serverless contexts. It compares server-side

Wasm runtime options, such as Wasmer, Wasmtime,

and Lucet, and investigates running it within Open-

whisk and AWS Lambda platforms. Experiments per-

formed by (Spies and Mock, 2021) and (Wang et al.,

2021) demonstrated Wasm’s usage using the bench-

mark PolyBench. Despite being an effective compiler

benchmark suite, PolyBench appears to be unable to

reﬂect applications in a wider range of non-web do-

mains.

A previous investigation by (Hockley and

Williamson, 2022) examined the use of WebAssem-

bly as a sandboxed environment for general-purpose

runtime scripting. However, the experiments are

conducted on speciﬁc hardware conﬁgurations,

which might not fully represent the performance

on a broader range of devices and architectures.

The performance of WebAssembly is compared to

native and container-based applications on ARM

architecture by (Mendki, 2020) with benchmarking

across various application categories. (Kakati and

Brorsson, 2023) highlights a WebAssembly survey

as the changing landscape of cloud computing, with

a shift toward edge computing, and emphasizes the

need for a cross-platform and interoperable solution.

Continuing the exploration of WebAssembly,

(Wang, 2022) addressed the gap in previous research

by conducting a detailed analysis of standard Wasm

runtimes, covering ﬁve widely used Wasm runtimes.

They introduced a benchmark suite named WaBench,

which includes tools from established benchmark

suites and whole applications from various domains.

(Kakati and Brorsson, 2024a) provided a thor-

ough analysis of the two most prominent WebAssem-

bly runtimes employing an extensive array of instru-

mented benchmarks and sheds light on Wasm’s poten-

tial to address the requirements of a cross-architecture

cloud-to-edge application framework.

There are a multitude of architectures and dif-

ferent performance characteristics, making applica-

tion development and deployment difﬁcult. We need

to ensure that software can run on as many differ-

ent nodes as possible without speciﬁcally adapting

to each hardware platform. This is an active re-

search area addressed by several researchers, includ-

ing (Mattia and Beraldi, 2021), (Nastic et al., 2021),

and (Orive et al., 2022). There also exists an op-

portunity for exploration within the domain of multi-

architecture native containers and the incorporation of

WebAssembly into containerization.

4 METHODOLOGY

To understand Docker’s support for WebAssembly,

it is imperative to understand the operational frame-

work of the Docker Engine. The Docker Engine

is predicated on a higher-level container runtime,

containerd, which furnishes essential functionality

for controlling the container lifecycle. Using a shim

process, containerd can take advantage of the capabil-

ities of runc, a low-level runtime used for native con-

tainers. Subsequently, runc interfaces directly with

the operating system to manage diverse facets of con-

tainer operations. This design offers the capability to

develop a shim to integrate various runtimes with con-

tainerd, including WebAssembly runtimes. Conse-

quently, it becomes feasible to seamlessly utilize dif-

ferent WebAssembly runtimes within Docker, such as

Wasmtime, Spin, and Wasmedge. Our experiment in

this regard encompassed three distinct architectures:

• amd64: The benchmarks were run on an AWS

bare metal server (Linux, AMD64).

• arm64: The benchmark was run on an Nvidia Jet-

son Nano server (ARM64 architecture).

• riscv64: We used an additional benchmark that

was also run on a StarFive VisionFive2 board with

RISCV64 architecture.

This multi-architecture setup allowed us to eval-

uate the performance of WebAssembly containers

against native containers across different hardware

environments. The general methodology is shown in

Fig.2.

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

amd64, arm64, riscv64

server environments

Docker CE and Buildx (For

multi-architecture builds)

Native Containers

Wasm Containers

Docker default

runtime(runc)

Containerd shim

(Wasmtime runtime in

runwasi)

Final Measurements:

Startup Time, Pull Time,

Image size, etc,.

Cloud/Edge Environments

Build methods

Containerized Workloads

Choice of runtime

Measurement scripts

Figure 2: Overview of native and wasm container work-

ﬂows.

4.1 Workloads

Three workloads were designed to evaluate the per-

formance of both native and Wasm containers:

• Matrix Multiplication Workload: Implements

matrix multiplication in Rust, C++ and TinyGo,

with corresponding Wasm versions. This work-

load tests computational performance in the con-

text of heavy mathematical operations.

• Sorting Workload: Reads a text ﬁle, sorts its

contents, and writes the sorted output to a new

ﬁle. The sorting algorithm supports both ”all at

once” and ”one at a time” processing and is im-

plemented in Rust, Cpp, TinyGo, and their Wasm

counterparts. This workload tests ﬁle I/O and

memory handling capabilities.

• Face Detection Workload: Written in C++, we

run this workload on three architectures and use

containerd directly, thereby bypassing Docker en-

tirely.

To investigate the performance and resource char-

acteristics, we conducted experiments using three dif-

ferent hardware platforms with different instruction

set architectures. Table 1 provides some details on

the platforms used.

4.2 Container Conﬁgurations

For each workload, we containerized both native and

WebAssembly implementations, all approaches are

optimized to keep them comparable. Rust programs

were built with --release, C++ programs with -O3,

and TinyGo programs with -opt=2, because these are

the highest optimizations available in their respective

compilers, and the respective docker images are kept

to a minimum. Rust --release and C++ -O3 are

equivalent optimization levels, both focusing on max-

imum performance. TinyGo -opt=2 is the highest

optimization available for TinyGo, but it is not as ag-

gressive as Rust’s --release or C++’s -O3. Unfor-

tunately, TinyGo does not have an equivalent to -O3,

so we are using the best available option.

Native Containers: The Rust, TinyGo and Cpp

applications were compiled natively for each ar-

chitecture. We used Docker multi-platform builds

(--platform linux/amd64,linux/arm64) to

generate native containers compatible with both

architectures. For the riscv64 target, we need to

do it a little bit differently as there is no base

image common for amd64, arm64 and riscv64,

but the end result is the same, an image which

contains a sub-image for each architecture.

The Dockerﬁles were optimized through several

key measures including a multi-stage build pro-

cess, which separates the build environment from

the runtime environment, ensuring that the ﬁnal

image is minimal and only includes necessary

runtime dependencies. The ﬁrst stage is the build

stage during which we use a regular Linux base

image such as Alpine (for amd64 and arm64) and

an Ubuntu image for riscv64. We ensure that the

ﬁnal image is extremely small and contains only

the application binary and its essential dependen-

cies. These measures collectively contribute to an

efﬁcient and lightweight Docker image.

Wasm Containers: We compiled the Rust, TinyGo

and Cpp applications to WebAssembly and exe-

cuted them using the wasmtime runtime through

docker run. To run these containers, the con-

tainerd wasmtime shim was installed which is

provided by the runwasi

project. The contain-

erd runtime can provide support for Wasmtime as

an alternative container runtime

. The runtime

provided a secure execution environment with the

added sandboxing offered by WebAssembly. The

Dockerﬁle is kept minimal for running a wasm bi-

nary.

Again, by using ”scratch” as the base image, we

ensure that the ﬁnal image is lightweight, contain-

ing only the Wasm binary and no additional layers

or dependencies. This approach minimizes the at-

tack surface and signiﬁcantly reduces the image

size. This setup is ideal for environments where

resource efﬁciency and security are paramount.

We used the command --platform wasm and

https://github.com/containerd/runwasi

https://docs.docker.com/engine/daemon/

alternative-runtimes/

Performance and Usability Implications of Multiplatform and WebAssembly Containers

Table 1: Hardware Architectures Used in Experiments.

Platform CPU Model Clock Threads per Cores per Caches

frequency Core Socket

X86 64 AWS c5.metal Intel Xeon 8275CL 3 GHz 2 24 L1d: 32 KiB, L1i: 32 KiB, L2: 1 MiB, L3: 71.5 MiB

Nvidia Jetson Nano ARM Cortex-A57 1.5 GHz 2 4 L1d: 32 KiB, L1i: 48 KiB, L2: 2048 KiB

StarFive VisionFive2 SiFive U74 Core 1 GHz 1 4 L1d: 32 KiB, L1i: 32 KiB, L2: 2048 KiB

runwasi

(containerd-wasm-shim)

containerd-shim-runc-v2

Wasmedge Wasmtime spin

crun youki

High level Container

Runtimes

Container Management

Platforms

Wasm Runtimes Low Level Container Runtimes

containerd shims

Figure 3: Wasi support with containerd shims.

runtime=io.containerd.wasmtime.v1 to exe-

cute the wasm containers. For native containers,

the default docker runtime was used.

The general workﬂow of WebAssembly and the

use of it as an alternative docker runtime is shown in

Fig. 3. As illustrated in this ﬁgure. when running

a program using the docker run command, docker

will use containerd to manage the container. First, it

will check the local image store if it is already present

locally, and if not, it will pull a fresh image from the

used image registry. In our case, the Docker hub.

4.3 Benchmarking Approach

We measured the startup time of each container (both

native and Wasm) using a custom script that detects

the system architecture, allowing it to run appropriate

container images for either AMD64 or ARM64. We

instrumented the source code to capture the current

time at the beginning of the main function, providing

a precise timestamp for when the main function starts

to execute. In addition to measuring the time until

the main function starts, the script also logs the im-

age pull time, ensuring that we capture the overhead

of pulling the container image when a fresh pull is

required.

To ensure accurate measurements, we use the

‘date‘ command for high-resolution timestamps and

employ Docker’s built-in functionality to run contain-

ers. Before running each experiment, the script force-

removes the image if a fresh pull is speciﬁed, ensuring

a cold start, which provides a more accurate measure-

ment of the startup time. The overall approach com-

bines internal timing measurements within the con-

tainer and external validation through the execution

of the Docker container, ensuring consistency and re-

liability across different environments.

The measured time is categorized into two distinct

metrics: Pull Time, deﬁned as the time from request-

ing to run the container until it has been brought in

by pulling it from the remote OCI registry, and Con-

tainer Time, which reﬂects the duration taken to load

and initialize the image for execution until the pro-

gram starts its main function. This reﬂects the startup

time of the container.

The script supports testing with forced fresh pulls

and using cached images, allowing us to assess

the performance of the containerization approach in

different scenarios

. This detailed benchmarking

method provides insights into the performance char-

acteristics of native versus Wasm containers in multi-

architecture cloud environments. The script runs the

containers in both architectures (amd64 and aarch64),

performs the workload, and calculates the startup time

based on the elapsed time between image pull and the

execution of the container’s main function.

The AWS instance represents powerful cloud dat-

acenter server instances. The Intel Xeon 8275CL

CPU is a powerful processor with advanced par-

allel pipelines using dynamic instruction schedul-

ing, branch prediction, and cache prefetching mech-

anisms. In contrast, the Nvidia Jetson and StarFive

VisionFive2 boards represent the embedded domain

more likely to be found at the edge.

5 RESULTS

To evaluate startup times for native and Wasm con-

tainers on ﬁle sorting and matrix multiplication work-

loads, we initially followed best practices by employ-

ing multistage Dockerﬁle setups and optimized base

images as shown in Figure 4 for the Matrix Multipli-

cation program and Figure 5 for the Sorting bench-

mark. The results slightly favors native containers,

with smaller differences in image sizes and startup

times between native and Wasm, despite the tradition-

https://github.com/sangeeta1998/benchmark sort/

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

Figure 4: Startup Times (image pull and container time) for Matrix Multiplication.

Figure 5: Startup Times (image pull and container time) for Sorting Benchmark.

ally smaller size of Wasm. The main reason is that

although WebAssembly execution is fast and com-

petetive, as has been shown by (Kakati and Brorsson,

2024a), there is still some overhead compared to na-

tive execution.

Looking at Figures 4 and 5 we see in the red and

blue bars, the image pull time and container time for

all architectures and source languages. The yellow

and green bars indicate the startup time when the im-

age can be found locally in the Docker image cache.

We ﬁrst observe that for the AMD64 architecture,

the image pull time (light red) is almost the same

for both native and Wasm images and also across all

source languages. For ARM64, there are some small

deviations: The matrix multiplication C++ Wasm im-

age is pulled slightly faster than the native image. The

reason can be found in the large difference in image

size, see Figure 7. A similar situation occurs for Sort

with TinyGo.

When it comes to the actual startup time after the

container image has been pulled from the remote reg-

istry, the time is normally longer for Wasm contain-

ers than for native. This is normal considering that

there is an overhead of compiling the Wasm bytecode

which is in the critical path of execution. Some source

code languages are more sensitive here, in particular

TinyGo which might indicate that the Wasm genera-

tion in the TinyGo compiler is not yet as mature as for

Rust and C++.

5.1 Evaluation of Startup Times:

Impact of Dockerﬁle Practices

To closely mirror the scenarios often seen in related

work, which suggests Wasm has signiﬁcantly lower

startup times than we saw in the previous section, we

conducted a revised experiment on matrix multipli-

cation workload for native containers without scratch

Performance and Usability Implications of Multiplatform and WebAssembly Containers

(a) Average Startup Time in AMD64. (b) Average Startup Time in ARM64.

Figure 6: Pull and Startup Time for AM64 and ARM64: Without best Practices.

and multistage optimizations. In this conﬁguration,

image sizes and startup times were notably impacted,

and prominent differences in native vs. wasm startup

times can be seen. The Alpine-based C++ (433 MB)

produced a minimal image size due to the compact

nature of its dependencies. In contrast, Rust (927

MB) required a larger Alpine image to include the

Rust toolchain. TinyGo (1.55 GB) relied on the

tinygo/tinygo:latest image as a base, as TinyGo

lacks native Alpine support for arm64.

The results shown in Figure 6 reveal a substan-

tial improvement in the start times of Wasm com-

pared to native containers, particularly on AMD64.

For fresh pull and container startup times on AMD64,

Native Rust registered a pull time of 7166 ms, with

an additional container startup of 6190 ms. Mean-

while, Wasm Rust showed a much faster startup at

1996 ms for both pull and container times, underscor-

ing Wasm’s more efﬁcient startup under these condi-

tions.

Speciﬁcally, Wasm Rust’s startup time was ap-

proximately 72% faster than the combined pull and

startup time of Native Rust. Similar trends emerged

for TinyGo and C++ images: TinyGo Native required

signiﬁcantly longer for both pulling and startup com-

pared to Wasm, which achieved an 85% faster overall

startup. Native C++ also showed similar results, with

Wasm C++ maintaining faster performance, resulting

in an 82% faster total startup time.

For ARM64, native images demonstrated greater

overhead relative to Wasm, though the performance

gap between architectures narrowed slightly. Native

Rust showed a higher combined pull and startup time,

while Wasm Rust achieved a faster overall start, being

66% quicker. TinyGo on ARM64 followed a similar

trend, where Wasm’s total startup time was 77% faster

than Native. Native C++ on ARM64 required longer

times overall, with Wasm C++ reducing this signiﬁ-

cantly, achieving a 69% faster total startup time.

In summary, this additional experiment reveals

that Wasm consistently achieves faster startup times

than native containers under nonoptimized Docker

conﬁgurations, particularly on AMD64. While na-

tive images in optimized setups can perform similarly

to Wasm, removing best practices reveals Wasm’s

potential for faster initialization, making it suitable

for performance-sensitive applications across both

AMD64 and ARM64 architectures.

On the other hand, if we are using the ultimate

best practices for native multi-platform built images,

Wasm provides no additional beneﬁts in performance

and remains within an acceptable range. These results

highlight Wasm’s suitability for multi-architecture en-

vironments, suggesting that further optimizations can

closely match native performance across platforms,

offering viable cross-architecture consistency.

While Wasm shows notable advantages in startup

time and consistency across architectures, our experi-

ments did not exhaustively compare runtimes for lan-

guages with larger dependencies, such as Java, C#,

Python, and JavaScript. We anticipate that these lan-

guages may yield different performance results due to

their larger runtime requirements, providing avenues

for further study.

For all the other results presented in this paper, we

use optimized native docker images.

5.2 Image Size Comparison

Wasm containers offer signiﬁcantly reduced image

sizes compared to native containers, making them

optimal for resource-constrained environments. Fig-

ure 7 shows the image sizes at the host architecture

when they have been pulled and cached locally. Fig-

ure 8 shows the compressed sizes at the OCI Registry

(DockerHub) for the same images. Note, however,

that this is only for one architecture. The real size

stored at the registry in order to support multiple ar-

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

Figure 7: Host-based Image Sizes comparison.

chitectures should be multiplied with the number of

architectures (the image size is roughly the same size

for each architecture).

Figure 8: Image Sizes from OCI Registry for the AMD64

architecture.

5.3 Host Architecture Sizes vs. OCI

Registry Sizes

Table 2 and 3 summarize the image sizes for both

workloads comparing the image size stored in the reg-

istry (OCI size) to the one in the image runtime. The

registry images are compressed to save storage space

and transfer time. The reduction in size from the host

architecture to the OCI registry is calculated as shown

in Equation 1.

Reduction =



1 −

OCI Size

Host Size



× 100 (1)

Table 2: Host Architecture Sizes vs. OCI Registry Sizes for

Sorting Benchmark.

Language Host Size (KB) OCI Size (KB) Reduction (%)

TinyGo Native 1210 373.36 69.2

Rust Native 610 259.27 57.5

C++ Native 2180 746.24 65.8

TinyGo WASM 257 100.31 61.0

Rust WASM 96 40.79 57.5

C++ WASM 212 68.5 67.7

When an OCI image is pulled, the compressed

version of the image layers is transferred over the net-

work. The decompression happens on the client side

Table 3: Host Architecture Sizes vs. OCI Registry Sizes for

Matrix Multiplication Benchmark.

Language Host Size (KB) OCI Size (KB) Reduction (%)

TinyGo Native 1220 366.46 69.9

Rust Native 613 273 55.5

C++ Native 2170 735.43 66.1

TinyGo WASM 250 101.26 59.5

Rust WASM 96.1 42.25 56.1

C++ WASM 192 60.38 68.6

after the image layers are downloaded. This approach

minimizes the amount of data transferred, making the

process more efﬁcient.

Given the small images we are working with here,

we do not see much difference in pull time between

native images over Wasm images, even when the size

difference is large. The docker software introduces

some overhead in addition to the overhead of contain-

erd, which is the service that actually does the image

management.

The fresh startup time for a Docker container in-

volves several steps:

1. Calling Docker and Checking Local Image

Cache: The initial command is sent to the Docker

daemon and it checks if the image is already avail-

able locally.

2. Request Time from and in Docker Hub: If the

image is not available locally, Docker requests it

from the registry. The registry processes the re-

quest and starts sending the image layers.

3. Getting the Image Back and Decompressing:

The image layers are downloaded to the local ma-

chine. The downloaded layers are decompressed.

4. Loading Image to Container Runtime and

Start Execution: The image is loaded into the

container runtime, and the container is started.

It is only step 3 and to some extent step 4, which

are affected by the image size.

Performance and Usability Implications of Multiplatform and WebAssembly Containers

Table 4: Image sizes across architectures.

Architecture Image Size (MB)

amd64 1.65

arm64 1.55

riscv64 1.55

wasm 0.816

5.4 Using Containerd to Handle Images

In this section, we present the performance results

for the face detection benchmark, implemented in

C++ with -O3 optimization. The application was

containerized for three target architectures: amd64,

arm64, and riscv64. Notably, the Dockerﬁles for

amd64 and arm64 are identical, while the Dockerﬁle

for riscv64 differs only in the base image used to ac-

commodate the architecture. The image sizes across

different architectures are summarized in Table 4.

We perform the benchmark using containerd,

bypassing Docker, to directly evaluate the image pull

times. This allowed us to compare the performance

when using containerd as opposed to Docker.

The Dockerﬁles for the native images (amd64,

arm64, riscv64) follow a similar structure:

• Build Stage: The source code is compiled us-

ing clang++ with optimizations set to -O3.

This includes compiling several C++ source ﬁles

which are then linked to the ﬁnal executable

(facedetection).

• Runtime Stage: The runtime image is based on

the scratch image to minimize its size. The

compiled facedetection executable and an in-

put image are copied into the image.

5.4.1 Image Pull Times

We measured the average image pull time over 10 it-

erations for each architecture using ctr. The results

are presented in Fig. 9.

Figure 9: Containerd Image Pull Time.

The results show that the Wasm containers has

pull times considerably faster than the native contain-

ers. The results also demonstrate the effectiveness

of using containerd for managing containers, as it

provides faster image pull times. These ﬁndings con-

tribute to understanding the trade-offs between differ-

ent architectures and container runtimes in the context

of performance benchmarks.

6 CONCLUSION

In response to the growing demand for efﬁcient and

portable cloud-native computing, we have investi-

gated the performance implications of WebAssem-

bly (Wasm) containers compared to native multi-

architecture containers. Through benchmarking

across three distinct workloads on ARM64, AMD64 and

RISCV64 architectures, we found that native contain-

ers exhibited slightly faster startup times when using

highly optimized build process.

Wasm containers demonstrated a signiﬁcantly

smaller image size, averaging approximately 15%

of the size of their native counterparts (see Fig-

ure 7), making it a compelling choice for resources-

constrained environments, where memory and storage

footprint are critical.

An important aspect of this study was the use

of containerd instead of docker to manage contain-

ers. This direct approach to container management

resulted in up to 25% faster pull times for Wasm im-

ages across all architectures. In particular, the use

of containerd showcased an advantage in efﬁciency

when using container runtimes that bypass the Docker

layer.

Our contributions include a comprehensive eval-

uation of Wasm’s integration with Docker’s al-

ternative runtime, Wasmtime in runwasi-containerd

shim, across three architectures, along with a de-

tailed performance comparison between native multi-

architecture containers and Wasm containers based on

key metrics such as startup time, pull time, and im-

age size. The integration of Wasm into containerized

environments offers a signiﬁcant opportunity to opti-

mize performance, and instead of taking the place of

the existing container paradigm, it is more likely to

be integrated into that framework. Therefore, Wasm

will not replace containers; rather, they will function

as complementary technologies.

CLOSER 2025 - 15th International Conference on Cloud Computing and Services Science

ACKNOWLEDGMENT

This research has been partly funded by the Luxem-

bourg National Research Fund (FNR) under contract

number 16327771 and has been supported by Prox-

imus Luxembourg SA. For the purpose of open ac-

cess, and in fulﬁllment of the obligations arising from

the grant agreement, the author has applied a Creative

Commons Attribution 4.0 International (CC BY 4.0)

license to any Author Accepted Manuscript version

arising from this submission.

REFERENCES

Bosshard, B. (2020). On the use of web assembly in a

serverless context. In Agile Processes in Software

Engineering and Extreme Programming–Workshops,

page 141.

Chadha, M., Krueger, N., John, J., Jindal, A., Gerndt, M.,

and Benedict, S. (2023). Exploring the use of we-

bassembly in hpc. In Proceedings of the 28th ACM

SIGPLAN Annual Symposium on Principles and Prac-

tice of Parallel Programming, pages 92–106.

Felter, W., Ferreira, A., Rajamony, R., and Rubio, J. (2015).

An updated performance comparison of virtual ma-

chines and linux containers. In 2015 IEEE Interna-

tional Symposium on Performance Analysis of Sys-

tems and Software (ISPASS), pages 171–172. IEEE.

Ferrari, D., Sciascio, E., Gioiosa, R., Krieger, O., Krikellas,

K., and Giuliani, G. (2021). Webassembly for cloud-

native applications: A performance evaluation. Future

Generation Computer Systems, 115:335–348.

Fiedler, M., Alrowaily, H., Menzel, K., and Stiller, B.

(2020). Performance evaluation of webassembly as a

cloud-edge continuum technology. In 2020 IEEE 6th

International Conference on Network Softwarization

(NetSoft), pages 334–340. IEEE.

Fujii, D., Matsubara, K., and Nakata, Y. (2024). State-

ful vm migration among heterogeneous webassem-

bly runtimes for efﬁcient edge-cloud collaborations.

In Proceedings of the 7th International Workshop on

Edge Systems, Analytics and Networking, pages 19–

24.

Gackstatter, P., Frangoudis, P. A., and Dustdar, S. (2022).

Pushing serverless to the edge with webassembly run-

times. In 2022 22nd IEEE International Symposium

on Cluster, Cloud and Internet Computing (CCGrid),

pages 140–149. IEEE.

Gkonis, P., Giannopoulos, A., Trakadas, P., Masip-Bruin,

X., and D’Andria, F. (2023). A survey on iot-edge-

cloud continuum systems: Status, challenges, use

cases, and open issues. Future Internet, 15(12):383.

Hilbig, A., Lehmann, D., and Pradel, M. (2021). An empir-

ical study of real-world webassembly binaries: Secu-

rity, languages, use cases. In Proceedings of the Web

Conference 2021, pages 2696–2708.

Hockley, D. and Williamson, C. (2022). Benchmarking run-

time scripting performance in webassembly.

Kakati, S. and Brorsson, M. (2023). Webassembly be-

yond the web: A review for the edge-cloud contin-

uum. In 2023 3rd International Conference on Intel-

ligent Technologies (CONIT), pages 1–8. IEEE.

Kakati, S. and Brorsson, M. (2024a). A cross-architecture

evaluation of webassembly in the cloud-edge con-

tinuum. 2024 IEEE 24th International Symposium

on Cluster, Cloud and Internet Computing (CCGrid),

pages 337–346.

Kakati, S. and Brorsson, M. (2024b). An investigative study

of webassembly performance in cloud-to-edge. In

2024 International Symposium on Parallel Computing

and Distributed Systems (PCDS), pages 1–5.

Kjorveziroski, V., Filiposka, S., and Mishev, A. (2022).

Evaluating webassembly for orchestrated deployment

of serverless functions. In 2022 30th Telecommunica-

tions Forum (TELFOR), pages 1–4. IEEE.

Mattia, G. P. and Beraldi, R. (2021). Leveraging Reinforce-

ment Learning for online scheduling of real-time tasks

in the Edge/Fog-to-Cloud computing continuum. In

2021 IEEE 20th International Symposium on Net-

work Computing and Applications (NCA), pages 1–9.

ISSN: 2643-7929.

Mendki, P. (2020). Evaluating webassembly enabled server-

less approach for edge computing. In 2020 IEEE

Cloud Summit, pages 161–166.

Nastic, S., Pusztai, T., Morichetta, A., Pujol, V. C., Dustdar,

S., Vii, D., and Xiong, Y. (2021). Polaris Scheduler:

Edge Sensitive and SLO Aware Workload Scheduling

in Cloud-Edge-IoT Clusters. In 2021 IEEE 14th Inter-

national Conference on Cloud Computing (CLOUD),

pages 206–216. ISSN: 2159-6190.

Orive, A., Agirre, A., Truong, H.-L., Sarachaga, I., and

Marcos, M. (2022). Quality of Service Aware Or-

chestration for Cloud–Edge Continuum Applications.

Sensors, 22(5):1755. Number: 5 Publisher: Multidis-

ciplinary Digital Publishing Institute.

Pham, S., Oliveira, K., and Lung, C.-H. (2023). Webassem-

bly modules as alternative to docker containers in iot

application development. In 2023 IEEE 3rd Interna-

tional Conference on Electronic Communications, In-

ternet of Things and Big Data (ICEIB), pages 519–

524. IEEE.

Spies, B. and Mock, M. (2021). An evaluation of we-

bassembly in non-web environments. In 2021 XLVII

Latin American Computing Conference (CLEI), pages

1–10. IEEE.

Wang, W. (2022). How far we’ve come–a characterization

study of standalone webassembly runtimes. In 2022

IEEE International Symposium on Workload Charac-

terization (IISWC), pages 228–241. IEEE.

Wang, Z., Wang, J., Wang, Z., and Hu, Y. (2021).

Characterization and implication of edge webassem-

bly runtimes. In 2021 IEEE 23rd Int Conf on

High Performance Computing & Communications;

7th Int Conf on Data Science & Systems; 19th Int

Conf on Smart City; 7th Int Conf on Dependabil-

ity in Sensor, Cloud & Big Data Systems & Applica-

tion (HPCC/DSS/SmartCity/DependSys), pages 71–

80. IEEE.

Performance and Usability Implications of Multiplatform and WebAssembly Containers