underlying system hardware the container is running
on (kat, 2019); this approach has a significant perfor-
mance overhead.
gVisor's approach to container isolation sits between that of a typical container runtime and that of a fully virtualized runtime. The guest kernel of VMM-based runtimes inspired Sentry, a user-space kernel that intercepts system calls via ptrace, and Gofer, which mediates the container's file-system access. Both are instantiated once for every OCI container in a pod sandbox. This approach is a middle ground between the default control groups / namespaces approach and the VMM model. To reduce overhead, gVisor does not virtualize system hardware and leaves scheduling and memory management to the host kernel. This split of responsibilities nevertheless hurts performance for syscall-intensive applications and leads to potential incompatibilities between some workloads and gVisor due to specific Linux kernel behaviour or unimplemented Linux system calls (LLC, 2019).
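To illustrate the interception primitive, the following is a minimal sketch of ptrace-based syscall tracing on Linux/amd64, the same mechanism Sentry's ptrace platform builds on. It is not gVisor code: a real sentry would emulate each intercepted call in user space rather than merely log its number, and the traced binary (/bin/true) is an arbitrary choice.

    package main

    import (
        "fmt"
        "os/exec"
        "runtime"
        "syscall"
    )

    func main() {
        // ptrace requires all requests to come from the same OS thread.
        runtime.LockOSThread()

        cmd := exec.Command("/bin/true")
        cmd.SysProcAttr = &syscall.SysProcAttr{Ptrace: true} // child stops on exec
        if err := cmd.Start(); err != nil {
            panic(err)
        }
        pid := cmd.Process.Pid

        var status syscall.WaitStatus
        syscall.Wait4(pid, &status, 0, nil) // wait for the initial ptrace stop

        var regs syscall.PtraceRegs
        for {
            // Resume the child until the next syscall entry or exit.
            if err := syscall.PtraceSyscall(pid, 0); err != nil {
                break
            }
            if _, err := syscall.Wait4(pid, &status, 0, nil); err != nil || status.Exited() {
                break
            }
            // Orig_rax holds the syscall number on x86-64; each call
            // produces two stops (entry and exit), so numbers repeat.
            if err := syscall.PtraceGetRegs(pid, &regs); err != nil {
                break
            }
            fmt.Printf("intercepted syscall %d\n", regs.Orig_rax)
        }
    }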
gVisor is implemented as an OCI-compatible runtime that can easily be plugged into well-known container tools such as containerd, Docker, and CRI-O. Using a containerd plugin, it is also possible to switch between the default runtime and gVisor using annotations (LLC, 2019).
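For instance, once gVisor's runsc binary is registered as an additional runtime in the Docker daemon configuration, a container can be started under gVisor simply by selecting that runtime. The sketch below shells out to the Docker CLI and assumes a runtime named runsc has been configured as shown in the comment.

    package main

    import (
        "log"
        "os/exec"
    )

    func main() {
        // Assumes /etc/docker/daemon.json registers gVisor, e.g.:
        //   { "runtimes": { "runsc": { "path": "/usr/local/bin/runsc" } } }
        // The --runtime flag then selects gVisor instead of the default runc.
        cmd := exec.Command("docker", "run", "--rm", "--runtime=runsc",
            "alpine", "uname", "-a")
        out, err := cmd.CombinedOutput()
        if err != nil {
            log.Fatalf("docker run failed: %v\n%s", err, out)
        }
        log.Printf("container output:\n%s", out)
    }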
3 RELATED WORK
The literature on container runtime evaluation is scarce; most of the related research is only partially devoted to the evaluation of container runtime overhead.
Avino et al. studied Docker in the context of resource-critical mobile edge computing and discovered that the CPU usage of the Docker process stays constant regardless of the CPU cycles consumed by the containerized process (Avino et al., 2017). Casalicchio et al. benchmarked Docker Engine under CPU- and I/O-intensive workloads, reporting that the overhead of Docker Engine drops from 10% to 5% when the CPU cycles requested by the container exceed 80% (Casalicchio and Perciballi, 2017). Kozhirbayev et al. compared Docker with the rivaling Flockport containers (based on LXC), showing that I/O and system calls were responsible for most of the performance overhead in both container implementations, with Docker's memory performance slightly better than Flockport's (Kozhirbayev and Sinnott, 2017). Morabito et al. likewise identify disk performance as the major bottleneck for both container-based and hypervisor-based virtualization, while detecting no significant overhead in other performance metrics (Morabito et al., 2015).
The study by Xavier et al. went further, showing that containers (LXC, Linux-VServer, and OpenVZ) achieve near-native performance that is acceptable for HPC applications (Xavier et al., 2013). The same argument holds for Docker in data-intensive HPC applications (Chung et al., 2016). Lee et al. evaluated the performance of the containers underlying the production serverless computing environments of AWS, Microsoft, Google, and IBM, concluding that Amazon Lambda performs best in terms of CPU, network bandwidth, and I/O throughput (Lee et al., 2018). K. Kushwaha's study of VM-based and OS-level container runtimes such as runc and kata-containers, together with containerd and CRI-O, concluded that containerd outperforms CRI-O and Docker due to a different file-system driver interface design, although CRI-O has much lower container start-up latency than containerd (Kushwaha, 2017).
4 TouchStone
4.1 Overview of the Tool
This section describes the evaluation tool TouchStone (the name refers to the small stones once used to determine the quality of gold), which was implemented to conduct the tests for this study. The tool is published under the MIT license on GitHub. TouchStone is implemented in Go because the available open-source tools that use the CRI are themselves implemented in Go.
The main motivation for developing the tool is to improve container runtime evaluation by collecting meaningful evaluation data over multiple runs and exporting it. TouchStone evaluates CPU, memory, and block I/O performance, as well as the performance of container operations and its scalability under varying load. TouchStone runs evaluations in a testing matrix that exercises each configuration pair of container runtime (CRI runtime) and OCI runtime exactly once. Tests are identified by a hierarchical naming scheme context.component.facet, for instance performance.cpu.total for a CPU performance benchmark. All of the configuration listed above is provided in a YAML file. After all benchmarks have been run, an index of the test results is generated and injected, together with the results, into an easily readable HTML report.
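To make the matrix concrete, the following is a minimal sketch of how such a CRI-runtime x OCI-runtime test matrix and the hierarchical naming scheme might be modeled. The type and the runtime lists are illustrative and do not reproduce TouchStone's actual API.

    package main

    import "fmt"

    // Test identifies a benchmark by the hierarchical
    // scheme context.component.facet.
    type Test struct {
        Context, Component, Facet string
    }

    func (t Test) Name() string {
        return fmt.Sprintf("%s.%s.%s", t.Context, t.Component, t.Facet)
    }

    func main() {
        criRuntimes := []string{"containerd", "cri-o"}
        ociRuntimes := []string{"runc", "runsc", "kata"}
        tests := []Test{
            {"performance", "cpu", "total"},
            {"performance", "blkio", "throughput"},
        }
        // Each (CRI runtime, OCI runtime) configuration pair is
        // evaluated exactly once per test.
        for _, cri := range criRuntimes {
            for _, oci := range ociRuntimes {
                for _, t := range tests {
                    fmt.Printf("run %s on %s/%s\n", t.Name(), cri, oci)
                }
            }
        }
    }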