SLO and Cost-Driven Container Autoscaling on Kubernetes Clusters
Angelo Marchese (https://orcid.org/0000-0003-2114-3839) and Orazio Tomarchio (https://orcid.org/0000-0003-4653-0480)
Dept. of Electrical Electronic and Computer Engineering, University of Catania, Catania, Italy
Keywords:
Cloud Computing, Container Technology, Kubernetes Autoscaler, Service Level Objectives, Cost Monitoring.
Abstract:
Modern web services must meet critical non-functional requirements such as availability, responsiveness,
scalability, and reliability, which are formalized through Service Level Agreements (SLAs). These agree-
ments specify Service Level Objectives (SLOs), which define performance targets like uptime, latency, and
throughput, essential for ensuring consistent service quality. Failure to meet SLOs can result in penalties and
reputational damage. Service providers also face the challenge of avoiding over-provisioning resources, as this
leads to unnecessary costs and inefficient resource use. To address this, autoscaling mechanisms dynamically
adjust the number of service replicas to match user demand. However, traditional autoscaling solutions typi-
cally rely on low-level metrics (e.g., CPU or memory usage), making it difficult for providers to optimize both
SLOs and infrastructure costs. This paper proposes an enhanced autoscaling methodology for containerized
workloads in Kubernetes clusters, integrating SLOs with a cost-driven autoscaling policy. This approach over-
comes the limitations of conventional autoscaling by making more efficient decisions that balance service-level
requirements with operational costs, offering a comprehensive solution for managing containerized applica-
tions and their infrastructure in Kubernetes environments. The results, obtained by evaluating a prototype of
our system in a testbed environment, show significant advantages over the vanilla Kubernetes Horizontal Pod
Autoscaler.
1 INTRODUCTION
Microservices architecture is a widely used architec-
tural style for enterprise software that breaks down
large applications into a series of small, modular,
independently deployable microservices (Salii et al.,
2023).
For such applications, the workload can be distributed horizontally across a cluster of servers, removing the need for a single, costly server. Resources can be allocated precisely by replicating, or assigning more resources to, the microservices experiencing the highest demand or requiring greater reliability.
Microservice applications thus exhibit the characteristics that make them "cloud-native": they can run and scale in modern, dynamic environments such as public, private and hybrid clouds (Hongyu and Anming, 2023).
Services developed and organized in such a way are complex systems designed to meet a wide range of
non-functional requirements that are critical to their
business operations, including service availability, re-
sponsiveness, scalability, and reliability. These re-
quirements are typically formalized through Service
Level Agreements (SLAs) between service providers
and consumers, which outline the operational bound-
aries within which a service must perform. An SLA
consists of one or more Service Level Objectives
(SLOs), which define high-level performance indi-
cators that must be maintained throughout the ser-
vice delivery period. These indicators represent the
desired state of service, such as uptime, latency, or
throughput, and are essential for ensuring consistent
service quality. Failing to meet SLOs can result
in contractual penalties and damage to the service
provider’s reputation.
Furthermore, from a service provider's perspective, it is also important not to over-provision resources in a deployment environment for a given SLO, as this would result in additional costs and suboptimal resource utilization (Gupta et al., 2017). Service providers are therefore challenged to find the right balance between meeting SLOs and optimizing resource usage and costs.
The adoption of cloud computing technology
and service orchestration systems has emerged as a
promising solution to address this challenge, thanks to
the cloud infrastructure’s reliability, availability, scal-
ability, and elasticity, as well as the automation ca-
pabilities in service management provided by orches-
tration systems (Mukherjee et al., 2024; Calcaterra
et al., 2021). In particular, the service autoscaling
mechanisms offered by orchestration systems, along
with the ability to rapidly provision and de-provision
cloud infrastructure, are key features for finding the
right balance between meeting SLOs and optimizing
costs.
Service autoscaling mechanisms help minimize
over- or under-provisioning issues by dynamically ad-
justing the number of service replicas to match the
current user request load (Chen et al., 2018). How-
ever, most orchestration solutions only allow services to be scaled based on the value of a low-level metric (i.e., a metric derived from low-level monitoring indicators such as service CPU or memory usage). This makes it hard for a service provider to control the required high-level SLOs and to optimize infrastructure costs.
In this paper we propose an enhanced autoscaling
methodology, with particular reference to container-
ized workloads utilizing Kubernetes clusters. The
proposed approach addresses the limitations of con-
ventional autoscaling methodologies by integrating
an SLO and a cost-driven autoscaling policy. This fa-
cilitates more efficient autoscaling decisions that bal-
ance both service-level requirements and operational
costs, offering a more comprehensive solution for
managing containerized applications in Kubernetes
environments and the infrastructure required for their
execution.
The rest of the paper is organized as follows. Sec-
tion 2 provides some background information about
the Kubernetes autoscaling policy and discusses in
more detail some of its limitations that motivate our
work. Section 3 presents our proposed approach, de-
tailing its implementation, while Section 4 discusses
the evaluation results from a testbed environment. Re-
lated works are reviewed in Section 5, and Section 6
concludes the work.
2 KUBERNETES AUTOSCALING
Kubernetes is today the de-facto orchestration plat-
form for the lifecycle management of containerized
applications deployed on large-scale node clusters
(Kubernetes, 2024; Gannon et al., 2017). A typical
Kubernetes cluster comprises a control plane and a set
of worker nodes. The control plane encompasses various management services running within one or more master nodes, while the worker nodes serve as the execution environment for containerized workloads.

Figure 1: Kubernetes Horizontal Pod Autoscaler.
In Kubernetes, the fundamental deployment units are
Pods, each containing one or more containers and
managed by a Deployment resource.
The Kubernetes Horizontal Pod Autoscaler (HPA)
is a control plane component that adjusts the number
of Pods managed by a Deployment based on the aver-
age CPU or memory usage of those Pods. Configured
with a target value, the HPA periodically scales the
number of Pods to ensure that their average resource
usage aligns with the desired target. Equation 1 shows
the HPA algorithm with a target value on the average
CPU usage.
r_d = ceil(r_c * (cpu_c / cpu_d))    (1)

where r_d is the desired number of replicas, r_c the current number of replicas, cpu_c the current average CPU usage, cpu_d the desired CPU usage and ceil() is the ceiling function, which returns the smallest integer greater than or equal to its argument. The main limitation of the HPA algorithm is that it relies on target values specified for low-level resource usage metrics, which are challenging to define and correlate with high-level indicators such as SLOs and infrastructure costs. Incorrectly defining these target values can lead to inefficient scaling decisions. Setting low resource usage targets triggers frequent scale-up actions, improving service performance but resulting in infrastructure over-provisioning and higher costs. On the other hand, setting high targets results in frequent scale-down actions, reducing costs but potentially causing frequent SLO violations.
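To make this behaviour concrete, the following minimal Python sketch reproduces the scaling rule of Equation 1; the function name and the way the current measurements would be obtained are illustrative only and are not part of the actual HPA implementation.

import math

def hpa_desired_replicas(current_replicas: int,
                         current_cpu_usage: float,
                         target_cpu_usage: float) -> int:
    # Equation 1: scale the replica count by the ratio between the observed
    # average CPU usage and the configured target, rounding up.
    return math.ceil(current_replicas * (current_cpu_usage / target_cpu_usage))

# Example: 4 replicas averaging 80% CPU against a 50% target are scaled to 7.
print(hpa_desired_replicas(4, 0.80, 0.50))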
3 PROPOSED APPROACH
3.1 General Model
Building on the limitations discussed in Section 2, this
work introduces an SLO and cost-driven autoscaling
policy specifically designed for containerized work-
loads running on Kubernetes clusters. The primary
goal is to address the shortcomings of traditional au-
toscaling methods, such as the Kubernetes Horizontal
Pod Autoscaler (HPA), which typically relies on low-
level service resource usage metrics (e.g., CPU and
memory utilization). These methods require contin-
uous monitoring and configuration with target values
for service resource consumption, and scaling deci-
sions are based on achieving these predefined targets.
Our proposed autoscaling policy, whose general
model is shown in Figure 2, aims to move beyond this
resource-centric approach by integrating performance
objectives (SLOs) and cost considerations into the
scaling process. This enables more efficient autoscal-
ing decisions that balance both service-level require-
ments and operational costs, offering a more compre-
hensive solution for managing containerized applica-
tions in Kubernetes environments and the infrastruc-
ture required for their execution.
The core idea of the proposed approach is based
on the principle that an effective service autoscal-
ing policy must ensure acceptable performance while
minimizing infrastructure costs, especially in the face
of fluctuating user request workloads. To achieve this,
the policy should continuously monitor both the ser-
vice’s response time and the associated infrastructure
costs through a monitoring framework, adjusting the
system to meet predefined targets for these metrics.
By dynamically balancing performance and cost, the
approach aims to optimize resource allocation in real-
time, ensuring both high-quality service delivery and
cost-efficiency. Further details on the proposed cus-
tom autoscaler and the monitoring framework are pro-
vided in the following subsections.
3.2 Custom Pod Autoscaler
The proposed custom Pod autoscaler operates as a
Deployment within the Kubernetes control plane and
is built on top of the open source Custom Pod Autoscaler framework (https://custom-pod-autoscaler.readthedocs.io).
The autoscaler is configured through a CustomPodAutoscaler Kubernetes custom resource which defines a scaling configuration for a Deployment. A CustomPodAutoscaler resource, whose schema is shown
apiVersion: custompodautoscaler.com/v1
kind: CustomPodAutoscaler
metadata:
name: nginx-autoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx
runPeriod: 30000
stabilizationWindow: 10000
cost: 100
slo:
p: 99th
window: 20
target: 300
Listing 1: Example of a CustomPodAutoscaler resource.
in Listing 1, contains a spec property with five sub-
properties: scaleTargetRef, runPeriod, stabilization-
Window, cost and slo. The scaleTargetRef property
identifies the target Deployment to scale. The runPe-
riod property determines the time interval, in millisec-
onds, between two consecutive executions of the au-
toscaling algorithm. The stabilizationWindow prop-
erty defines the time interval, in milliseconds, follow-
ing a scaling action during which the autoscaler can-
not take further scaling actions for the Deployment.
The cost property represents the desired target for the
overall hourly cost of the cluster nodes. The slo prop-
erty specifies a target SLO for the response time of the
Deployment and contains three sub-fields: p, window
and target. The target field represents the target SLO
value, in milliseconds, for the p-quantile of the De-
ployment response time over the time period defined
by the window field.
For each periodic execution of the autoscaling al-
gorithm, the number of replicas for the Deployment is
determined by Equation 2.
r_d = ceil(k * r_c)    (2)

with:

k = w_rt * (rt_c / rt_d) + w_c * (c_d / c_c)    (3)
where r_d is the desired number of replicas, r_c the current number of replicas and k a multiplier factor. The value of the k parameter is determined as the weighted average between the response time SLO ratio (rt_c / rt_d) and the cost ratio (c_d / c_c). The response time SLO ratio represents the relationship between the current p-quantile of the response time rt_c and the target SLO response time rt_d. A ratio greater than one indicates SLO violations, signaling the need to scale up the number of replicas. Conversely, a ratio below one indicates no SLO violations, allowing for a reduction in replicas to lower infrastructure costs. The cost ratio reflects the relationship between the desired target cost c_d and the currently predicted cost c_c. A ratio greater than one indicates that the current predicted costs are below the target, allowing for scaling up the number of replicas to improve performance. On the other hand, a ratio lower than one indicates that predicted costs exceed the target, necessitating a scale down of replicas to reduce costs.

Figure 2: General model of the proposed approach.
The w_rt and w_c parameters are in the range between zero and one and their sum is equal to one. By adjusting the values of the w_rt and w_c parameters, different weights are assigned to the response time SLO and cost ratios, respectively, in determining the value of the k parameter. A value of w_rt significantly higher than w_c indicates that scaling actions are primarily driven by the need to meet SLO targets. Conversely, if w_c is much higher than w_rt, scaling actions are primarily guided by the need to keep costs below the maximum target.
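As a concrete illustration, the following minimal Python sketch implements Equations 2 and 3; the function name, its arguments and the example values are hypothetical, while in the actual autoscaler equivalent logic is executed at each run period using the metrics collected by the monitoring framework described in Section 3.3.

import math

def custom_desired_replicas(current_replicas: int,
                            rt_current_ms: float, rt_target_ms: float,
                            cost_current: float, cost_target: float,
                            w_rt: float = 0.5, w_c: float = 0.5) -> int:
    # Equation 3: weighted average of the response time SLO ratio and the
    # cost ratio (w_rt + w_c is expected to be equal to one).
    k = w_rt * (rt_current_ms / rt_target_ms) + w_c * (cost_target / cost_current)
    # Equation 2: scale the current replica count by k and round up.
    return math.ceil(k * current_replicas)

# Hypothetical example: 3 replicas, 90th percentile response time of 450 ms
# against a 300 ms SLO, predicted cost of 30 units against a 40 unit target,
# equal weights: k = 0.5*(450/300) + 0.5*(40/30) = 1.42, so 5 replicas.
print(custom_desired_replicas(3, 450.0, 300.0, 30.0, 40.0))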
3.3 Monitoring Framework
The real-time service response times and infrastruc-
ture cost metrics are continuously collected by a
comprehensive monitoring framework (Marchese and
Tomarchio, 2024), as depicted in Figure 3, and are
made accessible to the custom autoscaler for dynamic
scaling decisions.

Figure 3: Monitoring framework.
At the core of this monitoring framework is the
Prometheus (https://prometheus.io) metrics server, a database designed to
collect, store, and query time series data. Prometheus
periodically gathers metrics from various exporters
and makes them available through the PromQL query
language, allowing for detailed insights and real-time
monitoring. The Prometheus server is deployed as a
Kubernetes Deployment within the control plane.
Service response time metrics are collected using the Istio (https://istio.io) framework, a service mesh implementation that manages Pod communication within the
Kubernetes cluster. The Istio control plane is in-
stalled within the cluster and automatically injects a
sidecar container running an Envoy proxy into each
Pod upon creation. These Envoy proxies intercept all
traffic between Pods, providing fine-grained observ-
ability. They expose detailed traffic statistics through
metrics exporters, which can then be queried by the
Prometheus server to capture real-time service re-
sponse times.
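As an illustration of how such a measurement could be retrieved, the following Python sketch queries the Prometheus HTTP API for the 90th percentile of a Deployment's response time, assuming the istio_request_duration_milliseconds histogram exposed by the Envoy sidecars; the Prometheus address and the workload name are placeholders.

import requests

PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"  # placeholder address

# PromQL: 90th percentile of the Istio request duration histogram for the
# target workload over a 5 minute window.
query = (
    'histogram_quantile(0.90, sum(rate('
    'istio_request_duration_milliseconds_bucket{destination_workload="nginx"}[5m]'
    ')) by (le))'
)

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
result = resp.json()["data"]["result"]
# A single time series is expected; its value is the current p90 in milliseconds.
p90_ms = float(result[0]["value"][1]) if result else None
print(p90_ms)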
Infrastructure cost metrics are collected using the
OpenCost agent and node exporters. OpenCost is
a vendor neutral framework designed for measur-
ing and allocating cloud infrastructure and container
costs. Specifically built for Kubernetes environments,
OpenCost enables real-time cost monitoring, showback, and chargeback, providing valuable insights
into resource consumption and associated expenses.
The OpenCost agent, which runs as a Deploy-
ment within the Kubernetes control plane, collects
node CPU and memory metrics from the Prometheus
server. It then generates infrastructure cost metrics
based on a pricing model and the collected data,
which are subsequently stored within Prometheus for
further analysis. Node exporters, deployed as Dae-
monSets on each cluster node, continuously monitor
and report CPU and memory usage, and expose those
metrics to the Prometheus server.
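A similar query can be used by the autoscaler to obtain the current infrastructure cost. The sketch below, again in Python, assumes the node_total_hourly_cost metric that the OpenCost agent records in Prometheus; both the metric name and the server address are assumptions rather than verified details of the deployment.

import requests

PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"  # placeholder address

# Sum the hourly cost of all provisioned cluster nodes as priced by OpenCost.
query = "sum(node_total_hourly_cost)"

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
result = resp.json()["data"]["result"]
hourly_cost = float(result[0]["value"][1]) if result else 0.0
print(f"current cluster cost: {hourly_cost} units/hour")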
4 EVALUATION
The proposed solution has been evaluated by us-
ing a sample application generated using the µBench
benchmarking tool (Detti et al., 2023). µBench
enables the generation of Kubernetes manifests for
service-mesh topologies with one or multiple mi-
croservices, each running a specific function. Among
the pre-built functions in µBench, the Loader func-
tion models a generic workload that stresses node resources when processing HTTP requests. When invoked, the Loader function computes a given number N of decimals of π. The larger N is, the greater the complexity and the stress on the CPU. Additional stress
on node memory can be configured by adjusting the
amount of memory required by the function for each
computation. For this work, a single-service applica-
tion has been generated as a Kubernetes Deployment
with resource requirements of 0.5 vCPU and 250MB
of memory.
The testbed environment for the experiments consists of a Rancher Kubernetes Engine 2 (RKE2, https://docs.rke2.io) Kubernetes cluster with one master node for the control plane and a pool of worker nodes. These nodes are deployed as virtual machines on a Proxmox (https://www.proxmox.com) environment and configured with 2 vCPU and 8GB of RAM. Autoscaling of worker nodes is managed by the kproximate (https://github.com/jedrw/kproximate) cluster autoscaler, which communicates with the Proxmox API server to dynamically provision and de-provision virtual machines based on the resources required by the service replicas. A pricing model that charges one unit of cost per vCPU per hour and one unit of cost per GB of RAM per hour is used to calculate the overall cost of provisioned cluster nodes.
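Under this pricing model, each worker node (2 vCPU and 8GB of RAM) therefore accounts for 2 + 8 = 10 cost units per hour while provisioned.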
Black box experiments are conducted by evaluat-
ing the end-to-end response time of the sample appli-
cation and the overall infrastructure costs when HTTP
requests are sent to the application service with a
specified number of virtual users each sending one
request every second in parallel. Requests to the ap-
plication are sent through the k6 (https://k6.io) load testing utility
from a node inside the same network where cluster
nodes are located. This setup minimizes the impact
of network latency on the application response time.
Each experiment consists of 10 trials, during which
the k6 tool sends requests to the application for 30
minutes. For each trial, statistics about the application
response time are measured and averaged with those
of the other trials of the same experiment. An SLO of
300ms for the 90th percentile of the application response time and an overall cost of 40 units, both over a 30-minute window, are fixed as target values. For each experi-
ment, we compare the performances of the proposed
custom autoscaler with those of the Kubernetes Hor-
izontal Pod Autoscaler. The custom autoscaler has a
run period and stabilization window set to 30 seconds,
with the autoscaling algorithm parameters w_rt and w_c each assigned a value of 0.5.
Figure 4: Service response time (90th percentile response time, in ms, vs. number of virtual users).

Figure 5: Infrastructure costs (cost units vs. number of virtual users).

Figures 4 and 5 present the results of the experi-
ments. The first graph shows the 90th percentile of
the application response time in relation to the num-
ber of virtual users concurrently sending requests,
while the second graph illustrates the cumulative in-
frastructure costs for each experiment. Across all ex-
periments, the proposed approach consistently out-
performs the Kubernetes HPA in both application re-
sponse time and infrastructure costs. At lower vir-
tual user counts, the performance of the proposed ap-
proach is similar to that of the Kubernetes HPA, as
the application experiences limited load and minimal
infrastructure requirements. However, as the number
of virtual users increases, the proposed approach be-
gins to significantly outperform the Kubernetes HPA,
with more noticeable improvements at higher user
counts. Both response time and infrastructure costs
grow more rapidly with the Kubernetes HPA com-
pared to the proposed approach.
5 RELATED WORK
In the literature, there is a variety of works that pro-
pose extensions of the Kubernetes platform in order
to devise custom Pod autoscaling solutions aimed at
ensuring service response times while minimizing in-
frastructure costs (Tran et al., 2022; Do et al., 2025).
In (Marie-Magdelaine and Ahmed, 2020) authors
propose a proactive autoscaling framework that uses
a learning-based forecast model to dynamically ad-
just the resource pool, both horizontally and verti-
cally. The framework uses a proactive autoscaling al-
gorithm based on Long Short-Term Memory (LSTM)
to improve the end-to-end latency for cloud-native ap-
plications.
Libra (Balla et al., 2020) is an adaptive autoscaler,
which automatically detects the optimal resource set
for a single Pod, then manages the horizontal scaling
process. Additionally, if the load or the underlying
virtualized environment changes, Libra adapts the re-
source definition for the Pod and adjusts the horizon-
tal scaling process accordingly.
In (Yuan and Liao, 2024) authors propose a pre-
dictive autoscaling Kubernetes operator based on time
series forecasting algorithms, aimed at dynamically adjusting the number of running instances in the cluster
to optimize resource management. In this work, the
Holt–Winter forecasting method and the Gated Re-
current Unit (GRU) neural network, two robust time
series forecasting algorithms, are employed and dy-
namically managed.
Gwydion (Santos et al., 2025) is a microservices-based application autoscaler that enables different autoscaling goals through Reinforcement Learning (RL) algorithms. Gwydion is based on the OpenAI Gym library and aims to bridge the gap between RL
and autoscaling research by training RL algorithms
on real cloud environments for two opposing reward
strategies: cost-aware and latency-aware. Gwydion
focuses on improving resource usage and reducing the
service response time by considering microservice interdependencies when scaling horizontally.
In (Pramesti and Kistijantoro, 2022) an autoscaler
based on response time prediction is proposed for mi-
croservice applications running in Kubernetes envi-
ronments. The prediction function is developed using
a machine learning model that features performance
metrics at the microservice and node levels. The re-
sponse time prediction is then used to calculate the
number of Pods required by the application to meet
the target response time.
StatuScale (Wen et al., 2024) is a status-aware and
elastic scaling framework which is based on a load
status detector that can select appropriate elastic scal-
ing strategies for differentiated resource scheduling in
vertical scaling. Additionally, StatuScale employs a
horizontal scaling controller that utilizes comprehen-
sive evaluation and resource reduction to manage the
number of replicas for each microservice.
6 CONCLUSIONS
In this work, we propose extending the Kubernetes
platform with a custom Pod autoscaling strategy
aimed at minimizing SLO violations in the response
times of containerized applications running in cloud
environments, while simultaneously reducing infras-
tructure costs. Our primary goal is to address the lim-
itations of the Kubernetes Horizontal Pod Autoscaler,
which scales Pod replicas based on low-level resource
usage metrics. This approach makes it challenging to
define scaling targets that are properly correlated with
the desired response time SLOs and maximum infras-
tructure costs. The idea is to propose a Pod autoscal-
ing policy based on high-level metrics, such as actual
application response times and infrastructure costs, to
more accurately achieve the desired SLO and cost tar-
gets.
For future work, we plan to enhance the efficiency
of the proposed autoscaling policy by using AI and
time series analysis techniques to identify patterns in
user requests and predict their trends. This will enable
the development of a proactive autoscaling policy that
scales up the number of replicas to ensure improved
service performance, while minimizing infrastructure
over-provisioning and reducing unnecessary costs.
ACKNOWLEDGEMENTS
This work was partially funded by the European
Union under the Italian National Recovery and Re-
silience Plan (NRRP) of NextGenerationEU, Mission
4 Component C2 Investment 1.1 - Call for tender No.
1409 of 14/09/2022 of Italian Ministry of University
and Research - Project ”Cloud Continuum aimed at
On-Demand Services in Smart Sustainable Environ-
ments” - CUP E53D23016420001.
REFERENCES
Balla, D., Simon, C., and Maliosz, M. (2020). Adap-
tive scaling of kubernetes pods. In NOMS 2020 -
2020 IEEE/IFIP Network Operations and Manage-
ment Symposium, pages 1–5.
Calcaterra, D., Di Modica, G., Mazzaglia, P., and Tomar-
chio, O. (2021). TORCH: a TOSCA-Based Orchestra-
tor of Multi-Cloud Containerised Applications. Jour-
nal of Grid Computing, 19(1).
Chen, T., Bahsoon, R., and Yao, X. (2018). A survey and
taxonomy of self-aware and self-adaptive cloud au-
toscaling systems. ACM Comput. Surv., 51(3).
Detti, A., Funari, L., and Petrucci, L. (2023). µbench: An
open-source factory of benchmark microservice ap-
plications. IEEE Transactions on Parallel and Dis-
tributed Systems, 34(3):968–980.
Do, T. V., Do, N. H., Rotter, C., Lakshman, T., Biro, C., and Bérczes, T. (2025). Properties of horizontal pod autoscaling algorithms and application for scaling cloud-native network functions. IEEE Transactions on Network and Service Management, pages 1–1.
Gannon, D., Barga, R., and Sundaresan, N. (2017). Cloud-
native applications. IEEE Cloud Computing, 4:16–21.
Gupta, H., Vahid Dastjerdi, A., Ghosh, S. K., and Buyya,
R. (2017). ifogsim: A toolkit for modeling and
simulation of resource management techniques in
the internet of things, edge and fog computing en-
vironments. Software: Practice and Experience,
47(9):1275–1296.
Hongyu, Y. and Anming, W. (2023). Migrating from mono-
lithic applications to cloud native applications. In
2023 8th International Conference on Computer and
Communication Systems (ICCCS), pages 775–779.
Kubernetes (2024). Production-Grade Container Orchestra-
tion. https://kubernetes.io. Last accessed 3 Jun 2024.
Marchese, A. and Tomarchio, O. (2024). Telemetry-driven
microservices orchestration in cloud-edge environ-
ments. In 2024 IEEE 17th International Conference
on Cloud Computing (CLOUD), pages 91–101, Shen-
zhen, China. IEEE Computer Society.
Marie-Magdelaine, N. and Ahmed, T. (2020). Proactive au-
toscaling for cloud-native applications using machine
learning. In GLOBECOM 2020 - 2020 IEEE Global
Communications Conference, pages 1–7.
Mukherjee, A., De, D., and Buyya, R. (2024). Cloud Com-
puting Resource Management, pages 17–37. Springer
Nature Singapore, Singapore.
Pramesti, A. A. and Kistijantoro, A. I. (2022). Autoscaling
based on response time prediction for microservice
application in kubernetes. In 2022 9th International
Conference on Advanced Informatics: Concepts, The-
ory and Applications (ICAICTA), pages 1–6.
Salii, S., Ajdari, J., and Zenuni, X. (2023). Migrating to
a microservice architecture: benefits and challenges.
In 2023 46th MIPRO ICT and Electronics Convention
(MIPRO), pages 1670–1677.
Santos, J., Reppas, E., Wauters, T., Volckaert, B., and De
Turck, F. (2025). Gwydion: Efficient auto-scaling
for complex containerized applications in kubernetes
through reinforcement learning. Journal of Network
and Computer Applications, 234:104067.
Tran, M.-N., Vu, D.-D., and Kim, Y. (2022). A survey of
autoscaling in kubernetes. In 2022 Thirteenth Interna-
tional Conference on Ubiquitous and Future Networks
(ICUFN), pages 263–265.
Wen, L., Xu, M., Gill, S. S., Hilman, M. H., Srirama, S. N.,
Ye, K., and Xu, C. (2024). Statuscale: Status-aware
and elastic scaling strategy for microservice applica-
tions.
Yuan, H. and Liao, S. (2024). A time series-based approach
to elastic kubernetes scaling. Electronics, 13(2).