computing network mainly rely on the Kubernetes
resource orchestration platform. The open-source
version of Kubernetes provides HPA and VPA
policies. Among them, HPA adopts a polling
mechanism to obtain the CPU utilization rate of each
Pod from the metrics server, then calculates the
average resource utilization rate of the cluster,
compares it with the user-defined CPU resource
utilization threshold during the decision-making
phase, and finally calculates the expected number of
Pod replicas. HPA therefore meets part of the
elasticity requirements in the computing network.
However, the open-source version of HPA is a
reactive scaling strategy: it adjusts resource
allocation only after the application load has
changed, which introduces lag and increases the
SLA violation rate (Shin S H, 2023)-
(Paulo Pereira, 2019).
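The reactive decision described above can be sketched in a few lines. The ratio rule and the 10% default tolerance follow the publicly documented Kubernetes HPA algorithm; the function name, its parameters, and the sample utilization values are illustrative assumptions, not part of the system studied in this paper.

```python
import math

def desired_replicas(current_replicas, pod_cpu_utilization,
                     target_utilization, tolerance=0.1):
    """HPA-style ratio rule: scale replicas by average / target utilization.

    pod_cpu_utilization: per-Pod CPU utilization (percent), as polled
    from the metrics server. All names here are illustrative.
    """
    if not pod_cpu_utilization:
        raise ValueError("at least one Pod metric is required")
    average = sum(pod_cpu_utilization) / len(pod_cpu_utilization)
    ratio = average / target_utilization
    # Within the tolerance band, keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return max(1, math.ceil(current_replicas * ratio))

# Hypothetical sample: 4 Pods averaging 90% against a 60% target.
print(desired_replicas(4, [90, 80, 100, 90], 60))
```

Because the rule only reads the utilization observed in the current polling period, the replica count changes after the load has already shifted, which is the lag discussed above.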
The challenges of CFN are as follows.
1) Scaling decisions are typically based solely on
observations from the current scaling period,
primarily average CPU usage. Considering a broader
historical window could give a more accurate
picture of upcoming load.
2) The default capacity expansion mechanism lags
behind demand. When user resource requirements
change rapidly, it degrades user QoS.
3) Existing prediction algorithms suffer from
insufficient prediction accuracy and high memory
consumption.
To address these challenges, we propose a predictive
HPA mechanism that monitors traffic and expands
capacity early, before a load burst occurs. The
contributions of this paper are summarized as
follows.
1) We propose a new knowledge distillation
method. Experiments verify that it improves
model accuracy compared to existing knowledge
distillation methods.
2) Based on this knowledge distillation method
and the existing time series prediction model
Informer, we propose a new time series
prediction model, KD-Informer. Experiments
verify that it improves prediction accuracy
compared to Informer, Reformer, and other
time series models.
3) We propose P-HPA based on the above time
series prediction model. Experimental results
show that P-HPA improves scaling accuracy
and reduces the SLA violation rate compared
to the default HPA.
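To make the idea of early expansion concrete, the following is a minimal sketch of a proactive replica calculation. It is not the P-HPA algorithm of this paper: it simply feeds the peak of a short utilization forecast into the standard HPA ratio rule. The function name, parameters, and sample values are hypothetical, and the forecast could come from any time series model.

```python
import math

def proactive_replicas(current_replicas, current_utilization,
                       forecast_utilization, target_utilization):
    """Size the deployment for the highest load expected in the horizon.

    forecast_utilization: predicted average CPU utilization (percent)
    for the next few periods; assumed to be supplied by some predictor.
    """
    # Never plan for less than what is observed right now.
    peak = max([current_utilization, *forecast_utilization])
    return max(1, math.ceil(current_replicas * peak / target_utilization))

# Hypothetical sample: load is 55% now, but a burst to 120% is forecast.
print(proactive_replicas(4, 55, [60, 95, 120], 60))
```

With an empty forecast the function falls back to the reactive result, so the benefit depends entirely on how accurate the predictor is.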
2 RELATED WORK
Elastic resource scaling is not unique to the
computing power network; it has always been a key
consideration in cloud computing. We surveyed the
current state of research on elastic resource
scaling and classified it as follows.
Threshold-Based: (Yahya Al-Dhuraibi, 2017)
and (Gourav Rattihalli, 2019) improved vertical
elasticity in lightweight virtualization technology
cloud systems using threshold-based scaling rules.
(Hamzeh Khazaei, 2017) proposed Elastic Docker,
an autonomous solution based on IBM's MAPE-K
principle, which enables autonomous vertical
elasticity for Docker containers. In (Zhicheng Cai,
2022), the authors introduced an automatic scaling
system based on resource utilization, enhancing
Kubernetes' VPA and dynamically adjusting
container allocation in Kubernetes clusters. Both
papers explore container migration and investigate
vertical scaling possibilities, while our proposed
solution focuses on enhancing horizontal
autoscalers. (Toka, 2020) demonstrated the
architecture and initial implementation of Elascale,
which provides automatic scalability and monitoring
as a service for any cloud software system. Elascale
makes scaling decisions based on a tunable linear
combination of CPU, memory, and network
utilization.
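A decision rule of this kind, based on a tunable linear combination of metrics, can be sketched as follows. The weights, thresholds, and function names are hypothetical placeholders chosen for illustration; they are not Elascale's actual parameters or API.

```python
def scaling_score(cpu, memory, network, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of utilizations, each expressed in the range [0, 1].

    The weights are the tunable part; the defaults are assumptions.
    """
    w_cpu, w_mem, w_net = weights
    return w_cpu * cpu + w_mem * memory + w_net * network

def scaling_decision(score, upper=0.7, lower=0.3):
    """Map the combined score to an action using two assumed thresholds."""
    if score > upper:
        return "scale_out"
    if score < lower:
        return "scale_in"
    return "hold"

# Hypothetical sample: CPU and memory are busy, network is moderate.
score = scaling_score(cpu=0.9, memory=0.8, network=0.5)
print(scaling_decision(score))
```

Raising the CPU weight makes the rule behave more like a CPU-only threshold policy, while a more even weighting lets memory or network pressure trigger scaling as well.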
Forecast-Based: Short-term demand forecasting
has been extensively studied in various domains
(Kader, 2022)-(Punia, 2020). In the energy sector,
accurate prediction of electricity generation from
wind turbines is crucial. Li et al. (H. Arabnejad,
2017) proposed a four-input neural network that
outperformed traditional single-parameter methods.
For predicting electricity market prices, Catalao et
al. introduced a three-layer feedforward neural
network approach, which proved to be superior to
the previously proposed autoregressive integrated
moving average (ARIMA) methods in terms of time
efficiency and ease of implementation. Inspired by
the work of Tan (Horovitz, 2018), who designed a
predictive HPA using the AdaBoost-LSTM
algorithm to improve QoS, we adopt a
prediction-based approach in our system. We aim to
select the most suitable scaling method by
leveraging prediction techniques.
Reinforcement Learning-Based: Arabnejad et
al. (Sherstinsky, 2020) proposed a resource
allocation approach for virtual machines (VMs) by
combining Q-learning and the SARSA algorithm
with an adaptive fuzzy logic controller. They