Supporting Disconnected Operation of Stateful Services Using an
Envoy Enabled Dynamic Microservices Approach
Tim Farnham
a
Bristol Research and Innovation Lab., Toshiba Europe Ltd, Bristol, U.K.
Keywords: Dynamic Microservices, Service Mesh, Kubernetes, Stateful Services, Envoy, Edge, Cloud, Hybrid
Deployment, Continuous and Disconnected Operation.
Abstract: Dynamic microservice and service mesh approaches provide many benefits and flexibility for deploying
services and setting policies for access control, throttling, load balancing, retry, circuit breaker or shadow
mirror configurations. This paper examines extending this to support continuous operation of stateful
microservices in hybrid cloud / edge deployment, without loss of data, by permitting disconnected operation
and resynchronisation. These are important considerations for critical applications which must continue to
operate even during prolonged cloud disconnection and node or client failure. Such service requirements are
typical of retail and other scenarios in which services must run continuously, while maintaining a consistent
state between cloud and edge service instances. The approach taken and evaluated in this paper exploits a
lightweight Envoy proxy within Choreo connect microgateways and Consul service mesh sidecars. Envoy
proxies are able to efficiently perform shadow mirroring of requests and support graceful failover, but require
additional functionality to support resynchronisation and recovery from failure, which is examined in this paper.
1 INTRODUCTION
The use of dynamic microservice architecture and
service meshes provides resilience to failures. This is
by virtue of the replication of individual service
instances across different resources, or availability
zones, in a flexible manner, without needing the
services themselves to be aware of this. However, in
hybrid edge/cloud deployments this raises additional
challenges for stateful services due to the need to
constantly maintain synchronisation of replicas over
the Internet connections. There are two main
approaches to achieve this synchronisation, firstly, by
replicating the file or database systems supporting the
services and secondly by using service shadowing.
The latter approach is preferred when there is a need
to support services across hybrid environments as
there is no need to rely on certain network
performance or to restrict what persistent storage and
database mechanisms are employed to support the
various different services. This is also attractive from
the point of view of combining load balancing with
failover and tailoring the level of resilience when the
requirements and resources change over time.
a
https://orcid.org/0000-0002-5355-3982
Recent analysis in (Sampaio, 2019) and
(Mendonca, 2020) indicates that using properly
configured retry and circuit breaker functionality for
stateless microservices can greatly improve
performance under transient error and overload
conditions. This analysis also helps to define optimal
placement and scaling for resource usage. However,
little prior research has considered the optimisation of
failover for stateful services under prolonged failures.
Some prior investigations show that service outage
can be reduced when using Kubernetes stateful set
controllers with hot-standby services (Vayghan,
2019) and switching the standby to the active state
when failure occurs. However, they still result in
service outage and do not consider the prolonged
node or network failure scenarios which result in
disconnected operation and how to resynchronise the
state back to the replicated services during recovery
without service disruption.
In addition, investigations of optimal placement
of microservices (Sampaio, 2019) show that the
affinity between services is a vital factor for
determining optimal placement but do not consider
the failure resilience performance and disconnected
operation support, which require replication and
resynchronisation of these service clusters. There
have also been prior proposals for edge-cloud
computing that do not need to compute optimal
placement a priori (Wang, 2021). This approach
mirrors services at the edge and in the cloud and uses
the first responses to achieve high performance, but
does not consider failure and recovery.
Farnham, T.
Supporting Disconnected Operation of Stateful Services Using an Envoy Enabled Dynamic Microservices Approach.
DOI: 10.5220/0011644100003488
In Proceedings of the 13th International Conference on Cloud Computing and Services Science (CLOSER 2023), pages 115-122
ISBN: 978-989-758-650-7; ISSN: 2184-5042
Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
In this paper we address failure resilience and
disconnected operation support for stateful
microservices using the commonly used Envoy proxy,
within API microgateways and service meshes in a
dynamic microservice architecture. In particular, the
focus is on transactional client application support,
which is typical in retail scenarios, such as in-store
Point of Sale (POS). In
this use-case, the state information for each
application session is independent of other
application sessions.
1.1 Dynamic Microservices
Dynamic microservices aim to provide resilience
through edge/cloud flexibility by permitting
microservices to be seamlessly distributed across
different environments. One example is in retail
services, such as transaction services, where the
storing of state information in the applications is not
possible as transfer of sessions between different
application clients or end devices is needed. Centrally
storing the state would also prevent disconnected
operation and present a performance bottleneck.
Therefore, storing service state in a distributed
manner, but not within the application clients, is a
fundamental requirement to support tolerance to
client device and application failures as well as edge
disconnected operation. In such a case replicated
stateful services store the state information and permit
dynamic transfer of sessions to other client devices
without loss of data.
In addition, tolerance to service or node and
network failures necessitates replicating service
instances between edge and cloud resources. In such
a situation the state information must be kept
synchronised, because inconsistent state between
replicas would prevent failover and continued
operation. Stateful service replication
must be considered carefully as the normal load
balancing and failover approaches supported in
existing dynamic microservices frameworks, such as
in Baboi (2019), do not address this specific problem.
1.1.1 Service Mesh
Service mesh approaches provide a common means
to control and route access to services in the so-called
east-west interactions, that is, service to service. In
most service mesh solutions the proxy sidecars permit
dynamic load balancing and failover support to
ensure continued operation, using end-point health
metrics, without considering the state data embedded
within the service instances. However, when the
service instances are stateful, consideration of state
limits failover to a subset of shadow service instances
that contain the most recent state. Also, to provide
resilience against network failures it is necessary that
state is distributed and replicated widely, but this then
presents a challenge for keeping the state information
synchronised. In the extreme case of prolonged
disconnected operation, the most recent state may
only be stored in one instance and then
resynchronised with other instances on reconnection.
Traditional load balancing and failover logic becomes
less appropriate in these cases as it tends to assume
a pool of load balancing and failover endpoints. For
instance, within stateless service meshes it would be
sufficient to balance load and retry or fail over based
only on the number of healthy and unhealthy
endpoints, treating all endpoints equally, but with
stateful services the replication and
resynchronisation of service state must be given
top priority.
Figure 1: Microgateway operation sequence.
1.1.2 API Gateway
API gateways provide the ingress point for
applications to the service meshes for the so-called
north-south interactions with services. They primarily
act to authorise access to the service meshes but also
to manage the ingress traffic through throttling based
on quotas and usage policies using the policy
enforcement function (see Figure 1). Hence, API
gateways are aware of the individual application
sessions. It is also advantageous for reducing
overhead that the same proxy implementation is used
for both API gateways and service mesh sidecars. By
exploiting the implicit or explicit information relating
to application session embedded in the access tokens
it is possible to route ingress requests based on this
authorization bearer token header information. In this
manner handover of user sessions from one end
device to another can be performed in a transparent
manner. Typically, the authorization bearer tokens
are in JSON Web Token (JWT) format and can
contain subject, audience, issuer, and client claim or
consumer key (azp/aud) related information as well
as scope and expiry information.
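The token-based routing described above can be illustrated with a minimal sketch. It assumes an unsigned, hypothetical example token and a hypothetical helper `session_key`; signature verification, which a real gateway enforcer would perform, is deliberately omitted.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def session_key(claims: dict) -> str:
    """Pick a stable routing key from the claims (hypothetical policy):
    prefer azp, then aud, then sub."""
    aud = claims.get("aud")
    if isinstance(aud, list):
        aud = aud[0] if aud else None
    return claims.get("azp") or aud or claims.get("sub", "")


def _b64url(obj: dict) -> str:
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical unsigned token, for illustration only.
example_token = ".".join([
    _b64url({"alg": "none", "typ": "JWT"}),
    _b64url({"sub": "user-1", "azp": "pos-app-42",
             "aud": "pos-api", "scope": "sales"}),
    "",
])
example_key = session_key(jwt_claims(example_token))
```

Because the key depends only on the token claims and not on the end device, requests presenting the same token from a different device would be routed to the same place under this policy.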
Microgateways are API gateways that target a
lightweight distributed deployment
paradigm. In this way many gateway instances are
used rather than centralised gateway solutions. They
support cloud or edge environments and provide the
API ingress points that can be integrated within
Kubernetes distributed processing environments to
provide ease of deployment and management. This
can be either as an ingress controller or mesh proxy
gateway.
2 SERVICE REPLICATION AND
RECOVERY
2.1 Shadow Mirror
Service shadow mirroring can be performed within
microgateway or service mesh proxies to duplicate all
incoming requests towards a shadow endpoint or
cluster (see Figure 3). This capability has historically
been used for background testing prior to canary
rollout of new versions of services rather than to
replicate service state across different instances.
However, for data manipulation using REST write
operations, such as PUT, POST and DELETE, the
shadowing of requests can replicate the service
state. Envoy proxies permit very efficient shadow
mirroring, based on a fire-and-forget approach rather
than handling responses. An alternative approach to
shadow mirroring is to duplicate two or more identical
request paths towards different service endpoints and
discard the responses from all the endpoints that are
slower than the first to respond. This is similar to the
proposed approach in (Wang, 2021), which claims to
provide performance benefits if the processing can
also be aborted on the slower endpoints. However, in
our case the processing must continue for reliably
replicating the session state. Therefore, our proposed
approach is to set the primary path to the most likely
path to fail in order that the secondary path has the
state already available prior to the failover.
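As an illustration of fire-and-forget mirroring, the following sketch builds a single Envoy route entry that forwards to a primary cluster and mirrors each request to a shadow cluster via the `request_mirror_policies` field of the route action. The cluster names and path prefix are hypothetical, and this is only a route fragment, not a complete Envoy or Choreo connect configuration.

```python
import json


def mirrored_route(prefix: str, primary: str, shadow: str) -> dict:
    """Return one Envoy route entry (as a dict) with a shadow mirror policy."""
    return {
        "match": {"prefix": prefix},
        "route": {
            # Responses are taken from this cluster only.
            "cluster": primary,
            # Mirrored copies are sent here; their responses are discarded.
            "request_mirror_policies": [{"cluster": shadow}],
        },
    }


# Hypothetical names, for illustration only.
example_route = mirrored_route("/pos", "primary_cluster", "shadow_cluster")
example_json = json.dumps(example_route, indent=2)
```

Serialising the dictionary to YAML or JSON would give a fragment that could be placed under a virtual host's route list.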
In comparison, data replication techniques that
operate on the database or file system level can be
used. However, when operating over the cloud/edge
boundary the likelihood of disconnection or
performance bottleneck is high. Therefore, the local
edge environment is often treated as a separate cluster
to the cloud environment. The disconnected
operation creates a challenge when the number of
replicas is low, which is our scenario of interest. In
this case quorum techniques such as two out of three
or four out of six, which are typical in high
performance state replication solutions, are not possible.
Figure 2: Endpoint clusters for primary and secondary.
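The quorum limitation can be made concrete with a short sketch of majority-quorum arithmetic. The two-replica example (one edge replica, one cloud replica) is an assumed illustration of a low replica count.

```python
def majority(total_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return total_replicas // 2 + 1


def can_commit(reachable: int, total_replicas: int) -> bool:
    """A write can commit only if a majority of replicas is reachable."""
    return reachable >= majority(total_replicas)


# Two out of three and four out of six are the classical majority quorums.
quorum_of_3 = majority(3)
quorum_of_6 = majority(6)

# With one edge and one cloud replica, a disconnection leaves each side
# with a single reachable replica, so neither side can commit.
edge_can_write_when_disconnected = can_commit(reachable=1, total_replicas=2)
```

With only two replicas the majority is both of them, so any partition blocks writes on both sides, which is incompatible with disconnected operation.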
2.1.1 Resynchronisation
Resynchronisation occurs to recover state following
failure of the network, nodes, applications or services.
In the approach in which requests are cached and
replayed the process is an identical repeat of the initial
application write (i.e. REST based POST, PUT,
DELETE) requests. Hence, the main criteria are that
the order of requests is maintained, no duplicate
requests are made towards the same service instance
or cluster, and no requests are lost. If these conditions
are fulfilled the state will be recovered. An alternative
approach to achieve the same goal is to use a
replicated database middleware such as Middle-R and
C-JDBC or file system solution such as AWS
DataSync. However, in such approaches the support
for disconnected operation across the edge/cloud
boundary relies on capabilities within the specific
database technology or file system Persistent Volume
(PV) storage classes such as NFS, Lustre, AWS EFS
or S3. At present there are at least 124 Container
Storage Interface (CSI) drivers supported by
Kubernetes, each having different capabilities and
implementations. This makes file system specific
approaches complex to support. Also, cross-edge/cloud
approaches such as AWS DataSync only
support a few CSI drivers with certain limitations, e.g. within
Snowball Edge devices there is no support for
symbolic links and files must not be modified during
the synchronisation process. In addition, the transfers
must be periodically scheduled and so can remain out
of sync for long periods. This limits the performance
and prevents continuous operation of the services as
updates would be lost during local node failure and
services would need to be suspended during backup
and resynchronisation.
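The three replay criteria (order preserved, no duplicates, no losses) can be sketched as follows. The `CachedRequest` type, the sequence number field and the `send` callback are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedRequest:
    seq: int      # position in the original arrival order (assumed field)
    method: str   # POST, PUT or DELETE
    path: str
    body: str


def replay(cache, already_applied, send):
    """Replay cached writes in arrival order, skipping any request the
    target instance has already applied. Returns the replayed sequence
    numbers and records them in already_applied."""
    replayed = []
    for req in sorted(cache, key=lambda r: r.seq):
        if req.seq in already_applied:
            continue  # avoid a duplicate write towards the same instance
        send(req)
        already_applied.add(req.seq)
        replayed.append(req.seq)
    return replayed


# Example: request 1 reached the target before the failure.
sent = []
applied = {1}
cache = [
    CachedRequest(3, "DELETE", "/basket/7/item/2", ""),
    CachedRequest(1, "POST", "/basket", "{}"),
    CachedRequest(2, "PUT", "/basket/7", '{"qty": 2}'),
]
first_pass = replay(cache, applied, sent.append)
second_pass = replay(cache, applied, sent.append)
```

Running the replay a second time sends nothing, since every cached request is by then recorded as applied.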
2.2 Envoy Proxy
The Envoy proxy is a popular and efficient
lightweight implementation written in C++ and
configured via YAML files or a gRPC control plane. It
provides flexible L4 and L7 filtering capabilities that
permit customised handling of traffic. The main
advantages of the Envoy approach are that it is
self-contained and high performance with a small memory
footprint, and it has first class support for HTTP/2 and
gRPC for both incoming and outgoing connections. It
is a transparent HTTP/1.1 to HTTP/2 proxy and
supports advanced load balancing features including
automatic retries, circuit breaking, global rate
limiting, request shadowing and zone local load
balancing.
The Envoy proxy is used in many service mesh
solutions, such as Istio, Kong Mesh and Consul, and also
within API microgateways such as Choreo connect,
Apigee edge, Kusk gateway, Ambassador edge stack
or Kong within the recent Envoy Gateway initiative,
which is aiming to standardise an API for the
configuration of Envoy based microgateways.
Therefore, it is a logical choice for supporting stateful
microservice resilience. However, it does not
currently support the necessary caching and
resynchronisation functions to permit resilience to
prolonged failure. This is an intentional omission, as
supporting these features would sacrifice the very
high performance and lightweight nature of the
proxy. Hence, our proposed solution
approach is to combine Envoy with an external
caching capability for failover endpoints in order to
support resynchronisation without compromising
performance.
2.2.1 Failover and Load Balancing
The Envoy proxy supports graceful failover based on
the proportion of healthy endpoints within a cluster
using an overprovisioning factor. Endpoint health is
monitored using active probing in a fully
decentralised manner. This provides a means to
balance load across healthy endpoints and also
support failover at the same time. Such probabilistic
failover can be compatible with stateful services if the
endpoint clusters do not have common cross-session
state information and sticky session load balancing
policies are used, such as ring hash and Maglev (Eisenbud,
2016). Alternatively, a cluster-wide shared PV can be
used in which all services in a cluster share the same
or a replicated persistent storage resource, such as a
distributed file system. Then, round robin, least
request or random load balancing across those
instances can be performed. State replication between
the edge and cloud is still required for resilience to
network failures and so request shadowing between
edge and cloud is a simpler solution compared with
implementing a shared PV across the edge-cloud
boundary that can support disconnected operation.
Endpoints should also be replicated in different
clusters and availability zones to avoid site or
network faults causing a single point of failure.
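The sticky session behaviour of hash-based policies can be illustrated with a generic consistent hash ring. This is a simplified sketch and not Envoy's ring hash or Maglev implementation; the class name, virtual node count and endpoint names are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring mapping session keys to endpoints."""

    def __init__(self, endpoints, vnodes=64):
        # Each endpoint is placed on the ring at several virtual positions.
        self._ring = sorted(
            (self._hash(f"{ep}#{i}"), ep)
            for ep in endpoints
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def pick(self, session_key: str) -> str:
        """Return the endpoint owning the first ring position at or after
        the hash of the session key (wrapping around)."""
        idx = bisect.bisect(self._keys, self._hash(session_key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["svc-a", "svc-b", "svc-c"])
sessions = [f"session-{i}" for i in range(200)]
before = {s: ring.pick(s) for s in sessions}

# After svc-c fails, only the sessions that were on svc-c move.
ring_after = HashRing(["svc-a", "svc-b"])
unmoved = all(
    ring_after.pick(s) == before[s] for s in sessions if before[s] != "svc-c"
)
```

The property shown by `unmoved` is what makes such policies attractive for per-session state: a failed endpoint disturbs only its own sessions.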
2.3 External Caching
The main aim of the caching proxy functionality is to
provide a means to store service requests in order to
perform resynchronisation of service state when
endpoints recover or become available again.
Therefore, it is only required in the failover chain
(cluster) endpoints (see Figure 4). Caching is
supported using a local PostgreSQL
database. As PostgreSQL can support an efficient, high
performance and resilient solution it is an ideal
candidate even for relatively lightweight edge nodes.
At present only the POST, DELETE and PUT REST
based API requests are cached as these correspond to
the stateful write operations. However, to support
other APIs such as WebSocket or streaming APIs it
would be necessary to develop corresponding and
more suitable caching approaches. The advantage of
focussing on REST based APIs is that the caching
solution is very simple. The PostgreSQL table used to
store the requests consists of two elements, namely
the HTTP request header and the payload request
body. Hence, when replaying the requests the table
rows are retrieved in sequence and converted back into
HTTP requests.
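A minimal sketch of such a request cache is given below. It uses the Python standard library sqlite3 module as a stand-in for the PostgreSQL database, and the table name, the auto-incrementing `id` column used for ordering, and the `:method` / `:path` header keys are assumptions made for illustration.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Create the cache table: request headers plus request body."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS request_cache ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # assumed ordering column
        " headers TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return db


def cache_request(db, headers: dict, body: bytes) -> None:
    """Store one write request (POST, PUT or DELETE) for later replay."""
    db.execute(
        "INSERT INTO request_cache (headers, body) VALUES (?, ?)",
        (json.dumps(headers), body),
    )
    db.commit()


def replay_requests(db):
    """Yield cached requests in insertion order as (headers, body) pairs."""
    rows = db.execute("SELECT headers, body FROM request_cache ORDER BY id")
    for headers, body in rows:
        yield json.loads(headers), bytes(body)


db = open_cache()
cache_request(db, {":method": "POST", ":path": "/basket"}, b'{"item": 1}')
cache_request(db, {":method": "DELETE", ":path": "/basket/1"}, b"")
replayed = list(replay_requests(db))
```

Each yielded pair carries enough information to rebuild the original HTTP request during resynchronisation.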
2.4 Duplicate Removal
An important issue that arises from the proposed
approach (as shown in Figure 4) results from the
duplication of requests. As the shadow and the
failover endpoints are towards the same service
instances, during failover the same requests are sent
to an endpoint via the two different routes. This
would be unacceptable in situations in which the
service is not able to detect duplication, so to avoid
this a duplicate removal function is inserted. Ideally
this would be embedded within the Envoy router
itself and a solution based on using the aggregate
cluster feature of Envoy was explored, but currently
this feature is not security hardened and requires
trusted upstream and downstream endpoints.
Therefore, to avoid increasing the Envoy complexity
and the need to store state within the proxy, an external
function was created. Duplicate requests are removed
by exploiting additional
header information inserted by the Envoy router
within the Choreo connect MGW. The header
parameters used for detecting duplicate requests are:
x-wso2-cluster-header - primary cluster
x-envoy-internal - true if in shadow chain
x-trace-key - unique id for request
This header information is distributed using
an additional shared control plane, which shares
it between the caching proxy instances
and duplicate removal functions (see Figure 3). A
circular buffer containing these header parameters is
maintained in each duplicate removal function which
is checked for each incoming request to determine
whether to route it. The control plane used for sharing
the request header information is based on the Object
Management Group (OMG) real-time publish
subscribe Data Distribution Service (DDS) protocol.
This is a fully decentralised UDP based protocol that
permits efficient distribution across the various
duplicate removal and cache instances within the
clusters. The OpenSplice community edition
implementation of DDS was selected for this purpose
due to the lightweight and high-performance
implementation. The performance of different
approaches for supporting DDS is compared by
(Kang, 2021), which shows that for multi-subscriber use
cases, DDS security produces better throughput
performance than security provided by virtual
networks. The reason is that DDS security only
ciphers the data once when transmitting samples to
multiple recipients. Hence, DDS multicast is a good
choice for the shared control plane. We chose
the WeaveNet Container Network Interface
(CNI) plugin to support the DDS reliable
multicast on the edge cluster. This is because it uses
an encapsulated multicast to unicast approach, which
makes transmission over wireless or heterogeneous
edge networks such as WiFi/Ethernet more suitable
and efficient with the reliability QoS class.
Figure 3: Control and data planes with MGW and Consul.
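The circular buffer check performed by each duplicate removal function can be sketched as below. For simplicity the sketch keys only on the x-trace-key header and keeps the buffer local; how the other two headers enter the decision, and the DDS-based sharing between instances, are not modelled. The class name and capacity are assumptions.

```python
from collections import deque


class DuplicateFilter:
    """Bounded record of recently seen trace keys."""

    def __init__(self, capacity=1024):
        self._capacity = capacity
        self._order = deque()   # oldest key on the left
        self._seen = set()      # same keys, for constant-time lookup

    def should_route(self, headers: dict) -> bool:
        """Return True if the request should be forwarded, False if a
        request with the same trace key was already seen."""
        key = headers.get("x-trace-key")
        if key is None:
            return True  # assumption: requests without a key are forwarded
        if key in self._seen:
            return False
        if len(self._order) == self._capacity:
            # Buffer full: forget the oldest key to make room.
            self._seen.discard(self._order.popleft())
        self._order.append(key)
        self._seen.add(key)
        return True


dedupe = DuplicateFilter(capacity=2)
first = dedupe.should_route({"x-trace-key": "req-1"})
duplicate = dedupe.should_route({"x-trace-key": "req-1"})
dedupe.should_route({"x-trace-key": "req-2"})
dedupe.should_route({"x-trace-key": "req-3"})   # evicts req-1
after_eviction = dedupe.should_route({"x-trace-key": "req-1"})
```

The buffer capacity bounds how long a duplicate can arrive after the original and still be detected, so it would need to be sized against the expected request rate and path delay.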
3 IMPLEMENTATION AND
EVALUATION
In order to evaluate the proposed approach to stateful
service resilience a test environment was set up within
AWS VPCs and edge environments. The setup
consists of three Kubernetes clusters, with a K3s
cluster within the edge and two EKS clusters for the
cloud data centre services spread across 3 availability
zones. A federated Consul service mesh is configured
across the Kubernetes data centre clusters,
representing the edge and cloud, using the proxy
default local mode. In this manner the interaction
between the edge and the cloud is directed via mesh
gateways. The Envoy based Choreo connect
Microgateway is used with load balancer as the
ingress for application requests.
The Choreo connect Envoy microgateway router
deployments provide the ingress point for client
application requests. WSO2 API managers (v4.1.0)
are used to configure the microgateway routers via
adapters that are located in the edge cluster along with
policy enforcers. A global AMQP based control plane
between adapters and enforcers with the API manager
is used for this together with a local gRPC based
control plane between the adapter and enforcers and
the routers. In this manner the overheads are
minimised, while at the same time maintaining high
tolerance to individual node failures in order to
support continuous operation. Enforcers can cache
subscription and access token related information
locally to support fault tolerance and improve
performance. It is also necessary to have at least 2
adapter and enforcer instances to provide this
tolerance to individual node failures.
3.1 MGW Configuration
The Envoy router is configured via the local Choreo
connect MGW xDS / gRPC control plane from the
adapter. The adapter retrieves the API definitions
from the API manager and consul server using the
corresponding REST APIs and creates the
configurations for the routers. Each router is
configured identically for each instance. For each
incoming request the header authorisation bearer
token is verified using the Choreo connect enforcer
over the gRPC control plane. The enforcer retrieves
the subscription information relating to the token
claim from the API manager. The subscription
information contains the accessible APIs as well as
corresponding quota and scope related information.
This permits the enforcer to decide whether to permit
routing of the request. The API usage data is recorded
by the enforcer and is updated globally with all
enforcer instances via the API manager and to the
optional Choreo analytics service.
In addition, the Choreo connect adapter has been
updated to include the shadow mirror policy
configurations. This permits the transparent
shadowing of requests towards the shadow endpoints,
which is via the duplicate removal function instances.
The adapters also configure the graceful failover
cluster policies based on the default overprovisioning
factor of 1.4, with the primary endpoints at priority
level 0 (P0) and the secondary endpoints at priority
level 1. The HTTP based health checking option is
configured for each cluster. In this manner the failover
from primary to secondary starts when fewer than 72% of
the endpoints in the P0 cluster are in a healthy state.
Figure 4: Configuration of edge / cloud cluster routing.
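The failover threshold follows from the overprovisioning factor, as the sketch below shows. It assumes two priority levels and a fully healthy secondary (P1) cluster, in which case the share of traffic kept on P0 is the healthy fraction multiplied by the factor, capped at 100%.

```python
def priority_split(p0_healthy_fraction: float, overprovisioning: float = 1.4):
    """Return (share of traffic to P0, share spilled to P1), assuming the
    P1 cluster is fully healthy."""
    p0_share = min(1.0, p0_healthy_fraction * overprovisioning)
    return p0_share, 1.0 - p0_share


# Healthy fraction below which traffic starts to spill over to P1.
threshold = 1.0 / 1.4                      # about 0.714

all_on_primary = priority_split(0.72)      # 0.72 * 1.4 > 1, capped at 1.0
half_healthy = priority_split(0.50)        # about 70% stays on P0
```

With a factor of 1.4 the exact threshold is 1/1.4, roughly 71.4%, so 72% healthy endpoints still keeps all traffic on the primary and lower values spill progressively to the secondary.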
3.2 Caching Proxy and Duplicate
Removal
The caching proxy used for the failover path is an
asynchronous streaming reverse proxy
implementation based on the Apache non-blocking
NIO libraries. This is a high-performance solution
that can scale to support massive parallel session
handling. It also permits access to the request header
information necessary for passing to the duplicate
removal function and to the complete request body
for request caching. It is integrated with a PostgreSQL
database in the same pod in order to provide the local
cache storage, which uses a local PV for resilience to
service or node failures and restarts. The cache
solution also performs the resynchronisation to the
remote endpoints when they return to the healthy state.
The same approach is used for the duplicate
removal function implementation, but without the
caching of requests. As the shadow mirroring
performed by the Envoy proxy is fire-and-forget,
upstream issues in the shadow chain are only detected
by health check probes.
3.4 Performance Evaluation
3.4.1 Microgateway
The enforcer within the microgateway is the most
computationally intensive element of the proposed
approach. It provides the entry point for authorising
application requests and to secure the interactions
within the service mesh. The shadow mirroring itself is
a low overhead approach with negligible additional
latency to the primary path. The performance of the
Choreo connect microgateway is summarised in Figure 5
and indicates that around 500 requests per second can
be handled per vCPU instance. In this configuration the
API invocations utilise two levels of load balancing.
The first is the K3s or EKS service load balancer,
followed by the Envoy router load balancer, which also
performs authorisation bearer token validation, but no
rate limits are placed on individual applications or
APIs in the test configuration. There is a linear scaling
in resource utilisation until the xDS control plane limits
performance. This is shown in Figure 6 and indicates that
around 12 router instances can handle almost 4000
requests/s from 3200 parallel client sessions.
These results are also consistent with the observation
by (Johansson, 2022), which indicates that Envoy
performs well with up to ~300 parallel sessions per
instance. Beyond this level other load balancing
solution approaches, such as NGINX and HAProxy,
can perform better. Therefore, the two-level load
balancing is preferable.
Figure 5: CPU utilisation of Choreo connect microgateway.
Figure 6: Microgateway throughput with round robin LB.
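The per-instance load implied by the reported totals can be checked with simple arithmetic. The sketch below uses 4000 requests/s as a rounded stand-in for the "almost 4000" figure and assumes load is spread evenly across the router instances.

```python
routers = 12
total_rps = 4000        # rounded stand-in for "almost 4000 requests/s"
client_sessions = 3200

# Assumes an even spread of load across router instances.
rps_per_router = total_rps / routers
sessions_per_router = client_sessions / routers
```

This gives roughly 333 requests/s and 267 parallel sessions per router instance, which sits just under the ~300 sessions per instance level cited above.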
3.4.2 Service Mirroring
In terms of end-to-end performance of the mirrored
services it is possible to see from Figure 7 that there is
a linear relationship between the number of nodes in each
cluster and the requests per second that can be
handled. When mirroring is performed in a separate
cluster the performance is reduced due to the
duplicate removal and processing of shadow path
requests in the mirror chains. This results in a 17%
reduction for request throughput per node compared
to the no mirror case.
Figure 7: Mirror performance for large REST POST
requests (7kByte request + 35kByte response).
However, if mirroring occurs within the same
cluster, then the performance reduction is around
55% due to doubling of the number of service
requests handled per node in the same cluster as well
as duplicate removal checking.
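Given the linear scaling, the effect of the two throughput reductions on cluster sizing can be sketched as below. The baseline of 100 requests/s per node and the target of 1000 requests/s are hypothetical values chosen only to make the arithmetic concrete.

```python
import math


def nodes_needed(target_rps: float, per_node_rps: float, reduction: float) -> int:
    """Nodes required to sustain target_rps when mirroring reduces the
    per-node throughput by the given fraction (linear scaling assumed)."""
    effective_rps = per_node_rps * (1.0 - reduction)
    return math.ceil(target_rps / effective_rps)


# Hypothetical baseline and target, for illustration only.
no_mirror = nodes_needed(1000, 100, 0.0)
separate_cluster_mirror = nodes_needed(1000, 100, 0.17)
same_cluster_mirror = nodes_needed(1000, 100, 0.55)
```

Under these assumptions, mirroring to a separate cluster costs a few extra nodes per cluster, while mirroring within the same cluster more than doubles the node count.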
3.4.3 Service State Recovery
The recovery resynchronisation processes using file
system or database replication ensure consistency
with ACID transaction handling mechanisms. This
involves complex transaction commit and consensus
procedures such as the classical quorum replication
approaches, such as the minimal two out of three
selection. This can allow for disconnected operation
but impacts on performance of all updates, when
using synchronous replication, and in the case of
lightweight edge deployments can be too complex
and resource intensive. It is therefore common that
only the master replica instance performs write
operations and the replicas receive the updates from
the elected master. When failure of the master or
network partitioning occurs it is important that a new
master is elected, but this can lead to an outage period.
Also, during recovery all replicas synchronise with
the new master, but this may not be possible if
network partitions each contain a master replica node,
for instance during disconnected operation when
both the edge and cloud environments contain a
master replica node. In this situation there is a
potential for conflicting state, so it is
necessary to ensure that this situation cannot happen.
Two issues can arise from replaying cached requests
in the proposed approach. The first is that the
requests may have succeeded after the failover to the
secondary. Secondly, the replay of the requests occurs
at a later point in time and so if the services have a
time dependent processing behaviour the result
applied in the database may be different. These issues
can be avoided. First, the request duplicate detection
mechanisms can be implemented in both the shadow and
failover paths, within the caching proxies, to remove
duplicate requests. Second, primary path response
codes can be cached, near to the services, in addition
to requests and cross-checked for request duplication,
and the service times dynamically adjusted to emulate
the time of the original requests during replay.
The recovery performance is an important
consideration for the viability of the approach. To
evaluate this both network disconnection and node
rebooting tests were used. Replication middleware
methods for databases, such as Middle-R and C-
JDBC are most similar to the proposed approach, but
suffer from a prolonged increase in response times
when failure occurs even under modest (i.e. < 50%)
loads. In contrast, request failover at the API
microgateway level relies only on a request or health
check timeout to the service endpoint. This leads to a
faster failover and recovery process. For instance,
Middle-R requires 60 seconds and C-JDBC around
180 seconds (Dhamane, 2014). In contrast the API
microgateway failover timeouts can typically be
configured to respond in <2 seconds and immediately
return to the prior performance. A 20 millisecond
delay is inserted into the shadow chain to permit
duplicate request removal.
The resynchronisation recovery process has been
evaluated. In the case of link disconnect the time of
failure is 10 seconds and the recovery occurs at 95ms
per request with overall throughput of 13.1 requests
per second over a one minute test duration. The
maximum request latency is 1.7s in this case. For the
node reboot failure case the recovery time is
marginally faster than the network disconnection, but
fewer requests need to be buffered and the
corresponding request replay time is longer at 235ms.
This is due to the node taking some time to return to
normal operation state. The maximum latency is also
1.7s in this case.
4 CONCLUSIONS
In hybrid edge/cloud microservice deployment it is
necessary to consider the resilience of stateful
services carefully in order to fully support continuous
operation without performance degradation. The
proposed approach, using Envoy based proxies, is a
promising method of permitting replication and
recovery of service state, by using shadowing,
without requiring performance limiting distributed
file system storage or database replication solutions
across the edge cloud boundary. The approach
proposed and evaluated within this paper has many
advantages as it does not require services or
underlying storage solutions to be modified and does
not impact significantly on throughput and latency
performance. For instance, 17% reduction in
throughput was observed for large update requests
and negligible additional latency is introduced in the
primary request paths. However, some
limitations arise, most of which have been
overcome or avoided. In particular, independence
between different client application sessions is
assumed. In this case, the use of sticky session load
balancing such as Maglev or hash tables can avoid the
need for cluster-wide shared PV storage, which can
be difficult or undesirable to support. It also avoids
the use of cross-session and service state that would
create conflicts between parallel sessions, leading to
complex state synchronisation. In addition, it is
necessary that updates do not have time dependent
logic (or the original time of request needs to be
emulated in the service). This approach has been
evaluated in the context of retail services which
support transaction handling, inventory management
and promotion services. These require consistent and
replicated service state data, across the edge/cloud,
without loss to permit continuous operation, with low
overhead, even under node failure and cloud network
disconnect events.
Further work is needed to evaluate the
improvements that are possible by integrating the
duplicate removal and caching functionality into the
Envoy proxies, rather than keeping them separate.
REFERENCES
Sampaio, A., Rubin, J., Beschastnikh, I. and Rosa, N.
(2019) Improving microservice-based applications with
runtime placement adaptation. J Internet Serv Appl 10,
4 (2019). https://doi.org/10.1186/s13174-019-0104-0
Vayghan, A. L., Saied, M., Toeroe, M. and Khendek, F.,
(2019) Microservice Based Architecture: Towards
High-Availability for Stateful Applications with
Kubernetes, 2019 IEEE 19th International Conference
on Software Quality, Reliability and Security (QRS),
2019, pp. 176-185, doi: 10.1109/QRS.2019.00034
Mendonca, N., Aderaldo, C., Camara, J. and Garlan, D.
(2020) Model-Based Analysis of Microservice
Resiliency Patterns, In 2020 IEEE International
Conference on Software Architecture (ICSA), 2020, pp.
114-124, doi: 10.1109/ICSA47634.2020.00019.
Wang, Y., Xia, Y., Zhang, Y., Melissourgos, D., Odegbile,
J., Chen, S. (2021) A Full Mirror Computation Model
for Edge-Cloud Computing, In IC3 '21: 2021
Thirteenth International Conference on Contemporary
Computing (IC3-2021) August 2021 Pages 132–139
https://doi.org/10.1145/3474124.3474142
Baboi, M., Iftene, A., Gîfu, D. (2019). Dynamic
Microservices to Create Scalable and Fault Tolerance
Architecture. In Procedia Computer Science, Volume
159, 2019, Pages 1035-1044, ISSN 1877-0509
Dhamane, R., Patino, M., Valerio, M., Peris, R. (2014)
Performance Evaluation of Database Replication
Systems. In Proceedings of the 18th International
Database Engineering & Applications Symposium
IDEAS, July 2014 Pages 288–293
https://doi.org/10.1145/2628194.2628214
Kang, Z., An, K., Gokhale, A., Pazandak, P. (2021). A
Comprehensive Performance Evaluation of Different
Kubernetes CNI Plugins for Edge-based and
Containerized Publish/Subscribe Applications.
In 9th IEEE International Conference on
Cloud Engineering, doi: 10.1109/IC2E52221.2021.00017.
Johansson, A. (2022). HTTP Load Balancing Performance
Evaluation of HAProxy, NGINX, Traefik and Envoy
with the Round-Robin Algorithm (Dissertation).
Retrieved from:
http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-21475
Eisenbud, D. E., Yi, C., Contavalli C., Smith, C., Kononov,
R., Mann-Hielscher, E., Cilingiroglu, A., Cheyney, B.,
Shang, W., Hosein, J. D., (2016) Maglev: A Fast and
Reliable Software Network Load Balancer, In 13th
USENIX Symposium on Networked Systems Design and
Implementation (NSDI 16), USENIX Association, Santa
Clara, CA (2016), pp. 523-535.