
posed a novel two-part fog-native resiliency solution that
provides idempotency-safe retries and automatic completion
of requests for HTTP POST microservices. The first part
enables idempotency-safe retries following infrastructure
failure, while preventing unsafe retries following
application functional failure. The second part, the
Completer, overcomes functional failure by automatically
applying developer-defined solvers via FaaS. Both parts were
implemented as microservice patterns and integrated with the
Consul service mesh and the Fission FaaS platform as example
enablers. In the experimental evaluation, safe retries
improved request satisfaction by approximately 0.03 to 0.07
compared to no retries, and the Completer system facilitated
a 100% request completion rate. These results illustrate the
benefits of the proposed solution in enabling resilient
operation of fog-native applications.
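
The two behaviours summarised above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the names Outcome, post_with_safe_retries and complete, the failure labels, the "problem" field and the solver registry are all hypothetical, and the transport is abstracted behind a caller-supplied send function. The sketch assumes that a retry is safe only when the same idempotency key accompanies every attempt, and that a solver is a function that returns a repaired payload.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical outcome labels (assumptions made for this sketch).
SUCCESS = "success"
INFRA_FAILURE = "infrastructure_failure"    # e.g. timeout or connection reset
FUNCTIONAL_FAILURE = "functional_failure"   # the application rejected the request


@dataclass
class Outcome:
    kind: str
    problem: Optional[str] = None  # hypothetical label for the functional problem


Send = Callable[[dict, str], Outcome]


def post_with_safe_retries(send: Send, payload: dict,
                           max_attempts: int = 3) -> Tuple[Outcome, int]:
    """Retry only after infrastructure failures, reusing one idempotency key."""
    key = str(uuid.uuid4())  # same key on every attempt so duplicates can be detected
    outcome = Outcome(INFRA_FAILURE)
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        outcome = send(payload, key)
        if outcome.kind != INFRA_FAILURE:
            break  # success or functional failure: never retried
    return outcome, attempts


def complete(outcome: Outcome, payload: dict,
             solvers: Dict[str, Callable[[dict], dict]], send: Send) -> Outcome:
    """Completer step: apply a developer-defined solver, then resubmit."""
    solver = solvers.get(outcome.problem) if outcome.problem else None
    if outcome.kind != FUNCTIONAL_FAILURE or solver is None:
        return outcome  # nothing to repair, or no solver registered
    repaired = solver(payload)
    result, _ = post_with_safe_retries(send, repaired)
    return result
```

In this sketch a functional failure ends the retry loop after a single attempt and is handed to complete, which looks up a solver by problem label and resubmits the repaired payload under a fresh idempotency key.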
ACKNOWLEDGMENTS
This work has been partially co-funded by the Smart
Networks and Services Joint Undertaking (SNS JU)
and the UK Research and Innovation (UKRI), under
the European Union’s Horizon Europe research and
innovation programme, in the frame of the NATWORK
project (Grant Agreement No 101139285).
Idempotency in Service Mesh: For Resiliency of Fog-Native Applications in Multi-Domain Edge-to-Cloud Ecosystems