6 CONCLUSIONS
Network faults are an unavoidable reality in any large-scale
complex distributed system, such as the infrastructure
that supports cloud computing. In this paper, we
experimentally evaluated the impact that network faults
can have on a cloud computing system. We focused on the
CMP, more specifically OpenStack, because of its popularity
and because it is a complex distributed system in which
the network plays an important role. Three common types
of network fault were injected with the help of Sidekick.
The results show that network links differ in how strongly
faults on them affect the applications hosted on the
infrastructure. In particular, network faults affecting
the link between the compute node and the storage node
can cause applications running on the infrastructure to
fail to provide correct service, even if the faults only
lead to increased latency or reduced bandwidth.
These results serve as the basis for future work on the
development of fault tolerance mechanisms for CMPs
that increase their tolerance of network faults while
carrying minimal cost and overhead. Furthermore, as future
work, we will carry out more experiments featuring more
complex setups and setups where autoscaling is present,
in order to evaluate how network faults affect them.
ACKNOWLEDGEMENTS
This work is funded by the FCT - Foundation for
Science and Technology, I.P./MCTES through na-
tional funds (PIDDAC), within the scope of CISUC
R&D Unit - UIDB/00326/2020 or project code
UIDP/00326/2020. This work is also supported by
Project Reference ECSEL/0017/2019 and 876852-
ECSEL-RIA-VALU3S, financed by Fundação para a
Ciência e a Tecnologia, I.P./MCTES through national
funds (PIDDAC) and funding from the ECSEL Joint
Undertaking (JU) under grant agreement No 876852.
The JU receives support from the European Union’s
Horizon 2020 research and innovation programme
and Sweden, Italy, Spain, Portugal, Czech Republic,
Germany, Austria, Ireland, France and Turkey.
REFERENCES
Avižienis, A., Laprie, J.-C., and Randell, B. (2004).
Dependability and its threats: a taxonomy. In Building
the Information Society, pages 91–120. Springer.
AWS (2022). Amazon ec2 instance types,
https://aws.amazon.com/ec2/instance-types/, access date: 2022-09-30.
Cassandra (2022). Apache cassandra,
https://cassandra.apache.org, access date: 2022-09-30.
Cerveira, F., Barbosa, R., Madeira, H., and Araujo, F.
(2015). Recovery for virtualized environments. In
2015 11th European Dependable Computing Confer-
ence (EDCC), pages 25–36. IEEE.
Cocozza, F., López, G., Marín, G., Villalón, R., and Arroyo,
F. (2015). Cloud management platform selection: A
case study in a university setting. Cloud Computing,
2015:92.
Cooper, B. F., Silberstein, A., Tam, E., Ramakrishnan, R.,
and Sears, R. (2010). Benchmarking cloud serving
systems with ycsb. In Proceedings of the 1st ACM
symposium on Cloud computing, pages 143–154.
Cotroneo, D., De Simone, L., Liguori, P., Natella, R., and
Bidokhti, N. (2019). Enhancing failure propagation
analysis in cloud computing systems. In 2019 IEEE
30th International Symposium on Software Reliability
Engineering (ISSRE), pages 139–150. IEEE.
Cotroneo, D., De Simone, L., and Natella, R. (2022).
Thorfi: a novel approach for network fault injection
as a service. Journal of Network and Computer Appli-
cations, 201:103334.
Dantas, J., Matos, R., Araujo, J., and Maciel, P. (2012).
Models for dependability analysis of cloud comput-
ing architectures for eucalyptus platform. Interna-
tional Transactions on Systems Science and Applica-
tions, 8(5):13–25.
Ju, X., Soares, L., Shin, K. G., Ryu, K. D., and Da Silva, D.
(2013). On fault resilience of openstack. In Proceed-
ings of the 4th annual Symposium on Cloud Comput-
ing, pages 1–16.
Kumari, P. and Kaur, P. (2021). A survey of fault tolerance
in cloud computing. Journal of King Saud University-
Computer and Information Sciences, 33(10):1159–
1176.
Lu, Y., Cheng, H., Ma, Y., and Wu, S. (2020). Research
on the technology of power unified cloud management
platform. In 2020 IEEE 9th Joint International Infor-
mation Technology and Artificial Intelligence Confer-
ence (ITAIC), volume 9, pages 770–773.
Natella, R., Cotroneo, D., and Madeira, H. S. (2016).
Assessing dependability with software fault injection: A
survey. ACM Computing Surveys (CSUR), 48(3):1–55.
OpenStack (2022). Openstack - open source cloud computing
platform software.
Pham, C., Wang, L., Tak, B. C., Baset, S., Tang, C., Kalbar-
czyk, Z., and Iyer, R. K. (2016). Failure diagnosis
for distributed systems using targeted fault injection.
IEEE Transactions on Parallel and Distributed Sys-
tems, 28(2):503–516.
Qi, Y., Fang, C., Liu, H., Kang, D., Lyu, B., Cheng, P.,
and Chen, J. (2021). A survey of cloud network fault
diagnostic systems and tools. Frontiers of Information
Technology and Electronic Engineering, 22(8):1031–
1045.
Network Failures in Cloud Management Platforms: A Study on OpenStack