CloudWatcher is based on a two-tier architecture, in which a set of Managers controls overlay networks composed of Probe VMs spread among all the monitored DCs. Periodically, each Manager collects data on the status of the infrastructure by interacting with its Probes, which perform Tasks, i.e., activities that collect fault and performance data.
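To make this interaction concrete, the following minimal Python sketch (not CloudWatcher's actual code) shows a Manager periodically asking each of its Probes to run its Tasks and gathering the results; the class names, method names and the ping_latency_ms collection function are illustrative assumptions.

import time

class Probe:
    """Illustrative stand-in for a Probe VM inside an overlay network."""
    def __init__(self, name, tasks):
        self.name = name
        self.tasks = tasks  # list of callables collecting fault/performance data

    def run_tasks(self):
        # Execute every Task on this Probe and return its local measurements.
        return {task.__name__: task() for task in self.tasks}

class Manager:
    """Illustrative Manager controlling an overlay of Probes in the monitored DCs."""
    def __init__(self, probes, period_s=60):
        self.probes = probes
        self.period_s = period_s

    def monitoring_loop(self, rounds=1):
        for _ in range(rounds):
            # Collect the status of the infrastructure from every Probe.
            report = {p.name: p.run_tasks() for p in self.probes}
            print(report)
            time.sleep(self.period_s)

def ping_latency_ms():
    # Hypothetical data-collection function; a real Task would measure the network.
    return 12.3

manager = Manager([Probe("dc1-probe-01", [ping_latency_ms])], period_s=1)
manager.monitoring_loop(rounds=1)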
Both the types of Probe and the Tasks are managed declaratively through a machines.json file and are designed to be easily customised and extended, so as to integrate personalised monitoring activities and metrics. Each Task comprises a data-collection function that gathers data on the health of the monitored DCs, an aggregation policy that generates a single global report from the individual Tasks' outputs and, possibly, a set of SLO thresholds, also defined declaratively.
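Purely as an illustration, a Task of this kind could be declared and evaluated along the following lines; the field names, the example threshold and the helper functions are assumptions made for this sketch and do not reflect the actual machines.json schema.

import json
import statistics

# Hypothetical declarative description of a Task (not the real machines.json schema).
task_spec = json.loads("""
{
  "task": "network_latency",
  "collector": "ping_latency_ms",
  "aggregation": "mean",
  "slo": {"max_latency_ms": 50}
}
""")

def aggregate(samples, policy):
    # Aggregation policy: turn the individual Tasks' outputs into one global value.
    return {"mean": statistics.mean, "max": max}[policy](samples)

def check_slo(value, slo):
    # Compare the aggregated value against the declaratively defined SLO threshold.
    return value <= slo["max_latency_ms"]

samples = [12.3, 48.1, 25.7]  # one measurement per Probe
global_value = aggregate(samples, task_spec["aggregation"])
print(task_spec["task"], global_value, "SLO met:", check_slo(global_value, task_spec["slo"]))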
As a use case, we deployed CloudWatcher on the cloud of the Italian Research and Education Network Consortium (GARR), consisting of three datacentres scattered throughout the Italian territory. We employed a total of 3 overlay networks and 12 probes per Manager. During the monitoring, we measured the disk I/O performance, the latency and bandwidth of the network, and the behaviour of the Clouds while executing a remote script and during the random deletion and re-creation of the VMs. Additionally, a Web Dashboard and an alarm system based on Telegram were developed (a minimal sketch of such an alarm hook is given after the list below). In our future work, we intend to pursue the following directions:
• Data Analysis Pipeline. Design and implement a pipeline for the automatic production of human-readable, insightful reports for the Cloud administrators, based on the data available in the database, highlighting the evolution over time and the critical aspects of the monitored parameters.
• Large Scale Assessment. Deploy CloudWatcher in a large-scale infrastructure for a long period to assess its behaviour, also comparing it with other tools, e.g., in terms of the overhead produced by CloudWatcher, the reactivity of the monitoring activities in spotting possible failures or unusual measurements, and how CloudWatcher reacts to cloud errors.
• Cloud-Edge Applicability. Study the feasibility of designing and developing an extension of CloudWatcher suitable for a dynamic and highly heterogeneous environment, e.g., Cloud-Edge computing. Such an extension should also be able to manage both the scale of such infrastructures and the mobility of Edge and IoT resources.
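Returning to the alarm system mentioned in the use case above, the following sketch shows one way an SLO violation could be forwarded to a Telegram chat via the public Bot API (using the third-party requests library); the bot token, chat identifier and message are placeholders, and the shape of the hook is an assumption rather than CloudWatcher's actual implementation.

import requests

def send_telegram_alarm(bot_token, chat_id, message):
    # Telegram Bot API sendMessage endpoint; token and chat_id are placeholders.
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    resp = requests.post(url, data={"chat_id": chat_id, "text": message}, timeout=10)
    resp.raise_for_status()

# Hypothetical usage when an SLO threshold is exceeded on a monitored DC:
# send_telegram_alarm("<BOT_TOKEN>", "<CHAT_ID>",
#                     "CloudWatcher: network_latency SLO violated on dc1 (mean 62.4 ms > 50 ms)")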