Figure 9: Dataproc-based infrastructure scalability evaluation considering two types of virtual machine clusters with different compute resources.
undertaken in the past. In this work, we studied the Google cluster-traces v3 dataset, the latest of the Google cluster traces, by analyzing its properties and performing a workload characterization of the traces using our proposed scalable infrastructure based on Google Cloud Dataproc. The workload characterization focuses on the heterogeneity of the workload, the variations in job durations, aspects of resource consumption, and the overall availability of resources provided by the cluster. Furthermore, we present a scalability analysis of the proposed infrastructure. The findings reported in this paper can help cloud infrastructure providers and users manage cloud computing resources, especially on serverless platforms.
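To make the "variations in job durations" aspect concrete, the following is a minimal, single-machine sketch of how duration variability could be summarized; it is not the authors' Dataproc pipeline. It assumes hypothetical job records (`JobSpan`, with start and end timestamps in microseconds) that would first have to be derived from the trace's event data, and it uses the nearest-rank percentile method and the coefficient of variation as illustrative choices.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class JobSpan:
    """Hypothetical job record: start/end timestamps in microseconds."""
    job_id: str
    start_us: int
    end_us: int


def duration_seconds(job: JobSpan) -> float:
    """Job duration in seconds, assuming microsecond timestamps."""
    return (job.end_us - job.start_us) / 1_000_000


def nearest_rank_percentile(values: list[float], p: float) -> float:
    """p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not values or not 0 < p <= 100:
        raise ValueError("need non-empty values and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_durations(jobs: list[JobSpan]) -> dict:
    """Summary statistics describing how much job durations vary."""
    durations = [duration_seconds(j) for j in jobs]
    avg = mean(durations)
    return {
        "count": len(durations),
        "median_s": nearest_rank_percentile(durations, 50),
        "p99_s": nearest_rank_percentile(durations, 99),
        # Coefficient of variation: standard deviation relative to the mean,
        # a scale-free indicator of how heterogeneous the durations are.
        "cv": pstdev(durations) / avg if avg > 0 else math.nan,
    }


jobs = [
    JobSpan("a", 0, 10_000_000),           # 10 s
    JobSpan("b", 0, 20_000_000),           # 20 s
    JobSpan("c", 5_000_000, 305_000_000),  # 300 s
]
print(summarize_durations(jobs))
```

On the full traces, the same per-job computation would be distributed across the cluster (for example, as a grouped aggregation in Spark) rather than run over an in-memory list.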
In future work, we plan to use our scalable infrastructure to analyze further properties of the workload traces, such as page cache memory, cycles per instruction (CPI), and CPU usage percentiles, to provide additional insights into the workload.
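As an illustration of two of the planned metrics, the sketch below computes CPI as cycles divided by retired instructions and nearest-rank percentiles over CPU usage samples. It deliberately does not assume any particular trace schema: the per-window samples, the function names, and the chosen percentiles (50, 90, 99) are hypothetical, and the trace may already provide some of these quantities in precomputed form.

```python
import math


def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """CPI for one sampling window: CPU cycles divided by retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def usage_percentiles(samples: list[float], ps=(50, 90, 99)) -> dict[int, float]:
    """Nearest-rank percentiles of CPU usage samples (e.g. cores used per window)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    n = len(ordered)
    return {p: ordered[math.ceil(p * n / 100) - 1] for p in ps}


# Hypothetical per-window measurements for a single task.
cpu_samples = [float(x) for x in range(1, 11)]  # 1.0 .. 10.0
print(usage_percentiles(cpu_samples))  # {50: 5.0, 90: 9.0, 99: 10.0}
print(cycles_per_instruction(3_000_000, 2_000_000))  # 1.5
```

A higher CPI for the same code usually points to more stalled cycles (for example, from memory contention), which is why CPI is often studied alongside CPU usage when characterizing co-located workloads.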
ACKNOWLEDGEMENTS
This work was supported by funding from the German Federal Ministry of Education and Research (BMBF) within the scope of the Software Campus program. Google Cloud credits used in this work were provided by the Google Cloud Research Credits program under award number NH93G06K20KDXH9U.
CLOSER 2022 - 12th International Conference on Cloud Computing and Services Science
262