APOENA: Towards a Cloud Dimensioning Approach for Executing SQL-like Workloads Using Machine Learning and Provenance

Raslan Ribeiro, Rafaelli Coutinho, Daniel de Oliveira

2024

Abstract

Over the past decade, data production has accelerated rapidly, posing challenges in processing, querying, and analyzing huge volumes of data. Several platforms and frameworks have emerged to help users handle large-scale data processing in distributed and HPC environments, including clouds. Such platforms offer a plethora of cloud-based services for executing workloads efficiently. Among these workloads are SQL-like queries, the focus of this paper. However, leveraging these platforms usually requires users to specify the type and number of virtual machines (VMs) to be deployed in the cloud. This task is not straightforward, even for expert users, as they must choose the VM type and number from the many options available in a cloud provider’s catalog. Although autoscaling mechanisms may be available, non-expert users may find them challenging to configure. To assist non-expert users in dimensioning the cloud environment for executing SQL-like workloads on such platforms (e.g., Databricks), this paper introduces a middleware named APOENA, which dimensions the cloud for specific SQL-like workloads by collecting provenance data. These data are used to train Machine Learning (ML) models capable of predicting query performance for a particular combination of query characteristics and VM configuration.
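
The prediction step described in the abstract can be illustrated with a minimal sketch. It is not APOENA's actual model: the abstract does not name the features or the learning algorithm, so the record fields (`num_joins`, `input_gb`, `vm_vcpus`, `vm_count`), the function names, and the use of a k-nearest-neighbour average as the predictor are all assumptions made for illustration, and the runtimes are made-up values.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ProvenanceRecord:
    """One past query execution. All fields are hypothetical."""
    num_joins: int      # assumed query characteristic
    input_gb: float     # assumed query characteristic
    vm_vcpus: int       # assumed VM configuration: vCPUs per VM
    vm_count: int       # assumed VM configuration: number of VMs
    runtime_s: float    # observed execution time in seconds


def predict_runtime(history, query, vm, k=2):
    """Average the runtimes of the k past executions closest to (query, vm).

    This k-nearest-neighbour average stands in for whatever ML model is
    trained on the provenance data.
    """
    target = (float(query[0]), float(query[1]), float(vm[0]), float(vm[1]))

    def distance(rec):
        feats = (float(rec.num_joins), rec.input_gb,
                 float(rec.vm_vcpus), float(rec.vm_count))
        return sqrt(sum((a - b) ** 2 for a, b in zip(feats, target)))

    nearest = sorted(history, key=distance)[:k]
    return sum(rec.runtime_s for rec in nearest) / len(nearest)


def dimension(history, query, candidates, k=2):
    """Pick the candidate VM configuration with the lowest predicted runtime."""
    return min(candidates, key=lambda vm: predict_runtime(history, query, vm, k))


# Illustrative provenance data (invented values, not results from the paper).
history = [
    ProvenanceRecord(2, 10.0, 4, 2, 120.0),
    ProvenanceRecord(2, 10.0, 8, 2, 70.0),
    ProvenanceRecord(2, 10.0, 8, 4, 45.0),
    ProvenanceRecord(0, 1.0, 4, 2, 15.0),
]
query = (2, 10.0)                      # (num_joins, input_gb)
candidates = [(4, 2), (8, 2), (8, 4)]  # (vm_vcpus, vm_count)
best = dimension(history, query, candidates, k=1)
```

Minimizing predicted runtime alone tends to favor the largest configuration; a practical dimensioning step would presumably also weigh monetary cost, which this sketch leaves out.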

Paper Citation


in Harvard Style

Ribeiro R., Coutinho R. and de Oliveira D. (2024). APOENA: Towards a Cloud Dimensioning Approach for Executing SQL-like Workloads Using Machine Learning and Provenance. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7, SciTePress, pages 289-296. DOI: 10.5220/0012633000003690


in Bibtex Style

@conference{iceis24,
author={Raslan Ribeiro and Rafaelli Coutinho and Daniel de Oliveira},
title={APOENA: Towards a Cloud Dimensioning Approach for Executing SQL-like Workloads Using Machine Learning and Provenance},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={289-296},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012633000003690},
isbn={978-989-758-692-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - APOENA: Towards a Cloud Dimensioning Approach for Executing SQL-like Workloads Using Machine Learning and Provenance
SN - 978-989-758-692-7
AU - Ribeiro R.
AU - Coutinho R.
AU - de Oliveira D.
PY - 2024
SP - 289
EP - 296
DO - 10.5220/0012633000003690
PB - SciTePress