Predicting SQL Query Execution Time with a Cost Model for Spark Platform
Aleksey Burdakov, Viktoria Proletarskaya, Andrey Ploutenko, Oleg Ermakov, Uriy Grigorev
2020
Abstract
The paper proposes a cost model for predicting query execution time in a distributed parallel system. Such estimates are essential for operating a DaaS environment and for building an optimal query execution plan. The model represents a SQL query as nested stars, where each star includes dimension tables, a fact table, and a Bloom filter. Bloom filters can substantially reduce network traffic in the Shuffle phase and cut join time in the Reduce stage of query execution in Spark. We propose an algorithm for generating a query implementation program. The developed model was calibrated and its adequacy was evaluated on 50 points. The obtained coefficient of determination R² = 0.966 demonstrates good model accuracy even with imprecise intermediate table cardinalities. Among the points with a modelling time over 10 seconds, 77% have a modelling error Δ < 30%. Theoretical model evaluation supports the modelling and experimental results for large databases.
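To illustrate why a Bloom filter can cut Shuffle traffic in a star join, here is a minimal sketch in plain Python rather than Spark. It is not the authors' algorithm or cost model: the names `BloomFilter` and `star_join`, the SHA-256 based hashing, and the filter size are all assumptions chosen for illustration. The idea shown is that a filter built from the dimension keys that survive a predicate lets most non-matching fact rows be dropped before they would be sent over the network.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical helper, not a Spark API)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def star_join(dimension_rows, fact_rows, dim_predicate):
    """Join (key, attrs) dimension rows with (foreign_key, measure) fact rows."""
    # 1. Filter the small dimension table and build the Bloom filter.
    selected = {key: attrs for key, attrs in dimension_rows if dim_predicate(attrs)}
    bloom = BloomFilter()
    for key in selected:
        bloom.add(key)

    # 2. Pre-filter the fact table; only these rows would be shuffled.
    shuffled = [(fk, m) for fk, m in fact_rows if bloom.might_contain(fk)]

    # 3. Exact join on the reduce side removes any false positives.
    joined = [(fk, selected[fk], m) for fk, m in shuffled if fk in selected]
    return shuffled, joined


if __name__ == "__main__":
    dims = [(1, "RU"), (2, "US"), (3, "RU")]
    facts = [(k % 50, k) for k in range(200)]
    shuffled, joined = star_join(dims, facts, lambda attrs: attrs == "RU")
    print(f"fact rows: {len(facts)}, shuffled: {len(shuffled)}, joined: {len(joined)}")
```

A Bloom filter never produces false negatives, so the join result is exact; false positives only mean that a few extra fact rows are shuffled and then discarded by the final key check.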
Paper Citation
in Harvard Style
Burdakov A., Proletarskaya V., Ploutenko A., Ermakov O. and Grigorev U. (2020). Predicting SQL Query Execution Time with a Cost Model for Spark Platform. In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-426-8, pages 279-287. DOI: 10.5220/0009396202790287
in Bibtex Style
@conference{iotbds20,
author={Aleksey Burdakov and Viktoria Proletarskaya and Andrey Ploutenko and Oleg Ermakov and Uriy Grigorev},
title={Predicting SQL Query Execution Time with a Cost Model for Spark Platform},
booktitle={Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2020},
pages={279-287},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009396202790287},
isbn={978-989-758-426-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Predicting SQL Query Execution Time with a Cost Model for Spark Platform
SN - 978-989-758-426-8
AU - Burdakov A.
AU - Proletarskaya V.
AU - Ploutenko A.
AU - Ermakov O.
AU - Grigorev U.
PY - 2020
SP - 279
EP - 287
DO - 10.5220/0009396202790287