Paper

Authors: Aleksey Burdakov 1; Viktoria Proletarskaya 1; Andrey Ploutenko 2; Oleg Ermakov 1 and Uriy Grigorev 1

Affiliations: 1 Informatics and Control Systems, Bauman Moscow State Technical University, Moscow, Russia; 2 Mathematics and Informatics, Amur State University, Blagoveschensk, Russia

ISBN: 978-989-758-426-8

Keyword(s): SQL, Apache Spark, Bloom Filter, TPC-H Test, Big Data, Cost Model.

Abstract: The paper proposes a cost model for predicting query execution time in a distributed parallel system. Such an estimate is essential for running a DaaS environment or building an optimal query execution plan. The model represents a SQL query as nested stars, where each star includes dimension tables, a fact table, and a Bloom filter. Bloom filters can substantially reduce network traffic in the Shuffle phase and cut join time in the Reduce stage of query execution in Spark. We propose an algorithm for generating a query implementation program. The developed model was calibrated and its adequacy was evaluated (50 points). The obtained coefficient of determination, R² = 0.966, indicates good model accuracy even with imprecise intermediate table cardinalities. Of the points with a modelling time over 10 seconds, 77% have a modelling error Δ < 30%. A theoretical evaluation of the model supports the modelling and experimental results for large databases.
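The role of the Bloom filter in a star join can be illustrated with a minimal Python sketch. It is not the paper's algorithm or its Spark program: the `BloomFilter` class, the toy `dimension` and `fact` tables, and the filter parameters (1024 bits, 3 hashes) are all assumptions made for illustration. The idea shown is only that a filter built from the selected dimension keys lets fact rows be discarded before the (expensive) shuffle and join, at the cost of occasional false positives that the exact join then removes.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions come from double hashing a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # position_i = (h1 + i * h2) mod num_bits, with h2 forced odd
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Hypothetical toy tables: dimension rows are (key, region),
# fact rows are (order_id, dimension_key, amount).
dimension = [(1, "EUROPE"), (2, "ASIA"), (3, "EUROPE")]
fact = [(10, 1, 5.0), (11, 2, 7.5), (12, 3, 1.0), (13, 4, 2.0)]

# Build the filter from the dimension keys that pass the selection predicate.
selected_keys = {key for key, region in dimension if region == "EUROPE"}
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for key in selected_keys:
    bloom.add(key)

# Pre-filter the fact table; only these rows would need to be shuffled.
candidates = [row for row in fact if bloom.might_contain(row[1])]

# The exact join removes any false positives the filter let through.
joined = [row for row in candidates if row[1] in selected_keys]
print(len(fact), len(candidates), joined)
```

A Bloom filter never produces false negatives, so every matching fact row survives the pre-filter; the saving comes from the non-matching rows that are dropped early.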

CC BY-NC-ND 4.0


Paper citation in several formats:
Burdakov, A.; Proletarskaya, V.; Ploutenko, A.; Ermakov, O. and Grigorev, U. (2020). Predicting SQL Query Execution Time with a Cost Model for Spark Platform. In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-426-8, pages 279-287. DOI: 10.5220/0009396202790287

@conference{iotbds20,
author={Aleksey Burdakov and Viktoria Proletarskaya and Andrey Ploutenko and Oleg Ermakov and Uriy Grigorev},
title={Predicting SQL Query Execution Time with a Cost Model for Spark Platform},
booktitle={Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2020},
pages={279-287},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009396202790287},
isbn={978-989-758-426-8},
}

TY - CONF

JO - Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Predicting SQL Query Execution Time with a Cost Model for Spark Platform
SN - 978-989-758-426-8
AU - Burdakov, A.
AU - Proletarskaya, V.
AU - Ploutenko, A.
AU - Ermakov, O.
AU - Grigorev, U.
PY - 2020
SP - 279
EP - 287
DO - 10.5220/0009396202790287
