SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices
Cristiano Cortez da Rocha, Márcio Parise Boufleur, Leandro da Silva Fornasier, Júlio César Narciso, Andrea Schwertner Charão, Vinícius Maran, João Carlos D. Lima, Benhur O. Stein
2018
Abstract
Hadoop clusters have established themselves as a foundation for various applications and experiments in the field of high-performance processing of large datasets. In this context, SQL-on-Hadoop emerged as a trend that combines the popularity of SQL with the performance of Hadoop. In this work, we analyze the performance of SQL queries on Hadoop, using the Impala engine, and compare it with an RDBMS-based approach. The analysis focuses on a large set of electronic invoice data, representing an important application to support fiscal audit operations. The experiments included queries that are frequent in this context, implemented with and without data partitioning in both the RDBMS and Impala/Hadoop. The results show speedups from 2.7x to 14x with Impala/Hadoop for the queries considered, on a lower-cost hardware/software platform.
Paper Citation
in Harvard Style
Cortez da Rocha C., Parise Boufleur M., da Silva Fornasier L., César Narciso J., Schwertner Charão A., Maran V., D. Lima J. and O. Stein B. (2018). SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices. In Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-298-1, pages 29-37. DOI: 10.5220/0006690400290037
in Bibtex Style
@conference{iceis18,
author={Cristiano Cortez da Rocha and Márcio Parise Boufleur and Leandro da Silva Fornasier and Júlio César Narciso and Andrea Schwertner Charão and Vinícius Maran and João Carlos D. Lima and Benhur O. Stein},
title={SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices},
booktitle={Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2018},
pages={29--37},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006690400290037},
isbn={978-989-758-298-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices
SN - 978-989-758-298-1
AU - Cortez da Rocha C.
AU - Parise Boufleur M.
AU - da Silva Fornasier L.
AU - César Narciso J.
AU - Schwertner Charão A.
AU - Maran V.
AU - D. Lima J.
AU - O. Stein B.
PY - 2018
SP - 29
EP - 37
DO - 10.5220/0006690400290037