SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices

Cristiano Cortez da Rocha; Márcio Parise Boufleur; Leandro da Silva Fornasier; Júlio César Narciso; Andrea Schwertner Charão; Vinícius Maran; João Carlos D. Lima; Benhur O. Stein

Topics: Large Scale Databases; Performance Evaluation and Benchmarking

In Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 29-37, 2018, Funchal, Madeira, Portugal

Authors: Cristiano Cortez da Rocha¹; Márcio Parise Boufleur¹; Leandro da Silva Fornasier¹; Júlio César Narciso²; Andrea Schwertner Charão³; Vinícius Maran³; João Carlos D. Lima³ and Benhur O. Stein³

Affiliations: ¹Centro de Informática e Automação do Estado de Santa Catarina (CIASC), Brazil; ²Secretaria de Estado da Fazenda de Santa Catarina, Brazil; ³Universidade Federal de Santa Maria (UFSM), Brazil

Keyword(s): Large Database, Query Performance, Data Management, Business-critical Data.

Related Ontology Subjects/Areas/Topics: Data Engineering; Databases and Data Security; Databases and Information Systems Integration; Enterprise Information Systems; Large Scale Databases; Performance Evaluation and Benchmarking

Abstract: Hadoop clusters have established themselves as a foundation for many applications and experiments in the high-performance processing of large datasets. In this context, SQL-on-Hadoop has emerged as a trend that combines the popularity of SQL with the performance of Hadoop. In this work, we analyze the performance of SQL queries on Hadoop using the Impala engine, comparing it with an RDBMS-based approach. The analysis focuses on a large set of electronic invoice data, an important application for supporting fiscal audit operations. The experiments covered queries that are frequent in this context, implemented with and without data partitioning in both the RDBMS and Impala/Hadoop. The results show speedups from 2.7x to 14x with Impala/Hadoop for the queries considered, on a lower-cost hardware/software platform.

CC BY-NC-ND 4.0

Paper citation in several formats:

Cortez da Rocha, C., Parise Boufleur, M., da Silva Fornasier, L., César Narciso, J., Schwertner Charão, A., Maran, V., D. Lima, J. C. and O. Stein, B. (2018). SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices. In Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-298-1; ISSN 2184-4992, SciTePress, pages 29-37. DOI: 10.5220/0006690400290037

@conference{iceis18,
author={Cristiano {Cortez da Rocha} and Márcio {Parise Boufleur} and Leandro {da Silva Fornasier} and Júlio {César Narciso} and Andrea {Schwertner Charão} and Vinícius Maran and João Carlos {D. Lima} and Benhur {O. Stein}},
title={SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices},
booktitle={Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2018},
pages={29-37},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006690400290037},
isbn={978-989-758-298-1},
issn={2184-4992},
}

TY  - CONF
JO  - Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices
SN  - 978-989-758-298-1
IS  - 2184-4992
AU  - Cortez da Rocha, C.
AU  - Parise Boufleur, M.
AU  - da Silva Fornasier, L.
AU  - César Narciso, J.
AU  - Schwertner Charão, A.
AU  - Maran, V.
AU  - D. Lima, J.
AU  - O. Stein, B.
PY  - 2018
SP  - 29
EP  - 37
DO  - 10.5220/0006690400290037
PB  - SciTePress