Salt Lake City, USA, March 1, 2014 and Hangzhou,
China, September 5, 2014, Revised Selected Papers,
pages 154–166. Springer International Publishing.
Costea, A. et al. (2016). VectorH: Taking SQL-on-Hadoop
to the next level. In Proceedings of the 2016 Interna-
tional Conference on Management of Data, SIGMOD
’16, pages 1105–1117, New York, NY, USA. ACM.
Dean, J. and Ghemawat, S. (2008). MapReduce: Simpli-
fied data processing on large clusters. Communications
of the ACM, 51(1):107–113.
Earley, C. E. (2015). Data analytics in auditing: Opportu-
nities and challenges. Business Horizons, 58(5):493–
500.
Floratou, A., Minhas, U. F., and Özcan, F. (2014). SQL-on-
Hadoop: Full circle back to shared-nothing database
architectures. Proceedings of the Very Large Data
Base Endowment Inc., 7(12):1295–1306.
IBGE (2012). Demographics of companies [online].
Available: http://biblioteca.ibge.gov.br/visualizacao/
livros/liv88028.pdf. [Accessed 20 October 2017].
Kim, G.-H., Trimi, S., and Chung, J.-H. (2014). Big-data
applications in the government sector. Communicati-
ons of the ACM, 57(3):78–85.
Kornacker, M., Behm, A., Bittorf, V., Bobrovytsky, T.,
Choi, A., Erickson, J., Grund, M., Hecht, D., Jacobs,
M., Joshi, I., Kuff, L., Kumar, D., Leblang, A., Li,
N., Robinson, H., Rorke, D., Rus, S., Russell, J., Tsi-
rogiannis, D., Wanderman-Milne, S., and Yoder, M.
(2015). Impala: A modern, open-source SQL en-
gine for Hadoop. In Proceedings of the 7th Bien-
nial Conference on Innovative Data Systems Research
(CIDR’2015).
Melnik, S., Gubarev, A., Long, J. J., Romer, G., Shivaku-
mar, S., Tolton, M., and Vassilakis, T. (2010). Dre-
mel: Interactive analysis of web-scale datasets. In
Proceedings of the 36th International Conference on
Very Large Data Bases, pages 330–339.
Pannu, M., Gill, B., Tebb, W., and Yang, K. (2016). The
impact of big data on government processes. In Pro-
ceedings of the IEEE 7th Annual Information Techno-
logy, Electronics and Mobile Communication Confe-
rence (IEMCON), pages 1–5. IEEE.
Paula, E. L., Ladeira, M., Carvalho, R. N., and Marzagão,
T. (2016). Deep learning anomaly detection as sup-
port fraud investigation in Brazilian exports and anti-
money laundering. In Proceedings of the 15th IEEE
International Conference on Machine Learning and
Applications (ICMLA), pages 954–960. IEEE.
Rad, M. S. and Shahbahrami, A. (2016). Detecting high risk
taxpayers using data mining techniques. In Internati-
onal Conference of Signal Processing and Intelligent
Systems (ICSPIS), pages 1–5. IEEE.
RFB (2010). SPED: Public system of digital bookkeeping
[online]. Available: http://sped.rfb.gov.br. [Accessed
20 October 2017].
Shvachko, K., Kuang, H., Radia, S., and Chansler, R.
(2010). The Hadoop distributed file system. In Pro-
ceedings of the 2010 IEEE 26th Symposium on Mass
Storage Systems and Technologies (MSST), MSST
’10, pages 1–10, Washington, DC, USA. IEEE Com-
puter Society.
Stonebraker, M., Abadi, D., DeWitt, D. J., Madden, S.,
Paulson, E., Pavlo, A., and Rasin, A. (2010). MapRe-
duce and parallel DBMSs: friends or foes? Commu-
nications of the ACM, 53(1):64–71.
Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S.,
Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H.,
Seth, S., Saha, B., Curino, C., O’Malley, O., Radia,
S., Reed, B., and Baldeschwieler, E. (2013). Apa-
che Hadoop YARN: Yet another resource negotiator. In
Proceedings of the 4th Annual Symposium on Cloud
Computing, SOCC ’13, pages 5:1–5:16, New York,
NY, USA. ACM.
White, T. (2012). Hadoop: The definitive guide. O’Reilly
Media, Inc.
SQL Query Performance on Hadoop: An Analysis Focused on Large Databases of Brazilian Electronic Invoices