Authors:
Eduardo Nascimento (1, 2); Grettel García (1); Lucas Feijó (1); Wendy Victorio (1, 2); Yenier Izquierdo (1); Aiko R. de Oliveira (2); Gustavo Coelho (1); Melissa Lemos (2, 1); Robinson Garcia (3); Luiz Leme (4) and Marco Casanova (2, 1)
Affiliations:
(1) Instituto Tecgraf, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil
(2) Departamento de Informática, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil
(3) Petrobras, Rio de Janeiro, 20031-912, RJ, Brazil
(4) Instituto de Computação, UFF, Niterói, 24210-310, RJ, Brazil
Keyword(s):
Text-to-SQL, GPT, Large Language Models, Industrial Databases.
Abstract:
Text-to-SQL refers to the task defined as “given a relational database D and a natural language sentence S that describes a question on D, generate an SQL query Q over D that expresses S”. Numerous tools have addressed this task with relative success over well-known benchmarks. Recently, several LLM-based text-to-SQL tools, that is, text-to-SQL tools that exploit Large Language Models (LLMs), have emerged and outperformed previous approaches. However, when these tools are applied to industrial-size databases, with a large number of tables, columns, and foreign keys, their performance is significantly lower than that reported for the benchmarks. This paper investigates how a selected set of LLM-based text-to-SQL tools performs over two challenging databases: Mondial, an openly available database, and a proprietary industrial database. The paper also proposes a new LLM-based text-to-SQL tool that combines features from tools that performed well over the Spider and BIRD benchmarks. It then describes how the selected tools and the proposed tool, running under GPT-3.5 and GPT-4, perform over the Mondial and the industrial databases on a suite of 100 carefully defined natural language questions that are closely related to those observed in practice. It concludes with a discussion of the results obtained.