My Database User Is a Large Language Model

Eduardo R. Nascimento; Yenier T. Izquierdo; Grettel García; Gustavo Coelho; Lucas Feijó; Melissa Lemos; Luiz Leme; Marco Casanova

Topics: Deep Learning; Natural Language Interfaces to Intelligent Systems; Neural Network Software and Applications

In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 800-806, 2024, Angers, France

Authors: Eduardo R. Nascimento ¹ ; Yenier T. Izquierdo ¹ ; Grettel García ¹ ; Gustavo Coelho ¹ ; Lucas Feijó ¹ ; Melissa Lemos ¹ ; Luiz Leme ² and Marco Casanova ³,¹

Affiliations: ¹ Instituto Tecgraf, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil ; ² Instituto de Computação, UFF, Niterói, 24210-310, RJ, Brazil ; ³ Departamento de Informática, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil

Keyword(s): Text-to-SQL, GPT, Large Language Models, Relational Databases.

Abstract: The leaderboards of familiar benchmarks indicate that the best text-to-SQL tools are based on Large Language Models (LLMs). However, when applied to real-world databases, LLM-based text-to-SQL tools perform significantly worse than reported for these benchmarks. A closer analysis reveals that one of the problems is that the relational schema is an inappropriate specification of the database from the point of view of the LLM. In other words, the target user of the database specification is the LLM rather than a database programmer. This paper therefore argues that the text-to-SQL task can be significantly facilitated by providing a database specification based on LLM-friendly views, which are close to the language of the users’ questions and eliminate frequently used joins, and on LLM-friendly data descriptions of the database values. The paper first introduces a proof-of-concept implementation of three sets of LLM-friendly views over a relational database, whose design is inspired by a proprietary relational database, and a set of 100 Natural Language (NL) questions that mimic users’ questions. The paper then tests a text-to-SQL prompt strategy implemented with LangChain, using GPT-3.5 and GPT-4, over the sets of LLM-friendly views, with data samples serving as the LLM-friendly data descriptions. The results suggest that the specification of LLM-friendly views and the use of data samples, albeit not too difficult to implement over a real-world relational database, are sufficient to improve the accuracy of the prompt strategy considerably. The paper concludes by discussing the results obtained and suggesting further approaches to simplify the text-to-SQL task.
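The idea of an LLM-friendly view with data samples can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tables `eqp` and `loc`, the view `equipment`, the helper names, and the prompt wording are hypothetical and are not the authors' schema, views, or prompt strategy. The sketch uses Python's standard `sqlite3` module rather than the proprietary database and the LangChain setup mentioned in the abstract.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy database with terse base tables and one readable view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical base tables with programmer-oriented names.
        CREATE TABLE loc (loc_id INTEGER PRIMARY KEY, loc_nm TEXT);
        CREATE TABLE eqp (eqp_id INTEGER PRIMARY KEY, eqp_nm TEXT, loc_id INTEGER);
        INSERT INTO loc VALUES (1, 'Platform A'), (2, 'Platform B');
        INSERT INTO eqp VALUES (10, 'Pump P-101', 1), (11, 'Valve V-7', 2);

        -- Hypothetical LLM-friendly view: readable column names, and the
        -- frequently used join between eqp and loc is already applied.
        CREATE VIEW equipment AS
            SELECT e.eqp_id AS equipment_id,
                   e.eqp_nm AS equipment_name,
                   l.loc_nm AS location_name
            FROM eqp AS e JOIN loc AS l ON l.loc_id = e.loc_id;
        """
    )
    return conn


def describe_view(conn: sqlite3.Connection, view: str, n_samples: int = 2) -> str:
    """Return the view's column list plus a few sample rows as plain text."""
    # The view name is interpolated, so it must come from trusted code.
    cur = conn.execute(f'SELECT * FROM "{view}" ORDER BY 1 LIMIT ?', (n_samples,))
    columns = [d[0] for d in cur.description]
    lines = [f"View {view}({', '.join(columns)})", "Sample rows:"]
    lines += [", ".join(str(value) for value in row) for row in cur.fetchall()]
    return "\n".join(lines)


def build_prompt(conn: sqlite3.Connection, view: str, question: str) -> str:
    """Assemble an illustrative text-to-SQL prompt (wording is an assumption)."""
    return (
        "Write a SQL query over the following view.\n"
        f"{describe_view(conn, view)}\n"
        f"Question: {question}\n"
        "SQL:"
    )


if __name__ == "__main__":
    connection = build_demo_db()
    print(build_prompt(connection, "equipment", "Where is pump P-101 located?"))
```

With such a view, a question about an equipment's location can be answered by a single-table query over `equipment`, without the model having to discover the join between `eqp` and `loc`.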

CC BY-NC-ND 4.0

Paper citation in several formats:

Nascimento, E. R.; Izquierdo, Y. T.; García, G.; Coelho, G.; Feijó, L.; Lemos, M.; Leme, L. and Casanova, M. (2024). My Database User Is a Large Language Model. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7; ISSN 2184-4992, SciTePress, pages 800-806. DOI: 10.5220/0012697700003690

@conference{iceis24,
author={Eduardo R. Nascimento and Yenier T. Izquierdo and Grettel García and Gustavo Coelho and Lucas Feijó and Melissa Lemos and Luiz Leme and Marco Casanova},
title={My Database User Is a Large Language Model},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={800--806},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012697700003690},
isbn={978-989-758-692-7},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - My Database User Is a Large Language Model
SN - 978-989-758-692-7
IS - 2184-4992
AU - Nascimento, E. R.
AU - Izquierdo, Y. T.
AU - García, G.
AU - Coelho, G.
AU - Feijó, L.
AU - Lemos, M.
AU - Leme, L.
AU - Casanova, M.
PY - 2024
SP - 800
EP - 806
DO - 10.5220/0012697700003690
PB - SciTePress