My Database User Is a Large Language Model

Eduardo R. Nascimento, Yenier T. Izquierdo, Grettel García, Gustavo Coelho, Lucas Feijó, Melissa Lemos, Luiz Leme, Marco Casanova

2024

Abstract

The leaderboards of familiar benchmarks indicate that the best text-to-SQL tools are based on Large Language Models (LLMs). However, when applied to real-world databases, LLM-based text-to-SQL tools perform significantly worse than these benchmarks report. A closer analysis reveals that one of the problems is that the relational schema is an inappropriate specification of the database from the LLM’s point of view. In other words, the target user of the database specification is now the LLM rather than a database programmer. This paper therefore argues that the text-to-SQL task can be significantly facilitated by providing a database specification built from two elements: LLM-friendly views, which are close to the language of the users’ questions and eliminate frequently used joins; and LLM-friendly descriptions of the database values. The paper first introduces a proof-of-concept implementation of three sets of LLM-friendly views over a relational database whose design is inspired by a proprietary relational database, together with a set of 100 natural language (NL) questions that mimic users’ questions. The paper then tests a text-to-SQL prompt strategy, implemented with LangChain and using GPT-3.5 and GPT-4, over the sets of LLM-friendly views, with data samples serving as the LLM-friendly data descriptions. The results suggest that LLM-friendly views and data samples, which are relatively simple to implement over a real-world relational database, suffice to improve the accuracy of the prompt strategy considerably. The paper concludes by discussing the results and suggesting further approaches to simplify the text-to-SQL task.
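The core idea of the abstract, hiding a cryptic normalized schema behind LLM-friendly views and describing database values through data samples, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the tables `tb_cli` and `tb_ped`, the view `customer_order`, the helpers `build_demo_db`, `describe_view` and `build_prompt`, and the prompt wording are all hypothetical, and no LLM is called (the paper's own strategy uses LangChain with GPT-3.5 and GPT-4). The view renames cryptic columns into the vocabulary of users' questions and pre-computes a frequently used join, and the prompt exposes only the view's columns plus a few sample rows.

```python
import sqlite3

# Hypothetical schema: cryptic base tables, as often found in real-world
# databases, plus one LLM-friendly view over them.
SCHEMA = """
CREATE TABLE tb_cli (cli_id INTEGER PRIMARY KEY, cli_nm TEXT, cid_nm TEXT);
CREATE TABLE tb_ped (ped_id INTEGER PRIMARY KEY, cli_id INTEGER, vl_tot REAL);
INSERT INTO tb_cli VALUES (1, 'Ana', 'Rio de Janeiro'), (2, 'Bruno', 'São Paulo');
INSERT INTO tb_ped VALUES (100, 1, 250.0), (101, 1, 80.0), (102, 2, 120.0);

-- LLM-friendly view: column names follow the users' vocabulary, and the
-- frequently used order->customer join is computed once, inside the view.
CREATE VIEW customer_order AS
SELECT p.ped_id AS order_id,
       c.cli_nm AS customer_name,
       c.cid_nm AS city,
       p.vl_tot AS total_amount
FROM tb_ped AS p
JOIN tb_cli AS c ON c.cli_id = p.cli_id;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def describe_view(conn: sqlite3.Connection, view: str, n_samples: int = 3) -> str:
    """Describe a view by its column names plus a few sample rows."""
    # `view` is interpolated directly, so it must come from a trusted list.
    cur = conn.execute(f"SELECT * FROM {view} LIMIT ?", (n_samples,))
    columns = [d[0] for d in cur.description]
    lines = [f"View {view}({', '.join(columns)})", "Sample rows:"]
    lines += ["  " + ", ".join(repr(v) for v in row) for row in cur.fetchall()]
    return "\n".join(lines)


def build_prompt(conn: sqlite3.Connection, views: list[str], question: str) -> str:
    """Assemble a text-to-SQL prompt that exposes only the LLM-friendly views."""
    parts = ["Translate the question into one SQLite SELECT statement over these views."]
    parts += [describe_view(conn, v) for v in views]
    parts += [f"Question: {question}", "SQL:"]
    return "\n\n".join(parts)


if __name__ == "__main__":
    conn = build_demo_db()
    print(build_prompt(conn, ["customer_order"],
                       "How many orders were placed by customers from Rio de Janeiro?"))
```

With this specification, the example question can be answered by a single-table query such as `SELECT COUNT(*) FROM customer_order WHERE city = 'Rio de Janeiro'`, which needs no join, and the base tables never appear in the prompt. Whether such views improve accuracy on a given database is exactly what the paper evaluates; the sketch only shows the mechanics.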



Paper Citation


in Harvard Style

Nascimento E. R., Izquierdo Y. T., García G., Coelho G., Feijó L., Lemos M., Leme L. and Casanova M. (2024). My Database User Is a Large Language Model. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7, SciTePress, pages 800-806. DOI: 10.5220/0012697700003690


in Bibtex Style

@conference{iceis24,
author={Eduardo R. Nascimento and Yenier T. Izquierdo and Grettel García and Gustavo Coelho and Lucas Feijó and Melissa Lemos and Luiz Leme and Marco Casanova},
title={My Database User Is a Large Language Model},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={800--806},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012697700003690},
isbn={978-989-758-692-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - My Database User Is a Large Language Model
SN  - 978-989-758-692-7
AU  - Nascimento, E. R.
AU  - Izquierdo, Y. T.
AU  - García, G.
AU  - Coelho, G.
AU  - Feijó, L.
AU  - Lemos, M.
AU  - Leme, L.
AU  - Casanova, M.
PY  - 2024
SP  - 800
EP  - 806
DO  - 10.5220/0012697700003690
PB  - SciTePress
ER  -