Contrato360 2.0: A Document and Database-Driven Question-Answer
System Using Large Language Models and Agents
Antony Seabra 1,2 a, Claudio Cavalcante 1,2 b, João Nepomuceno 1 c, Lucas Lago 1 d,
Nicolaas Ruberg 1 e and Sergio Lifschitz 2 f
1 BNDES, Área de Tecnologia da Informação, Rio de Janeiro, Brazil
2 PUC-Rio, Departamento de Informática, Rio de Janeiro, Brazil
Keywords:
Information Retrieval, Question Answer, Large Language Models, Documents, Databases, Prompt
Engineering, Retrieval Augmented Generation, Text-to-SQL.
Abstract:
We present a question-and-answer (Q&A) application designed to support the contract management process
by leveraging combined information from contract documents (PDFs) and data retrieved from contract man-
agement systems (database). This data is processed by a large language model (LLM) to provide precise and
relevant answers. The accuracy of these responses is further enhanced through the use of Retrieval-Augmented
Generation (RAG), text-to-SQL techniques, and agents that dynamically orchestrate the workflow. These tech-
niques eliminate the need to retrain the language model. Additionally, we employed Prompt Engineering to
fine-tune the focus of responses. Our findings demonstrate that this multi-agent orchestration and combination
of techniques significantly improve the relevance and accuracy of the answers, offering a promising direction
for future information systems.
1 INTRODUCTION
Contract management in large corporations involves
overseeing legally binding agreements from their ini-
tiation through to execution and finalization. This
process encompasses ensuring that services or prod-
ucts are delivered in accordance with contractual
terms, monitoring their execution, and continuously
evaluating both operational and financial performance
throughout the service or product lifecycle. In the
case of public sector companies, this process becomes
even more complex due to stringent regulatory frame-
works. In Brazil, for instance, Law No. 14,133/2021
mandates that contract management includes a wide
range of activities, such as technical and adminis-
trative oversight, adherence to contract duration, re-
evaluation of economic and financial terms, modifica-
tions to service scope, and the enforcement of penal-
ties and fines when necessary. These regulations im-
pose an additional layer of complexity on the contract
management process, demanding a robust and sys-
tematic approach to ensure compliance and efficiency.

a https://orcid.org/0009-0007-9459-8216
b https://orcid.org/0009-0007-6327-4083
c https://orcid.org/0009-0004-5441-8426
d https://orcid.org/0009-0001-4094-1978
e https://orcid.org/0009-0005-4388-4656
f https://orcid.org/0000-0003-3073-3734
Beyond contract managers, dedicated organiza-
tional units are essential to support the contract man-
agement process, ensuring that the diverse range of
activities associated with contract execution is man-
aged efficiently. Often, these units require special-
ized knowledge to handle complex services effec-
tively. Notable examples include information and
communication technology (ICT) services, property
and asset management, and construction and engi-
neering projects, each of which demands a high level
of expertise. Additionally, these units typically rely
on Contract Management Systems (CMS) to stream-
line their operations. Public companies may either de-
velop these systems in-house or opt for widely-used
market solutions, such as SAP Contract Life-cycle
Management and IBM Emptoris Contract Manage-
ment, among others.
While these systems efficiently handle general
contract information, such as signatures, expiration
dates, payment terms, and contract agents, many spe-
cific details required to support effective management
activities remain accessible only through the original
documents. For instance, traditional Contract Man-
agement Systems (CMS) are often unable to respond
to inquiries concerning particular aspects of a con-
tract, such as penalties, discounts, or fines associated
with delays in service or product delivery. More-
over, they lack the capability to provide insights into
comparative characteristics across different contracts,
such as penalty clauses related to database support
agreements. These tasks are highly time-consuming.
The objective of this study is to provide a solu-
tion that aids contract managers in addressing queries
related to both contract documents and data housed
within traditional Contract Management Systems.
One of the key challenges faced by contract man-
agers is the time-consuming process of searching for
and retrieving relevant information from lengthy and
complex contract texts. To address this, we lever-
age state-of-the-art large language model (LLM)
technologies to analyze and extract pertinent
details from contract documents efficiently. This not
only improves the accuracy of the information re-
trieved but also significantly enhances the productiv-
ity of contract managers by reducing the manual effort
required to locate specific information. Additionally,
our approach integrates data from traditional Con-
tract Management Systems, ensuring that responses
are both relevant and comprehensive, thereby stream-
lining contract management activities.
In this work, we evaluated and integrated sev-
eral Natural Language Processing (NLP) techniques
to develop a Q&A system specifically designed for
ICT contracts, using contract PDF files and data from
Contract Management Systems (CMS) as primary
data sources. To enhance the relevance of user
queries, prior work by (Seabra et al., 2024) employed
Retrieval-Augmented Generation (RAG) techniques
and a static approach to text-to-SQL for extracting
relevant metadata from contract systems. Building
upon this, our approach utilizes agents to dynamically
improve the accuracy and contextual relevance of re-
sponses, with a particular focus on a context-aware
text-to-SQL agent that interprets user queries more
effectively. Furthermore, similar to (Seabra et al.,
2024), we applied Prompt Engineering techniques to
standardize responses and ensure greater precision in
the answers provided.
One of the primary challenges in interpreting con-
tract documents lies in distinguishing between rele-
vance and similarity, a complexity that arises due to
the standardized formats and repetitive textual struc-
tures commonly found in these documents. This stan-
dardization is a challenge for LLMs because there is
a great deal of textual similarity, which does not nec-
essarily translate into relevance. Using a mix of NLP
techniques, we developed a solution that minimizes
the impact of standardization and provides relevant
answers. This approach made it possible to design
a solution without needing traditional fine-tuning or
re-training of language models.
The paper is organized as follows: Section 2 pro-
vides technical background on LLMs, RAG, text-to-
SQL, agents, and prompt engineering. Section 3 dis-
cusses the methodology of the use of the presented
techniques, while Section 4 details the architecture of
our solution. Section 5 describes how we evaluated
the proposed solution and the experimentation of the
Q&A application. Finally, Section 6 concludes our
study and proposes directions for future research in
this field.
2 BACKGROUND
Large Language Models (LLMs) have enabled the
dissemination of several applications in the area of
Natural Language Processing (NLP), including
question-and-answer (Q&A) systems. Re-
cently, the use of agents has been introduced as a cru-
cial component in LLM-based systems to orchestrate
and manage task execution dynamically. Agents, such
as router agents, SQL agents, and RAG agents, enable
the efficient allocation of tasks by directing queries to
the most suitable processing modules, enhancing sys-
tem adaptability and performance. This approach al-
lows LLMs to better handle complex queries, making
responses more accurate and contextually relevant by
integrating external data sources and specialized pro-
cessing routines (Mialon et al., 2023).
2.1 Large Language Models
Large-scale Language Models (LLMs) have revolu-
tionized the field of natural language processing with
their ability to understand and generate human-like
text. In their architecture, they utilize a specific neu-
ral network structure, the Transformer, which allows the
model to weigh the influence of different parts of the
input text at different positions (Vaswani et al., 2017).
Conversational applications, a specific use case
for LLMs, specialize in generating text that is co-
herent and contextualized. This is achieved through
training, in which the models are fed vast amounts
of conversational data, allowing them to learn the
nuances of dialogue (OpenAI, 2023a). In this way,
LLMs have established a new paradigm for NLP.
Moreover, by expanding the search space with exter-
nal data or specializing through fine-tuning, LLMs
become platforms for building specialized applica-
tions. In this work, all language models utilized were
based on OpenAI’s GPT series. Specifically, we em-
ployed the text-davinci-002 model for generating em-
beddings and the gpt-4-turbo model for generating
answers to user queries.
2.2 Retrieval-Augmented Generation
(RAG)
According to (Chen et al., 2024), LLMs face signif-
icant challenges such as factual hallucination, out-
dated knowledge, and lack of domain-specific exper-
tise. In response to these challenges, RAG represents
a paradigm shift in the way LLMs process and gen-
erate text. The principle behind RAG involves us-
ing vector storage to retrieve text fragments similar
to the input query (Gao et al., 2023b). This tech-
nique converts both the query text and the information
database into high-dimensional vectors, allowing one
to retrieve similar information, which is then fed to an
LLM.
(Gao et al., 2023b) and (Feng et al., 2024) de-
scribe frameworks that exploit the advantages of this
technique by providing additional data to the LLM
without re-training the model (Li et al., 2022). By di-
viding the available text into manageable chunks and
embedding these chunks in high-dimensional vector
spaces, it is possible to quickly retrieve contextually
relevant information in response to a query, which in-
forms the next processing steps. As shown in Figure
1, the first step (1) involves reading the textual con-
tent of the PDF documents and splitting it into man-
ageable chunks, which are then transformed (2) into
high-dimensional embedding vectors. This vector rep-
resentation captures the semantic properties of the
text and may have, for instance, 1536 dimensions.
These embedding vectors are stored in a vector-
store (3), a database specialized in high-dimensional
vectors. The vectorstore allows efficient querying of
vectors through their similarities, using a distance
metric for comparison (whether Manhattan, Euclidean
or cosine).
Once the similarity metric is established, the
query is embedded in the same vector space (4); this
allows a direct comparison between the vectorized
query and the vectors of the stored chunks, retriev-
ing the most similar chunks (5), which are then trans-
parently integrated into the LLM context to generate
a prompt (6). The prompt is then composed of the
question, the texts retrieved from the vectorstore, the
specific instructions and, optionally, the chat history,
all sent to the LLM which generates the final response
(7).
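The steps above can be condensed into a short sketch. The following is a minimal illustration of steps (1)-(7) using LangChain with OpenAI models and a Chroma vectorstore; the file name, chunk sizes and model identifier are illustrative, and exact import paths vary across LangChain versions.

# A minimal sketch of the RAG pipeline of Figure 1 (steps 1-7);
# assumptions: LangChain + OpenAI, illustrative file name and parameters.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# (1) read the PDF and split its text into manageable chunks
pages = PyPDFLoader("contract_ocs_278_2023.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200).split_documents(pages)

# (2)-(3) embed the chunks and store the vectors in a vectorstore
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# (4)-(5) embed the query in the same space and retrieve similar chunks
question = "What is the object of contract OCS 278/2023?"
similar_chunks = vectorstore.similarity_search(question, k=4)

# (6)-(7) compose the prompt (question + retrieved texts + instructions)
# and send it to the LLM, which generates the final response
context = "\n\n".join(chunk.page_content for chunk in similar_chunks)
prompt = ("Answer using only the context below. Do not use prior knowledge.\n\n"
          f"Context:\n{context}\n\nQuestion: {question}")
answer = ChatOpenAI(model="gpt-4-turbo").invoke(prompt).content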
In RAG, the chunking strategy is important be-
cause it directly influences the quality of the retrieved
information. A well-designed chunk generation en-
sures that the information is cohesive and semanti-
cally complete, capturing its essence.
A key aspect of RAG is the difference between
similarity and relevance. Similar passages may not
contain the information relevant to answering a query,
posing a challenge to accurately retrieve information,
especially in cases where data comes from multiple
documents with similar structure. In such contexts,
documents may share a high degree of structural and
lexical similarity, making it difficult for retrieval algo-
rithms to distinguish between content that is merely
similar in form and content that is truly relevant to a
query.
2.3 Text-to-SQL
Text-to-SQL is a technology that enables the conver-
sion of natural language queries into SQL commands
based solely on the database schema, eliminating the
need for knowledge of the underlying data (Liu et al.,
2023). This approach leverages the capabilities of
LLMs to understand and interpret human language,
allowing users to retrieve data from databases through
plain text input without requiring specialized knowl-
edge of SQL syntax (Gao et al., 2023a).
By translating natural language into SQL queries,
text-to-SQL brings complex database structures and
end users closer together, making access more in-
tuitive and efficient. This technique is particularly
useful because it allows non-expert users to access
databases by asking natural language queries. It im-
proves data accessibility, reduces the learning curve
associated with database querying, and speeds up data
analysis processes, enabling more users to make data-
driven decisions.
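As a simple illustration of the idea, consider a hypothetical contracts table; given only the schema and a natural language question, the LLM is expected to emit the corresponding SQL. The table and column names below are our own, not those of a real CMS.

# Illustrative only: a hypothetical schema and the SQL a text-to-SQL
# model is expected to produce from a natural language question.
schema = ("CREATE TABLE contracts ("
          "ocs_number TEXT, supplier TEXT, manager TEXT, "
          "start_date DATE, end_date DATE, total_value REAL)")
question = "Who is the manager of contract OCS 278/2023?"
# Prompted with the schema and the question, the model should generate:
expected_sql = "SELECT manager FROM contracts WHERE ocs_number = '278/2023'"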
The main distinction between RAG and text-to-
SQL techniques lies in how information is retrieved.
RAG relies on retrieving text segments from a vec-
tor store that are similar to the user’s question, and
using these segments to generate a coherent and con-
textually relevant answer. This method is effective
for questions where the answer can be synthesized
from existing text. However, it is not always pos-
sible to identify the information expected as the an-
swer. In contrast, text-to-SQL translates natu-
ral language queries into SQL commands, as demon-
strated in (Pinheiro et al., 2023), which are then exe-
cuted against a structured database to retrieve exact
data matches. This ensures that if the text-to-SQL
translation is accurate, the user will receive a highly
specific answer directly from the database fields.
Therefore, while RAG operates on the principle
of textual similarity and generative capabilities, text-
to-SQL offers a more direct mechanism for data
retrieval by executing queries that exactly match the
user's intent, making it particularly effective for pre-
cise data investigations.
Figure 1: Retrieval-Augmented Generation.
2.4 Prompt Engineering
Prompt engineering is the art of designing and opti-
mizing prompts to guide LLMs in generating desired
outputs. The goal of prompt engineering is to maxi-
mize the potential of LLMs by providing them with
instructions and context (OpenAI, 2023b).
In the context of prompt engineering, prompts
are a fundamental part of the process. Through
prompts, engineers can outline the script for a re-
sponse, specifying the desired style and format for
the LLM response (White et al., 2023) (Giray, 2023).
For example, to define the style of a conversation, a
prompt could be formulated as "Use professional lan-
guage and treat the customer with respect" or "Use
informal language and emojis to convey a friendly
tone." To specify the format of dates in responses, a
prompt instruction could be "Use the American for-
mat, MM/DD/YYYY, for all dates."
On the other hand, context refers to the informa-
tion provided to LLMs along with the main prompts.
The most important aspect of context is that it can pro-
vide additional information to support the response
given by the LLM, which is very useful when im-
plementing Q&A systems. This supplemental context
can include relevant background details, specific ex-
amples, and even previous dialogue exchanges, which
collectively help the model generate more accurate,
detailed, and contextually appropriate responses. Ac-
cording to (Wang et al., 2023), prompts provide guid-
ance to ensure that the model generates responses that
are aligned with the user’s intent. As a result, well-
crafted prompts significantly improve the effective-
ness and appropriateness of responses.
Recent studies have begun to explore the syner-
gistic integration of these techniques with LLMs to
create more sophisticated Q&A systems. For exam-
ple, (Jeong, 2023) reinforces the importance of us-
ing Prompt Engineering with RAG to improve the re-
trieval of relevant documents, which are then used to
generate both contextually relevant and information-
rich answers. Similarly, (Gao et al., 2023a) ex-
plores the integration of text-to-SQL with Prompt En-
gineering to enhance the model’s ability to interact
directly with relational databases, thereby expanding
the scope of queries that can be answered accurately.
2.5 Agents
The use of agents in applications built around Large
Language Models (LLMs) is relatively recent but has
already become common. Agents act as intelligent in-
termediaries that route, process, and present informa-
tion in ways tailored to the context of the query. These
agents leverage recent advancements in AI, such as
Retrieval-Augmented Generation (RAG) and tool uti-
lization, to perform more complex and contextually
aware tasks (Lewis et al., 2020). They play a piv-
otal role in orchestrating complex tasks, integrating
various data sources, and ensuring that the system re-
sponds accurately and efficiently to user queries.
In a complex LLM-based system, different tasks
often require specialized handling. Agents enable
task orchestration by directing queries to the most ap-
propriate component, whether it’s for retrieving data,
performing calculations, or generating visualizations.
For example, an application may have a Text-to-SQL
agent to perform queries over a relational database
and a Graph agent to render charts alongside an an-
swer, when appropriate. According to (Jin et al., 2024), apply-
ing LLMs to text-to-database management and query
optimization is also a novel research direction in the
natural-language-to-code generation task. By converting
natural language queries into SQL statements, LLMs
help developers quickly generate efficient database
query code. In the realm of integrating heteroge-
neous data sources, Q&A applications often need to
access data from documents, databases, APIs, and
other repositories. Agents facilitate the seamless in-
tegration of these heterogeneous data sources, allow-
ing the system to extract relevant information dynam-
ically.
There are several agent types. As outlined in
(Singh et al., 2024), agent workflows allow LLMs
to operate more dynamically by incorporating spe-
cialized agents that manage task routing, execution,
and optimization. These agents serve as intelligent
intermediaries, directing specific tasks—such as data
retrieval, reasoning, or response generation—to the
most suitable components within the system. Among
the most important are the Router
Agents, which are the decision-makers of the sys-
tem. When a user poses a query, the router agent an-
alyzes the input and decides the best path forward.
For instance, if a query is identified as needing fac-
tual data, the router agent might direct it to a RAG
model. If the question involves specific data retrieval
from a database, it will engage an SQL agent instead.
As mentioned before, RAG and SQL Agents
are very relevant too. According to (Saeed et al.,
2023), SQL agents can effectively manage data re-
trieval tasks by leveraging LLMs. The SQL queries
are transformed into prompts for LLMs, allowing
the system to interact with unstructured data stored
in the model, mimicking traditional database opera-
tions. (Fan et al., 2024) provides a comprehensive
overview of the integration of RAG techniques in
LLMs; moreover, (Wang et al., 2024) introduces a
novel approach that combines RAG techniques with a
drafting-verification process to improve the reasoning
capabilities of LLMs when handling retrieved docu-
ments. The RAG agent, termed the "drafter," gener-
ates multiple answer drafts based on retrieved results,
while a larger generalist LLM, the "verifier," assesses
these drafts and selects the most accurate one. This
approach effectively integrates retrieval and genera-
tion, enhancing the overall performance of LLMs in
knowledge-intensive tasks such as question answer-
ing and information retrieval systems.
3 METHODOLOGY
To address the challenges faced by contract managers
in terms of complex information retrieval, we pro-
pose Contrato360, a Q&A system supported by an
LLM and orchestrated by agents. The system em-
ploys a range of techniques designed to enhance the
relevance of responses while mitigating the risks asso-
ciated with the standardized textual structures of con-
tracts.
To achieve this goal of increasing the relevance
of the responses obtained by Contrato360, we com-
bined four techniques: 1) Retrieval-Augmented Gen-
eration (RAG) to increase the relevance of informa-
tion about contracts contained in PDF documents; 2)
Agents to orchestrate and route the flow of execu-
tion, enabling the dynamic selection of the most ap-
propriate approach for each query context; 3) Text-to-
SQL agent to retrieve the relevant information from
contract systems; 4) Prompt Engineering techniques
to standardize and ensure greater accuracy in the re-
sponses produced.
3.1 Applying RAG
One of the first decisions to be made is choosing
the best strategy to segment the document, that is,
how to perform the chunking of the PDF files. A
common chunking strategy involves segmenting doc-
uments based on a specific number of tokens and an
overlap between consecutive chunks. This is useful
when dealing with sequential texts where it is impor-
tant to maintain the continuity of context between the
chunks.
Contracts have a standardized textual structure,
organized into contractual sections. Therefore, sec-
tions with the same numbering or in the same vicinity
describe the same contractual aspect, that is, they have
similar semantics. For example, in the first section of
contract documents, we always find the object of the
contract. In this scenario, we can assume that the best
chunking strategy is to separate the chunks by section
of the document. In this case, the overlap between
the chunks occurs by section, since the questions will
be answered by information contained in the section
itself or in previous or subsequent sections. For the
contract page in the example in Figure 3, we would
have a chunk for the section on the object of the con-
tract, another chunk for the section on the term of the
contract, that is, a chunk for each clause of the con-
tract and its surroundings. This approach ensures that
each snippet represents a semantic unit, making re-
trievals more accurate and aligned with queries.
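A minimal sketch of this section-based chunking is shown below, assuming that each clause begins with a heading such as "CLÁUSULA PRIMEIRA - OBJETO"; the regular expression and function name are illustrative, not the system's actual implementation.

# Sketch: split a contract into one chunk per clause, assuming clause
# headings of the form "CLÁUSULA <ordinal> - <title>" (illustrative regex).
import re

def split_by_clause(contract_text: str) -> list[str]:
    # split at each clause heading, keeping the heading with its section,
    # so that every chunk is a semantically complete contractual clause
    parts = re.split(r"(?=CL[ÁA]USULA\s+\S+)", contract_text)
    return [part.strip() for part in parts if part.strip()]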
Having the contract section as the limit of the
chunks improves the relevance of the responses within
a single contract. However, as the num-
ber of contracts that Contrato360 must cover in-
creases, we observed a problem in correctly deter-
mining which contract a question refers to. The fol-
lowing example details this aspect.
Figure 2: Methodology Workflow Combining Different Techniques.

Consider the contract documents shown in Fig-
ure 3, which showcases two service contracts be-
tween BNDES (Banco Nacional de Desenvolvimento
Econômico e Social) and companies (Oracle do Brasil
Sistemas Ltda. and IBM Brasil Indústria Máquinas e
Serviços Ltda.), highlighting key clauses relevant to
the provision of technical support and software up-
dates. The contracts are presented in Portuguese, re-
flecting the original legal terms and specific obliga-
tions of each party. For instance, the contract with
Oracle (Contract No. 278/2023) details the provision
of services for Oracle Database and associated tech-
nologies, emphasizing software support and entitle-
ment to updates. Similarly, the contract with IBM
(Contract No. 159/2021) focuses on support services
related to IBM Content Management software. The
"CL
´
AUSULA PRIMEIRA - OBJETO" (first clause - ob-
ject) details the object of the contract and a frequently
asked question is: ”What is the object of contract
OCS 278/2023?". In this example, the RAG will
store vectors containing the sections of both contracts,
since this clause is common to both. However, when
we inspect what is expressed in the chunk, its content
does not contain the contract number (Figure 3). Thus,
with high probability, a query about a specific con-
tract may retrieve a segment (chunk) unrelated to that
contract; in our example, a chunk from OCS 159/2021
could be retrieved instead of the chunk from contract
OCS 278/2023 that actually answers the question.
To overcome this issue, it is necessary to add se-
mantics to the chunks by including document meta-
data and, when accessing the vectorstore, to use this
metadata to filter the information returned. In this
way, we improve the relevance of the retrieved texts.
Figure 4 displays the most relevant metadata for the
contracts (source, contract and clause), where source
is the name of the contract PDF file, contract is the
OCS number and clause is the section title. Thus,
for the question ”What is the object of contract OCS
278/2023?”, the chunks of contract OCS 278/2023
are retrieved and then the similarity calculation is ap-
plied, retrieving the text segments to be sent to the
LLM.
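The following sketch illustrates this metadata filtering with a ChromaDB collection, assuming each chunk was stored with the source, contract and clause metadata of Figure 4; the identifiers and document texts are illustrative.

# Sketch: metadata-filtered retrieval with ChromaDB; ids, texts and
# metadata values are illustrative, not the system's actual data.
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("contracts")

# each chunk is stored together with its metadata (source, contract, clause)
collection.add(
    ids=["ocs278-objeto"],
    documents=["CLÁUSULA PRIMEIRA - OBJETO: provision of technical support..."],
    metadatas=[{"source": "ocs_278_2023.pdf", "contract": "278/2023",
                "clause": "OBJETO"}],
)

# filter by contract number first, then rank the remaining chunks by similarity
results = collection.query(
    query_texts=["What is the object of contract OCS 278/2023?"],
    n_results=3,
    where={"contract": "278/2023"},
)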
3.2 Applying Text-to-SQL
Contracts are dynamic and undergo several events like
operational changes and management adjustments
throughout their life-cycle. To deal with this com-
plexity, organizations use contract monitoring sys-
tems, such as SAP Contract Life-cycle Management
and IBM Emptoris Contract Management. These sys-
tems control several aspects, such as the technical per-
son responsible for the contract, changes in the con-
tractor’s representative, and the end of the provision
of services. During the contract term, these events can
occur and significantly affect contract management.
Contrato360 retrieves those events from the
Contract Management System (CMS) and incorpo-
rates them so the LLM can provide relevant responses
to the user. Therefore, a text-to-SQL technique was
a natural choice to implement the reasoning and decision-
making task (Yao et al., 2023) of obtaining relevant re-
sponses from the CMS database for contract man-
agers.
Figure 3: Chunking applied to Contracts.
Figure 4: Contracts metadata.
The LangChain SQL Agent (Langchain, 2024)
has proven to be a highly flexible tool for interact-
ing with the CMS database. Upon system startup,
our SQL agent establishes an authenticated connec-
tion to the database and retrieves the schema. When
it receives a user question, it performs Entity Recog-
nition, maps those entities to the database tables and
columns, and prepares the SQL statement.
Ensuring the safety of our SQL agent is central.
We validate each generated query to ensure it does
not contain harmful commands, such as UPDATE,
DROP TABLE, INSERT, or any other command
that can alter the database, safeguarding the sys-
tem's integrity.
Finally, the output generated from the executed
SQL statement goes to a prompt generation stage for
further analysis of the LLM.
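A minimal sketch of this setup is shown below, based on the LangChain SQL Agent mentioned above with a SQLite CMS database; the database URI, model identifier and keyword list are illustrative, and import paths vary across LangChain versions. A generated statement would only be executed when is_safe returns True.

# Sketch: SQL agent over the CMS database plus a simple safety check;
# URI, model and keyword list are illustrative assumptions.
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

# commands that could alter the database are rejected before execution
FORBIDDEN = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE")

def is_safe(sql: str) -> bool:
    upper = sql.upper()
    return not any(keyword in upper for keyword in FORBIDDEN)

db = SQLDatabase.from_uri("sqlite:///cms.db")  # schema is read at startup
agent = create_sql_agent(ChatOpenAI(model="gpt-4-turbo"), db=db,
                         agent_type="openai-tools")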
3.3 Applying Prompt Engineering
The prompt engineering technique provides a pattern
for the style of responses and the reuse of the solution
when accessing the LLM, as it provides instructions
and context. Based on these observations, instruc-
tions were developed in the application to improve the
responses. The instructions include basic guidelines,
such as ”Do not use prior knowledge”, which ensures
that the responses are based only on vectorstore con-
tracts, and specific instructions, such as "Whenever
you answer a question about a contract, provide the
OCS number." Thus, the question "Do we have an
Oracle Support contract?" would have as a possible
answer "Yes, we have an Oracle Database Support
contract. The OCS number is 278/2023."
Guidelines on maintaining and using the chat con-
text were also applied to ensure uniformity and co-
herence. For example, we state the expected style
for responses, "You should use a formal and objec-
tive tone.", determining the style of LLM responses.
Another guideline instructs the LLM: "Given the chat
history and the question asked, construct the response
completely, without the user needing to review the
history".
Finally, the context passed to the LLM can be use-
ful to establish the style of the answers according to
the role of the user of the Q&A system. In the case
of Contrato360, we have three roles: 1) contract man-
ager; 2) contract management support; and 3) man-
ager of the contract management support unit. For
each of these roles a specific context was defined, for
example for role 3 we have: "You are an assistant
specialized in answering questions about administra-
tive contracts, who provides management and sum-
marized information about the contracts."
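A minimal sketch of how such instructions and role contexts can be assembled into a prompt is given below; the instruction strings paraphrase those quoted above, while the function and dictionary names are our own.

# Sketch: assembling role context, instructions, retrieved context and
# history into a single prompt; names are illustrative.
ROLE_CONTEXTS = {
    "support_unit_manager": (
        "You are an assistant specialized in answering questions about "
        "administrative contracts, who provides management and summarized "
        "information about the contracts."
    ),
}

INSTRUCTIONS = (
    "Do not use prior knowledge.",
    "Whenever you answer a question about a contract, provide the OCS number.",
    "You should use a formal and objective tone.",
    "Given the chat history and the question asked, construct the response "
    "completely, without the user needing to review the history.",
)

def build_prompt(role: str, question: str, context: str, history: str = "") -> str:
    # role context and instructions come first, then the retrieved context,
    # the optional chat history, and finally the user question
    parts = [ROLE_CONTEXTS[role], "\n".join(INSTRUCTIONS)]
    if history:
        parts.append("Chat history:\n" + history)
    parts.append("Context:\n" + context)
    parts.append("Question: " + question)
    return "\n\n".join(parts)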
With these three techniques we obtained more rel-
evant answers. In the following section, we detail the
implementation and the components used in the de-
velopment of the system.
3.4 Applying Agents
In Contrato360, Agents play a pivotal role in orches-
trating the flow of execution and enhancing the overall
efficiency of the question-and-answer process. Also,
considering the workflow required to understand the user
query, an agent-based approach is a natural choice to imple-
ment the several specialized activities that need to be
carried out to build the correct answer for the user. We
designed three agents to implement this workflow.
As shown in Figure 2, the Router Agent is cen-
tral to its architecture, acting as the primary decision-
making entity that orchestrates the flow of tasks
needed to answer a user’s question. The ”Router
Agent” decides if the user’s question is related to
the Contract Manager domain, e.g., ”How are you?”,
”Will Bologna FC win the 2025 Champions League?”
or ”Who is the contract manager for the Database sup-
port?”. An out-of-topic question is redirected to the
LLM with a context limiting its role to the domain of
contract management. A question on the contract
domain will follow our workflow to find a relevant
answer.
Next, the Router Agent sends the user
question to two specialized agents: a) the SQL agent and
b) the RAG agent. The RAG agent retrieves from the vec-
torstore chunks of documents similar to the user ques-
tion. In parallel, the SQL agent retrieves from the CMS
database content related to the user question. This ar-
chitectural choice proved to be robust in the reports of
the contract managers, as it semantically enriches the
contract information, as shown in Figure 1.
One of the specialized agents in Contrato360 is the
RAG (Retrieval Augmented Generation) Agent, re-
sponsible for retrieving relevant information from the
contracts vectorstore. When directed by the Router
Agent, the RAG Agent searches for similar data
chunks that can help contextualize the question. An-
other specialized component is the SQL Agent, which
handles queries requiring structured data extraction
from the contracts database. Upon receiving routing
instructions from the Router Agent, the SQL Agent
executes SQL queries to retrieve specific data points
relevant to the user’s question.
With all textual and structured information retrieved,
the Router Agent crafts an answer. When visual infor-
mation is needed, the Graph Agent and LLM
Answer Generation Agent add further depth to Con-
trato360's response capabilities. The Graph Agent is
tasked with creating visual representations, such as
charts, when the Router Agent determines that a vi-
sual answer would better serve the user’s needs. This
agent ensures that complex data can be conveyed in a
clear and understandable format, enhancing user com-
prehension. Meanwhile, the LLM Answer Genera-
tion Agent works closely with the prompt generation
module to produce coherent and contextually relevant
textual responses. Together, these agents provide a
multi-faceted approach to answering questions, com-
bining data retrieval, visualization, and language gen-
eration to deliver comprehensive solutions.
4 ARCHITECTURE
The architecture of the Contrato360 application illus-
trates a comprehensive system designed to facilitate a
question-answering application that integrates Large
Language Models (LLMs), document processing, and
databases. The architecture consists of three main lay-
ers: the User Interface Layer, the Backend Layer, and
the Language Model Integration Layer, each playing
its role in delivering accurate and context-aware re-
sponses to users.
The User Interface Layer is represented by the
User Interface (Streamlit), which serves as the front-
end of the application. This layer provides an interac-
tive platform where users can input their queries and
view the responses generated by the system. The in-
terface directly communicates with the backend layer,
sending user inputs for processing and displaying the
responses generated by the various integrated compo-
nents.
At the heart of the system lies the Backend Layer,
which is primarily managed by the Backend Agents
Figure 5: Application architecture.
(Python and Langchain). This layer orchestrates in-
teractions between the document processing, vector
storage, contracts database, and the language model
integration layer. The backend layer leverages Python
and Langchain to handle the logic, task execution,
and chat functionalities, particularly through Ope-
nAI’s chat models. It processes user inputs received
from the interface and interacts with both the Con-
tracts Database and Vectorstore (ChromaDb) to re-
trieve relevant information necessary for formulating
comprehensive answers.
Within the backend layer, the Contracts Database
(SQLite) serves as the structured data source, storing
structured information related to contracts. This com-
ponent allows the system to handle contract-related
questions by processing SQL queries generated by the
backend agents. The contracts database responds to
these queries with relevant data, which is then used to
construct natural language responses for the user.
The Vectorstore (ChromaDb) is another vital com-
ponent of the backend layer, acting as a storage so-
lution for vectorized data, including document em-
beddings. It plays a key role in efficient similar-
ity searches and retrieval tasks, enhancing the sys-
tem’s ability to provide context-aware responses.
The backend agents utilize the Vectorstore to match
user queries against stored embeddings, enabling ad-
vanced semantic search capabilities. This compo-
nent also stores embeddings generated from docu-
ment processing, ensuring that data is readily avail-
able for future query matching.
The Language Model Integration Layer is respon-
sible for transforming and embedding data for use
within the system. This layer includes the PDF Docu-
ments Processing module, which ingests and prepro-
cesses documents, particularly PDFs, to make them
suitable for use within the application. This step in-
volves reading and extracting text and relevant meta-
data, preparing the content for the next stages of
processing. The Chunking and Metadata Generation
component further refines the documents by dividing
them into manageable chunks and generating meta-
data that improves retrieval efficiency, ensuring that
the data is optimally split for better embedding gener-
ation and response times.
The final stage of the language model integra-
tion layer is the Embeddings Generation module,
which converts the chunked documents and meta-
data into vector embeddings using LLM-based mod-
els like OpenAI Embeddings. These embeddings cap-
ture the semantic nuances of the text, facilitating ef-
ficient search and retrieval tasks within the system.
Once generated, these embeddings are stored in the
Vectorstore (ChromaDb), where they can be accessed
for matching against user queries.
The overall workflow begins when a user inputs
a question through the User Interface Layer, initi-
ating a sequence of processes across the backend
and language model integration layers. The back-
end agents handle query processing, interacting with
the Contracts Database for SQL queries and perform-
ing semantic searches using embeddings from the
Vectorstore. The document processing involves pre-
processing PDFs, chunking the content, and generat-
ing embeddings that are then stored for efficient re-
trieval. The backend agents combine data retrieved
from the contracts database and the Vectorstore to
generate a coherent response, which is then presented
back to the user through the User Interface Layer.
This architecture effectively combines the User
Interface Layer, Backend Layer, and Language Model
Integration Layer, enabling Contrato360 to function
as a robust and powerful application for answering
questions based on complex data sources. The seam-
less integration of multiple technologies ensures that
users receive accurate and contextually relevant re-
sponses, enhancing the overall functionality and us-
ability of the system.
5 EVALUATION
The experiment to validate the application was con-
ducted by two IT contract specialists from BNDES.
The system was prepared with 75 contracts (PDFs and
data from the contract system). To validate the
relevance of the answers, benchmark questions were
prepared from two distinct groups: "direct" and "in-
direct” questions. ”Direct” questions are those that
can be answered through the PDFs and their meta-
data. ”Indirect” questions are those that obtain better
relevance when searched in the contract system data.
In Tables 1 and 2 we present the users’ perception of
the quality of the answers. In the evaluation, the rel-
evance of the answers was categorized as ”Correct”
and "Incomplete"; a third category, "Incorrect", was
not observed for any of the questions.
We can observe that for the ”direct” questions the
system presents relevant answers for all experiments.
However, for the "indirect" questions, although the
overall results were satisfactory, one specific question
produced limited and incomplete answers. In our eval-
uation, such questions require a more elaborate se-
mantic treatment: we realized that the concept of "Waiver
of Bidding" was not well captured. We believe that
an adjustment in the queries and/or in the prompt can
add this type of semantics.
Table 1: Direct Questions.

Question                                          Correct  Incomplete
What is the subject of the OCS nnn/yy contract?   10       -
Do we have any contract whose subject is xxxx?    9        1
Do we have any contract with the supplier xxx?    10       -
Who is the manager of the OCS nnn/yy contract?    10       -
Who is the supplier of the nnn/yy contract?       10       -
What is the term of the OCS nnn/yy contract?      10       -
Table 2: Indirect Questions.

Question                                                              Correct  Incomplete
How many active IT contracts do we currently have?                    10       -
List the contracts that will end in the year yy?                      10       -
How many contracts do we have with supplier xxxx?                     10       -
How many contracts have we signed due to inflexibility?               9        1
How many DLs (Exemptions from Tenders) were contracted in yy?         -        10
Who are the managers of the contracts we have with company xxxx?      8        2
How many contracts does employee xxxx have under his/her management?  8        2
Show a summary of contract nnn/yy.                                    10       -

A key aspect observed from the users is the so-
lution's capability to combine answers from both the
structured data store and the contract's texts. This in-
tegration is perceived as a significant time-saving fea-
ture, as users typically need to locate the relevant con-
tracts, open the respective PDFs, and manually search
for additional information. The example below illus-
trates this. It identifies contract managers and out-
lines the penalties associated with contractual non-
compliance. The system’s ability to deliver precise,
context-relevant answers from contracts highlights its
effectiveness in reducing manual search efforts for
users.
Figure 6: Contracts Q&A Streamlit application.
In fact, by directly addressing questions with spe-
cific details, the system saves time and improves the
user experience, as users can quickly access criti-
cal information without sifting through lengthy doc-
uments. Finally, the system’s ability to automatically
generate graphs using its Plotly agent, when a table of
values is included in the response, has been positively
received by users. This feature not only provides im-
mediate visual insights, enhancing the understanding
of the data, but also supports users in creating pro-
fessional presentations. The integration of dynamic
graph generation into the query response process sig-
nificantly enriches the user experience, allowing for a
more comprehensive analysis and efficient communi-
cation of contract-related information.
Figure 7: Plotly Agent.
6 CONCLUSIONS
We developed a Q&A application in the domain of
service and product contracts, using PDF contracts
and data from the Contract Management System as
information sources. In this development, we em-
ployed four techniques to improve the relevance of the
answers: 1) Retrieval-Augmented Generation (RAG) combined
with semantic augmentation using metadata to re-
trieve information from PDFs; 2) Text-to-SQL, aggre-
gating dynamic information from the contracts made
available in the Contract Management System; 3)
Prompt Engineering to contextualize, instruct and di-
rect the answers produced by the LLM; and 4) Agents
to invoke the most appropriate approach depending on
the query context and to determine the flow of task
execution in the system.
Figure 8 demonstrates the ability of Contrato360 to
retrieve and summarize contract information re-
lated to Oracle through a question-and-answer inter-
face.

Figure 8: Contract Summarization.

When asked if there is a contract with Ora-
cle, the system efficiently identifies the relevant con-
tract, numbered 0278/2023, and provides a concise
summary of its key details stored in the database.
The summarized information includes the contract’s
object, which covers technical support and software
upgrades for Oracle’s Database Management System
(DBMS), details about the contract manager, supplier,
total value, validity dates, and the current situation.
This functionality highlights the system’s ability to
streamline access to specific contract data, facilitating
quick and accurate information retrieval for users by
directly interacting with the database through natural
language queries.
In our experiment, we addressed an initial set of
questions, which enabled us to produce a robust system
that meets current user needs. However, exploring
other questions in depth will allow us to enrich the
metadata and the set of queries that extract informa-
tion from traditional systems.
Finally, to consolidate the techniques developed to
address our application, we envision that building a
system in a different problem domain may shed light
on limitations and the possible need for refinement
or adaptation. Such future exploration will not only
reinforce confidence in the implementation of these
techniques in real-world scenarios, but also pave the
way for their optimization and possible customization
for specific domains, ultimately increasing the utility
and impact of LLMs in enterprise applications.
REFERENCES
Chen, J., Lin, H., Han, X., and Sun, L. (2024). Benchmark-
ing large language models in retrieval-augmented gen-
eration. In Proceedings of the AAAI Conference on Ar-
tificial Intelligence, volume 38, pages 17754–17762.
Fan, W., Ding, Y., Ning, L., Wang, S., Li, H., Yin, D.,
Chua, T.-S., and Li, Q. (2024). A survey on rag meet-
ing llms: Towards retrieval-augmented large language
models. In Proceedings of the 30th ACM SIGKDD
Conference on Knowledge Discovery and Data Min-
ing, pages 6491–6501.
Feng, Z., Feng, X., Zhao, D., Yang, M., and Qin, B.
(2024). Retrieval-generation synergy augmented large
language models. In ICASSP 2024-2024 IEEE Inter-
national Conference on Acoustics, Speech and Signal
Processing (ICASSP), pages 11661–11665. IEEE.
Gao, D., Wang, H., Li, Y., Sun, X., Qian, Y., Ding, B.,
and Zhou, J. (2023a). Text-to-sql empowered by large
language models: A benchmark evaluation. arXiv
preprint arXiv:2308.15363.
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y.,
Sun, J., and Wang, H. (2023b). Retrieval-augmented
generation for large language models: A survey. arXiv
preprint arXiv:2312.10997.
Giray, L. (2023). Prompt engineering with chatgpt: a guide
for academic writers. Annals of biomedical engineer-
ing, 51(12):2629–2633.
Jeong, C. (2023). A study on the implementation
of generative ai services using an enterprise data-
based llm application architecture. arXiv preprint
arXiv:2309.01105.
Jin, H., Huang, L., Cai, H., Yan, J., Li, B., and Chen, H.
(2024). From llms to llm-based agents for software
engineering: A survey of current, challenges and fu-
ture. arXiv preprint arXiv:2408.02479.
Langchain (2024). Langchain agents documenta-
tion. https://python.langchain.com/v0.1/docs/use_cases/sql/agents/.
Accessed: 2024-09-06.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin,
V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t.,
Rocktäschel, T., et al. (2020). Retrieval-augmented
generation for knowledge-intensive nlp tasks. Ad-
vances in Neural Information Processing Systems,
33:9459–9474.
Li, H., Su, Y., Cai, D., Wang, Y., and Liu, L. (2022). A
survey on retrieval-augmented text generation. arXiv
preprint arXiv:2202.01110.
Liu, A., Hu, X., Wen, L., and Yu, P. S. (2023). A com-
prehensive evaluation of chatgpt’s zero-shot text-to-
sql capability. arXiv preprint arXiv:2303.13547.
Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pa-
sunuru, R., Raileanu, R., Rozière, B., Schick, T.,
Dwivedi-Yu, J., Celikyilmaz, A., et al. (2023). Aug-
mented language models: a survey. arXiv preprint
arXiv:2302.07842.
OpenAI (2023a). Chatgpt fine-tune descrip-
tion. https://help.openai.com/en/articles/
6783457-what-is-chatgpt. Accessed: 2024-03-
01.
OpenAI (2023b). Chatgpt prompt engineer-
ing. https://platform.openai.com/docs/guides/
prompt-engineering. Accessed: 2024-04-01.
Pinheiro, J., Victorio, W., Nascimento, E., Seabra, A.,
Izquierdo, Y., García, G., Coelho, G., Lemos, M.,
Leme, L. A. P. P., Furtado, A., et al. (2023). On the
construction of database interfaces based on large lan-
guage models. In Proceedings of the 19th Interna-
tional Conference on Web Information Systems and
Technologies - Volume 1: WEBIST, pages 373–380.
INSTICC, SciTePress.
Saeed, M., De Cao, N., and Papotti, P. (2023). Query-
ing large language models with sql. arXiv preprint
arXiv:2304.00472.
Seabra, A., Nepomuceno, J., Lago, L., Ruberg, N., and
Lifschitz, S. (2024). Contrato360: uma aplicação de
perguntas e respostas usando modelos de linguagem,
documentos e bancos de dados. In Anais do XXXIX
Simpósio Brasileiro de Bancos de Dados.
Singh, A., Ehtesham, A., Kumar, S., and Khoei, T. T.
(2024). Enhancing ai systems with agentic workflows
patterns in large language model. In 2024 IEEE World
AI IoT Congress (AIIoT), pages 527–532. IEEE.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Wang, M., Wang, M., Xu, X., Yang, L., Cai, D., and Yin,
M. (2023). Unleashing chatgpt’s power: A case study
on optimizing information retrieval in flipped class-
rooms via prompt engineering. IEEE Transactions on
Learning Technologies.
Wang, Z., Wang, Z., Le, L., Zheng, H. S., Mishra, S., Perot,
V., Zhang, Y., Mattapalli, A., Taly, A., Shang, J., et al.
(2024). Speculative rag: Enhancing retrieval aug-
mented generation through drafting. arXiv preprint
arXiv:2407.08223.
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert,
H., Elnashar, A., Spencer-Smith, J., and Schmidt,
D. C. (2023). A prompt pattern catalog to enhance
prompt engineering with chatgpt. arXiv preprint
arXiv:2302.11382.
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan,
K., and Cao, Y. (2023). ReAct: Synergizing rea-
soning and acting in language models. arXiv preprint
arXiv:2210.03629v3.