Improving Large Language Models Responses with Retrieval Augmented
Generation in Animal Production Certification Platforms
Pedro Bilar Montero, Jonas Bulegon Gassen, Glênio Descovi, Vinícius Maran,
Tais Oltramari Barnasque, Matheus Friedhein Flores and Alencar Machado
Laboratory of Ubiquitous, Mobile and Applied Computing (LUMAC), Federal University of Santa Maria, Brazil
{pedro.bilar.montero, glenio.descovi, viniciusmaran, matheusfriedhein, alencar.comp}@gmail.com, jonas.gassen@ufsm.br
ORCIDs: 0009-0002-9224-7694 (Montero), 0000-0001-8384-7132 (Gassen), 0000-0002-0940-9641 (Descovi), 0000-0003-1916-8893 (Maran), 0000-0003-4436-4327 (Flores), 0000-0002-6334-0120 (Machado)
Keywords:
Retrieval-Augmented Generation, Large Language Models, Poultry Health, Sanitary Certification, PDSA-RS,
Brazilian Regulations, Legal Texts, Animal Health, Natural Language Processing.
Abstract:
This study explores the potential of integrating Large Language Models (LLMs) with Retrieval-Augmented
Generation (RAG) to enhance the accuracy and relevance of responses in domain-specific tasks, particularly
within the context of animal health regulation. Our proposed solution incorporates a RAG system on the PDSA-
RS platform, leveraging an external knowledge base to integrate localized legal information from Brazilian
legislation into the model’s response generation process. By combining LLMs with an information retrieval
module, we aim to provide accurate, up-to-date responses grounded in relevant legal texts for professionals in
the veterinary health sector.
1 INTRODUCTION
The field of artificial intelligence (AI) has made
significant advancements in recent years, encom-
passing a variety of subfields like computer vision,
robotics, and, most notably, natural language process-
ing (NLP). NLP has especially thrived with the emer-
gence of Large Language Models (LLMs), which
have achieved state-of-the-art results across a range
of tasks, including text generation, summarization,
and language translation. These models, based on
deep neural networks and transformer architectures,
learn from vast corpora and adapt to complex linguis-
tic patterns with minimal supervision, allowing them
to generate coherent and contextually relevant text in
diverse applications (Brown et al., 2020). However,
while LLMs demonstrate broad generalization abili-
ties, they often encounter challenges when applied to
specialized domains, as these areas require extensive,
localized knowledge and precise understanding that is
not always encapsulated within large, generic datasets
(Gururangan et al., 2020; Bommasani et al., 2021).
In industry, the integration of AI into operational
systems has led to significant advancements across
sectors such as healthcare, finance, and legal domains
(Bommasani et al., 2021; Brown et al., 2020). Within
the domain of animal health, for example, LLMs
hold potential to streamline administrative and com-
pliance processes by aiding in decision-making, regu-
latory adherence, and response generation. However,
domain-specific applications introduce complexities
that require accurate, context-aware responses to nu-
anced queries (Gururangan et al., 2020).
The regulatory frameworks governing animal
health, particularly in Brazil, exemplify this chal-
lenge, as professionals must navigate intricate le-
gal and sanitary guidelines to ensure compliance and
protect public health. The Brazilian Ministério da Agricultura e Pecuária (MAPA) plays a key role in overseeing and enforcing animal health regulations. MAPA's responsibilities include regulating vet-
erinary practices, approving vaccines, and monitor-
ing the health status of livestock throughout the coun-
try. A specific challenge in animal health regulation in
Brazil is ensuring compliance with the sanitary certi-
fication processes that guarantee the safety of animal
products for both domestic consumption and interna-
tional trade. The certification process, which verifies
that a farm or animal facility meets required sanitary
standards, is particularly critical for industries such
as poultry farming. These certifications ensure that
animal products are free from diseases like avian in-
fluenza, salmonella, and mycoplasmosis, which are
both economically damaging and potentially harmful
to humans.
The Plataforma de Defesa Sanitária Animal do
Rio Grande do Sul (PDSA-RS) is a platform de-
signed to support the field of animal health regula-
tion in Brazil by implementing an information sys-
tem that integrates all stages of certification processes
for poultry and swine farming in Rio Grande do
Sul. This platform facilitates the organization of pro-
duction activities while ensuring sanitary compliance
with Brazilian animal health regulations. In the case
of the PDSA-RS, veterinary certification processes
depend on the model’s capacity to interpret complex
Brazilian legislation on animal health (Descovi et al.,
2021; Ebling et al., 2024; Schneider et al., 2024).
To address these specialized needs, the Retrieval-
Augmented Generation (RAG) framework has
emerged as a solution for enhancing LLM capa-
bilities by allowing the models to access external
knowledge bases. This framework retrieves relevant
information from a connected knowledge base
during the response-generation process, making it a
valuable approach for domains where specialized,
dynamic knowledge is required. Studies show that
RAG systems can effectively improve the accuracy
of generated responses in specialized fields by
supplementing LLMs with precise, domain-relevant
information (Lewis et al., 2020; Karpukhin et al.,
2020).
In this study, we aim to implement a RAG sys-
tem that can be integrated into the PDSA-RS plat-
form, enhancing its response-generation process with
domain-specific knowledge pertinent to the animal
health regulatory environment in Rio Grande do Sul.
By integrating retrieval mechanisms that draw from a
knowledge base of Brazilian legislation, our system
can produce contextually grounded responses aligned
with the requirements of animal health regulation in Brazil. This RAG integration holds the po-
tential to bridge the gap between general-purpose
language models and the precise, regulatory-driven
needs of professionals in the veterinary health sector,
contributing to a more effective and reliable AI appli-
cation within the industry.
Our case study will focus on the integration of this RAG system within the PDSA-RS, specifically in the
module responsible for poultry certification. Our ob-
jective is to create an assistant that will help the pro-
fessional responsible for analyzing poultry certifica-
tion processes to make decisions more quickly. The
purpose of this assistant is to validate all relevant data
for these requests, which will be processed by our
RAG system. Our focus will be to evaluate its perfor-
mance in providing accurate responses to certification
and regulatory questions.
Our paper is organized as follows: Section 2 covers the background concepts needed for this paper. In Section 3 we present our methodology, describing how we organized our architecture and developed the RAG system. Section 4 presents our case study on the implementation of the RAG system within the PDSA-RS, and in Section 5 we present the conclusions of this work and future research possibilities.
2 BACKGROUND
This section provides context for the concepts utilized
in this paper.
2.1 Artificial Intelligence (AI)
Artificial Intelligence (AI) focuses on systems ca-
pable of performing tasks requiring human intelli-
gence. Over decades, AI has progressed remark-
ably, driven by machine learning (ML) and neural
networks, which shifted from symbolic logic-based
methods to data-driven approaches capable of iden-
tifying patterns in unstructured data (Bishop, 2006).
The advent of deep learning enabled significant ad-
vancements, leveraging multilayered neural networks
to process complex data representations effectively
(LeCun et al., 2015). These breakthroughs led to
practical AI applications in healthcare, finance, and
legal compliance (Esteva et al., 2017; Sari and In-
drabudiman, 2024). Among these advancements,
Large Language Models (LLMs) have emerged as
transformative tools for tasks like text generation,
summarization, and translation, powered by innova-
tions like the Transformer architecture (Vaswani et al.,
2017). AI’s potential lies in its adaptability to do-
mains where contextual precision is critical.
2.2 Large Language Models and
Specialized Domains
A Large Language Model (LLM) is a type of artificial
intelligence (AI) designed to process and understand
human language at scale. These models are trained on
vast amounts of text data, enabling them to learn pat-
terns, relationships, and nuances of language. LLMs
have made substantial progress in recent years, achieving state-of-the-art results in various NLP tasks such
as language translation, question answering, and text
generation.
The development of the transformer architecture
by (Vaswani et al., 2017) laid the groundwork for nu-
merous advancements in large-scale language mod-
eling, leading to the creation of influential mod-
els such as GPT by OpenAI. These models demon-
strated that language generation could achieve un-
precedented levels of fluency, coherence, and adapt-
ability in tasks like translation, summarization, and
question answering (Radford and Narasimhan, 2018;
Brown et al., 2020). However, most early LLMs were
proprietary, restricting their accessibility and limiting
the potential for customization and improvement by
the broader research community.
The release of the LLaMA (Large Language
Model Meta AI) family by Meta marked a significant
shift in this paradigm, as it offered high-performance,
large-scale models with an open-access architecture.
LLaMA's open-source nature allows researchers and
developers to fine-tune and adapt the model for spe-
cific tasks, including specialized domains that require
knowledge and expertise beyond general language
capabilities. This openness has made LLaMA par-
ticularly valuable in academic and applied research
settings, where access to large models with flexible
adaptation options is essential for innovation (Tou-
vron et al., 2023).
However, despite their impressive capabilities,
LLMs often struggle with domain-specific tasks that
require specialized knowledge and context. This lim-
itation arises from several factors:
Lack of Domain Expertise. While LLMs can be
fine-tuned on specific domains, they may not fully
grasp the intricacies of that domain without exten-
sive training data (Zhang et al., 2023).
Domain-Specific Nuances. Domains like law,
healthcare, finance, or agriculture involve com-
plex rules, exceptions, and subtleties that may
be difficult for LLMs to capture without explicit
training on these domains (Google Cloud, 2023).
Limited Contextual Understanding. LLMs rely
on statistical patterns in language to make predic-
tions or generate text. However, this approach can
lead to misinterpretation or misunderstanding of
domain-specific context, leading to inaccurate re-
sults (Brown et al., 2020).
These limitations have led to the development of
hybrid architectures, such as Retrieval-Augmented
Generation (RAG), that aim to enhance LLMs by in-
corporating external knowledge sources to improve
their performance in specific fields (Lewis et al.,
2020; Karpukhin et al., 2020). Unlike fine-tuning,
which requires training the model on a domain-
specific dataset and periodically retraining to update
its knowledge, RAG offers a more dynamic approach
by retrieving relevant information from an external
database or knowledge source in real-time. This ca-
pability is particularly advantageous in fields with fre-
quently updated information, such as legal, medical,
and regulatory domains, where the model’s responses
need to reflect the latest standards and guidelines.
2.3 Retrieval-Augmented Generation
(RAG) Architecture
Retrieval-Augmented Generation (RAG) is an archi-
tecture that enhances large language models (LLMs)
by dynamically combining information retrieval with
text generation, thus allowing LLMs to leverage ex-
ternal knowledge sources while generating responses.
Unlike traditional LLMs that rely solely on pre-
trained knowledge, RAG introduces a retrieval mech-
anism that fetches relevant external documents or data
points, incorporating them into the generation pro-
cess for more contextually accurate responses (Lewis
et al., 2020; Karpukhin et al., 2020).
In RAG, the process starts with transforming user
input into an embedding, a mathematical representation of the text. This embedding is then used
to search a vector database, where documents are
pre-processed and stored as embeddings. The vec-
tor database is essential for finding semantically rele-
vant information in response to the user’s query, as
it allows efficient matching of queries with stored
knowledge chunks. Upon retrieval, these documents
are integrated into the generation component of the
model, producing responses that are contextually rel-
evant and informed by up-to-date knowledge sources
(Lewis et al., 2020).
Figure 1 outlines the RAG workflow in four steps:
1: The user inputs a query, for example, asking a
question about a recent topic that is not present in the training data of the LLM.
2: This step illustrates the indexing of documents
that are split into chunks, encoded into vectors,
and stored in a vector database. The user query
is then used to search this database for relevant
content by a similarity search.
3: After the search, the most relevant chunks are retrieved based on semantic similarity.
4: The retrieved chunks are combined with the original query into an augmented prompt, which the LLM uses to generate the final response.
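To make these steps concrete, the minimal sketch below implements the retrieve-then-generate loop end to end. The embed() stub, chunk texts, and vector dimension are illustrative assumptions rather than part of the PDSA-RS implementation; a production system would use a pre-trained embedding model and a vector database instead of an in-memory list.

# Toy illustration of the four RAG steps above (our own sketch).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: pseudo-random unit vector derived from the
    # text hash; a real system would call a pre-trained embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=384)
    return vec / np.linalg.norm(vec)

# Step 2: index document chunks as vectors.
chunks = [
    "Hatchery flocks must be tested for Salmonella Pullorum ...",
    "Serum samples are required for mycoplasmosis monitoring ...",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Steps 1-3: embed the query and rank chunks by cosine similarity
    # (dot product of unit vectors).
    q = embed(query)
    ranked = sorted(index, key=lambda pair: float(q @ pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # Step 4: combine retrieved context with the query for the LLM.
    context = "\n".join(retrieve(query))
    return f"{context}\n\nUse the content above to answer:\n{query}"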
Figure 1: Illustration of the RAG Architecture: Information retrieval and generation process. Adapted from (Gao et al., 2023).
One of the significant advantages of RAG is its ability to integrate continuously updated information, making it particularly valuable in fields that require
timely data, such as regulatory environments where
guidelines frequently change. By dynamically ac-
cessing relevant external content, RAG reduces the
risk of outdated or inaccurate responses, improving
both the accuracy and relevance of the model’s out-
put. Furthermore, RAG minimizes the computational
demands associated with continuous fine-tuning, as
it allows the model to access and integrate domain-
specific knowledge on demand rather than retraining
on static datasets (Izacard and Grave, 2021; Lewis
et al., 2020).
In summary, RAG’s approach of combining re-
trieval and generation enables LLMs to adapt to spe-
cific knowledge requirements, providing more robust
support in specialized domains that benefit from up-
dated and context-sensitive responses.
2.4 Fine-Tuning vs
Retrieval-Augmented Generation
(RAG)
In this section, we delve deeper into whether retrieval
modules, as employed in RAG, provide a more robust
benchmark than fine-tuning techniques for achieving
domain-specific accuracy and efficiency.
2.4.1 Retrieval as a Benchmark for
Domain-Specific Accuracy
As Section 2.3 outlines, retrieval in RAG serves as a dynamic component by enabling LLMs to access external, up-to-date knowledge bases in real time. This capability
contrasts with fine-tuning, which relies on embedding
static domain knowledge into the model. Retrieval’s
real-time adaptability ensures that responses remain
accurate in evolving fields such as legal or regulatory
environments.
2.4.2 Efficiency in Computational Resource
Utilization
Fine-tuning techniques, while effective, demand ex-
tensive computational resources and data curation.
QLoRA, for instance, reduces these requirements
by fine-tuning low-rank adaptation layers (Dettmers
et al., 2024), yet it still requires periodic retraining to
incorporate new domain knowledge. Retrieval, on the
other hand, bypasses this need by separating knowl-
edge storage from the generative process, thereby
minimizing overhead and ensuring efficient use of re-
sources (Lewis et al., 2020).
2.4.3 Comparative Evaluation Metrics
When considered as a tool for achieving domain-
specific accuracy, retrieval modules excel in:
Adaptability. Retrieval enables models to re-
spond to domain-specific queries with real-time
context, making it better suited for fields with dy-
namic knowledge requirements.
Scalability. By offloading knowledge storage to
external databases, retrieval reduces the need for
model scaling, unlike fine-tuning which often re-
quires larger model sizes to capture domain intri-
cacies.
Benchmarking Potential. Retrieval serves as an
ongoing benchmark by continuously updating its
knowledge base, allowing real-world validation of
LLM performance in specialized domains.
From a theoretical standpoint, retrieval modules
highlight a paradigm shift in LLM optimization by
decoupling knowledge retrieval from generative ca-
pabilities. While fine-tuning methods such as QLoRA embed domain-specific expertise into the model, retrieval treats knowledge as an external, modular component.
This distinction positions retrieval as not merely a
complement to fine-tuning but as a potential alterna-
tive benchmark for evaluating domain-specific effec-
tiveness.
2.5 PDSA-RS and Animal Health
Regulations in Brazil
Established in 2019, the Plataforma de Defesa Sanitária Animal do Rio Grande do Sul (PDSA-RS)
supports animal health and production in Rio Grande
do Sul through a real-time digital platform for manag-
ing certifications and ensuring compliance with sani-
tary regulations. Developed by the Federal University
of Santa Maria and MAPA, with Fundesa’s support, it
enhances biosecurity, traceability, and export facilita-
tion.
The system’s modules, such as the poultry health
certification feature, streamline data collection and
certification issuance, linking veterinary inspections,
laboratories, and the agricultural defense authorities.
This interconnected system helps officials monitor
disease control in flocks and facilitates efficient re-
sponse to health risks. PDSA-RS allows inspectors
and producers to follow up on health tests, sam-
ple processing, and the issuance of certificates re-
quired for both domestic and international movement
of poultry, ensuring that health standards are met con-
sistently.
As illustrated in figure 2, the platform adopts
a microservices-oriented architecture. The front-
end comprises several specialized portals tailored for
different stakeholders: the State Veterinary Service
(SVE), technical managers (RTs), agricultural labo-
ratories, and the Ministry of Agriculture (MAPA).
On the back-end, the architecture differentiates
between two distinct types of REST APIs. The busi-
ness APIs manage the core logic and processes asso-
ciated with the platform’s regulatory functions, ensur-
ing that workflows and data management align with
specific legal and procedural requirements. In con-
trast, the service APIs provide more generic func-
tionality, supporting integration and interoperability
with the business APIs by delivering reusable services
across the platform.
This study introduces a dedicated service API to
integrate the platform with a RAG System, facilitat-
ing retrieval and synthesis of regulatory knowledge,
as detailed in subsequent sections.
3 METHODOLOGY
This section outlines the core components of our RAG
System architecture, including the setup of the RAG
Module and the tool used to run our LLM.
3.1 RAG Module
The RAG Module is a structured system designed to ingest, store, and retrieve document content based on semantic similarity. It primarily functions to improve response accuracy by retrieving relevant document segments that align with user queries, using vector embeddings for efficient similarity-based search; the retrieved segments are later fed into the LLM.
3.1.1 Document Ingestion
The ingestion process prepares documents for storage
in the vector database. This involves multiple steps:
Text Extraction. Document content is extracted
from various file types (PDFs, text files, etc.),
preparing it for chunking and processing.
Chunking and Tokenization. Documents are di-
vided into smaller sections, such as sentences or
paragraphs. Chunking improves retrieval preci-
sion and allows targeted access to specific infor-
mation. Tokenization follows, breaking down text
into discrete tokens for embedding.
Embedding Creation. Each chunk is trans-
formed into a vector embedding using a pre-
trained embedding model, which captures the se-
mantic content of the text. The embedding is a
high-dimensional numerical representation of the
chunk, enabling similarity-based retrieval within
the vector database.
Figure 2: Overview of the PDSA-RS Architecture.
We chose to use the LlamaIndex data framework
for this stage, leveraging the Python package llama-
index. This package provides ready-to-use functions
for document ingestion, which simplifies the process
of embedding documents, indexing them, and storing
them in a vector database for efficient retrieval.
3.1.2 Vector Storage
The RAG module utilizes a vector database to store
embeddings. This type of storage allows efficient re-
trieval through similarity search, where embeddings
with high semantic similarity to a query are located
based on distance metrics. PostgreSQL was selected
as the database solution, using the PGVector exten-
sion to store embeddings. PGVector is a PostgreSQL
extension that provides efficient support for vector-
based data storage, making it well-suited for handling
the embeddings generated by the LlamaIndex frame-
work.
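As a rough sketch of how this ingestion and storage pipeline can be wired together with llama-index and PGVector: the class names below assume a recent llama-index release where core classes live under llama_index.core, and the folder path and connection parameters are illustrative placeholders, not our production configuration.

# Hedged sketch of document ingestion into a PGVector-backed index.
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore

# Text extraction: load PDFs and text files from a local folder.
documents = SimpleDirectoryReader("./normative_instructions").load_data()

# Vector storage: persist chunk embeddings in PostgreSQL via PGVector.
vector_store = PGVectorStore.from_params(
    database="pdsa",
    host="localhost",
    port="5432",
    user="pdsa",
    password="change-me",
    table_name="regulation_chunks",
    embed_dim=384,  # must match the embedding model's output dimension
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Chunking, tokenization, and embedding all happen inside from_documents().
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)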
3.1.3 Retrieval Process
The RAG module’s retrieval component is used when
a user submits a query. The query is converted into
an embedding and used to locate relevant document
chunks stored in the vector database:
Query Pre-Processing. Before retrieval, the
query may undergo filtering and rephrasing to
enhance search relevance. Steps may include
expanding acronyms, removing stop words, and
restructuring complex queries into simpler sub-
queries. These adjustments help the RAG mod-
ule better interpret the query’s intent and improve
retrieval accuracy.
Similarity Search. Using the query embed-
ding, a similarity search is conducted in the vec-
tor database. This search identifies chunks that
closely align with the query’s semantic content.
Cosine similarity or other distance-based metrics
are applied to find the most relevant chunks, en-
suring that only document sections pertinent to the
query are retrieved.
We are again utilizing LlamaIndex for the retrieval
process, taking advantage of its built-in search func-
tions. LlamaIndex provides efficient methods for per-
forming semantic searches over document embed-
dings stored in the vector database. These search
functions allow us to find the most relevant document
chunks based on the similarity between the query em-
bedding and the stored document embeddings.
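Continuing that sketch under the same assumptions about the llama-index API, the similarity search itself reduces to a retriever call over the stored index; the query text is an illustrative pre-established question.

# Similarity search over the stored embeddings (continues the ingestion sketch).
retriever = index.as_retriever(similarity_top_k=4)
results = retriever.retrieve(
    "Which materials must be collected for Salmonella monitoring in hatcheries?"
)
for scored_node in results:
    # Each result carries a similarity score and the matched chunk text.
    print(round(scored_node.score, 3), scored_node.node.get_content()[:80])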
3.1.4 Maintenance and Updates
To keep information relevant and accurate, the RAG
module may undergo periodic updates:
Embedding Updates. As new documents be-
come available, they are embedded and stored in
the vector database to expand the scope of retriev-
able content.
Maintenance of Vector Store. Regular mainte-
nance helps optimize performance and relevance.
Outdated information can be removed, and fre-
quently accessed chunks may be optimized for
faster retrieval, ensuring that the vector store re-
mains responsive and accurate.
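A minimal sketch of an embedding update, assuming the index object from the ingestion sketch and llama-index's insert API; the document text is hypothetical.

# Embedding update: make a newly published document retrievable.
from llama_index.core import Document

new_document = Document(text="Hypothetical updated sampling rules ...")
index.insert(new_document)  # chunks, embeds, and persists via the vector store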
3.2 Large Language Model Setup
The LLM selected for use in our RAG system is the
Llama 3.1 8B Instruct model. This model, part of
Meta’s LLaMA family, is a fine-tuned model from the
base Llama 3.1 8B, designed for instruction-following
tasks. We preferred the 8B version because it balances computational efficiency and performance, offering strong capacity for understanding and generating responses while maintaining low power consumption.
3.2.1 Model Selection and Configuration
To run our LLM, we have chosen to use privateGPT
in conjunction with the Ollama framework. This com-
bination allows us to maintain complete control over
the model and data, ensuring privacy and security.
privateGPT is a solution designed to run models locally, ensuring that data and queries remain private, without sending sensitive information to external servers. It is useful when working with
confidential or proprietary data, like legal or medical
information. In this setup, privateGPT operates as the
interface to manage the LLM, which is hosted on a
private server, offering full control over the model.
To run the model locally, privateGPT offers two
frameworks: llama.cpp and Ollama. We chose Ol-
lama based on the privateGPT documentation, which
recommends it for its greater versatility in running
across different computational environments. This
framework is designed to simplify model deployment,
providing an efficient and flexible setup that can be
easily adapted to various hardware configurations.
3.2.2 API
To allow external services to interact with our system,
we chose to use an API. PrivateGPT already includes
an API as part of its structure, making it easy to in-
tegrate our system with external services. This API
provides all the necessary endpoints for document in-
gestion, query handling, and response generation, ab-
stracting the complexities of the RAG pipeline while
allowing external systems to make requests and re-
ceive responses in a standardized way.
This API is divided into two logical blocks:
High-Level API. Abstracts the complexities of
the RAG pipeline by automating document pro-
cessing and response generation. It manages
tasks such as document parsing, splitting, meta-
data extraction, embedding generation, and stor-
age, preparing documents for efficient retrieval.
Additionally, the API handles chat and comple-
tion processes by retrieving relevant content, en-
gineering prompts, and generating responses, al-
lowing users to focus on querying the system and
receiving contextually relevant answers based on
the ingested documents, without needing to man-
age the intricate details of retrieval and genera-
tion.
Low-Level API. Provides advanced users with
the ability to generate embeddings for any given
piece of text. It also includes an endpoint for
contextual chunks retrieval, which, when given
a query, searches the ingested documents and re-
turns the most relevant text chunks.
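To illustrate how an external service, such as a platform microservice, might consume these two blocks, below is a hedged sketch using Python's requests library; the endpoint paths and payload fields follow our reading of the privateGPT API documentation and should be treated as assumptions, and the base URL is a placeholder.

# Hedged sketch of calls against the privateGPT HTTP API.
import requests

BASE_URL = "http://localhost:8001"  # placeholder for the private server

# High-level API: chat completion grounded in the ingested documents.
chat = requests.post(f"{BASE_URL}/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Is this nucleus eligible for certification?"}],
    "use_context": True,      # retrieve context from ingested documents
    "include_sources": True,  # return the chunks used for grounding
})
print(chat.json()["choices"][0]["message"]["content"])

# Low-level API: retrieve the most relevant chunks for a given text.
chunks = requests.post(f"{BASE_URL}/v1/chunks", json={"text": "Salmonella sampling rules"})
print(chunks.json())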
3.3 Architecture Overview
Our architecture can be summarized as shown in fig-
ure 3. The system is composed of the RAG Module,
which is responsible for the entire process of inges-
tion, storage, and retrieval of context, and the LLM,
which processes user queries. The RAG module han-
dles document parsing, embedding generation, and
storing the relevant context in a vector database for
efficient retrieval. The LLM, once the context is re-
trieved, generates responses based on the processed
queries, ensuring the responses are contextually rele-
vant and informative.
4 CASE STUDY
To evaluate the effectiveness of our RAG system on
the PDSA-RS platform, we conducted a case study to
integrate the RAG system within the platform and as-
sess its performance in handling regulatory questions
related to poultry certification.
In order for an establishment to obtain a sanitary certificate, it must submit samples of birds, poultry products (such as eggs), and other materials for laboratory testing at institutions accredited by
MAPA. The purpose of these collections is to ensure
that the batches are free from pathogenic agents such
as Salmonella or other relevant infectious agents. For
each age group of birds, there are different rules re-
garding the quantity of materials to be collected, the types that must be collected, and the combinations of each material. Furthermore, the purpose of production for these birds must also be considered, as it influences the aforementioned parameters. We use these parameters to define the query that will be sent to the retrieval module to search for context in our knowledge base.
Figure 3: Overview of the RAG System Architecture.
For the documents used in the analyses performed by the RAG system, we are utilizing IN 78/2003 (Ministério da Agricultura, 2003) and IN 44/2001 (Ministério da Agricultura, 2001). These documents outline the technical standards for disease control and for the certification of poultry establishments as free from, or controlled for, diseases such as Salmonella and mycoplasmosis.
4.1 Query Structure
We analyzed an object representing a completed certification request, which includes data on monitored diseases, sanitary conditions, and laboratory tests. In this study, the focus was on verifying whether all the laboratory tests for monitored diseases met the required standards.
The data structure from a sample certification ob-
ject includes:
Nucleus Details. Information about the facility,
such as active status, purpose of production (e.g.,
Hatchery), and other relevant parameters.
Laboratory Exams. The list of tested diseases, including Salmonella Pullorum (SP) and Salmonella Gallinarum (SG), among others, with the dates of each test and the materials used for testing.
Sanitary Conditions. The status recorded for each disease (e.g., Free or Vaccinated).
Our system was tasked with analyzing the object, particularly the laboratory exams, to ensure that all elements complied with the required standards for certification.
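For illustration only, such an object might take the following shape; the field names are hypothetical, as the actual PDSA-RS schema is not reproduced here.

# Hypothetical certification-request object (illustrative field names).
certification_request = {
    "nucleus": {
        "active": True,
        "purpose": "Hatchery",  # purpose of production
    },
    "laboratory_exams": [
        {"disease": "SP", "date": "2024-05-10", "materials": ["cloacal swab", "blood serum"]},
        {"disease": "SG", "date": "2024-05-10", "materials": ["blood serum"]},
    ],
    "sanitary_conditions": {"SP": "Free", "SG": "Free"},
}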
As our work takes place within a controlled environment, we know in advance which data we want to feed into our LLM, so we can use pre-established queries for each context where the module will be used. In the example of the health certification process for poultry establishments, we make these choices according to the rules of each stage of the life cycle of a poultry establishment.
Through experimentation, we arrived at the following system prompt, which is used to provide background instructions on how the model should answer. This prompt showed a low incidence of hallucinations in the tests carried out:
"You must act as an expert in poultry farming and certification. Answer questions accurately about poultry certification and biosecurity. If unsure, indicate that the question cannot be answered."
For the condensed prompt we followed the struc-
ture below:
{{ Retrieved Context }}
Use the content above to respond to the query
below if relevant, or respond to the best of
your ability without it.
{{ Poultry Certification Request Data }}
The condensed prompt is designed to boost contextual awareness and to foster model behavior that prioritizes accuracy, generating responses aligned only with the provided information.
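A small sketch of how this condensed prompt might be assembled in code; the function and variable names are ours, not part of the platform's codebase.

# Illustrative assembly of the condensed prompt described above.
def build_condensed_prompt(retrieved_context: str, request_data: str) -> str:
    return (
        f"{retrieved_context}\n\n"
        "Use the content above to respond to the query below if relevant, "
        "or respond to the best of your ability without it.\n\n"
        f"{request_data}"
    )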
As shown in table 1, for the parameters of our
LLM, we set the temperature value to 0.8. With this
setting we were able to balance the model’s ability to
explore different possibilities while avoiding trivial or
overly repetitive answers. Temperature is a hyperpa-
rameter that controls the randomness or softness of
output probabilities in LLMs, allowing for more di-
verse or exploratory language generation rather than
deterministic predictions. Additionally, we employed
an adaptive context window size with a maximum ca-
pacity of 1024 tokens, accommodating a wide range
of input lengths while ensuring efficient memory use.
For output search strategies, we implemented a beam
search with a width of 16, enabling the model to eval-
uate multiple possible answers and select the best out-
put path based on probability scores.
Resource allocation was also a focus, with GPU
memory dynamically set at 8GB to allow efficient
processing on available hardware. CPU utilization
was capped at 80%, with a maximum of 16 threads, mostly because of the limitations of our test hardware, while still allowing the model to make good use of the available CPU power.
Table 1: LLM Hyperparameter Configuration.
Hyperparameter Value
Temperature 0.8
Context Window Size Adaptive (max 1024 tokens)
Beam Search Width 16
GPU Memory Allocation 8GB (dynamic allocation)
CPU Thread Utilization 80% (max 32 threads)
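For illustration, the snippet below shows how the temperature and context-window settings from table 1 could be passed when querying a locally served model through the Ollama Python client; the model tag and option names are assumptions based on that client, and the beam-search width and GPU/CPU limits are configured in the serving stack rather than in this call.

# Hedged sketch: generation parameters via the Ollama Python client.
import ollama

response = ollama.chat(
    model="llama3.1:8b-instruct-q4_0",  # illustrative model tag
    messages=[{"role": "user", "content": "Summarize the SP testing rules for hatcheries."}],
    options={"temperature": 0.8, "num_ctx": 1024},
)
print(response["message"]["content"])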
We encountered significant challenges in running
inferences on our test hardware setup, which con-
sists of a Ryzen 7 5800X3D CPU and a Radeon RX
7700XT 12GB GPU. Specifically, we had to carefully
manage GPU memory allocation to prevent excessive
memory usage, as the RX 7700XT’s 12GB VRAM
is limited compared to more powerful professional-
grade GPUs.
The integration of the RAG System with PDSA-
RS is done within one of the platform’s several mi-
croservices, more specifically the service responsible
for the poultry certification process, called Aves API.
Within this API, an endpoint is responsible for mak-
ing the call to the RAG system for chat completion,
containing all the information that is extracted from
PDSA-RS according to the user’s request. As shown
in figure 4, this call is made through a button found
on the front-end of the platform, within the compo-
nent where certificate requests are analyzed.
4.2 Results
We then conducted tests using certification request
data to evaluate the system’s performance. For eval-
uation metrics, we used response accuracy and gen-
eration time. To measure the accuracy of the RAG
system, we extracted real data from certifications produced by specialists who use the PDSA-RS. We used the condensed prompt described in Section 4.1 in each evaluation step, baseline and RAG.
We take the laboratory exams and create a query
to ask the LLM if, based on these exams, all the re-
quired materials were collected on the correct dates,
according to the purpose of production for each stage.
Based on this analysis, we ask the LLM to determine
the sanitary condition of the farm for the tested dis-
ease and whether it would be eligible for certification. We compare the results with the sanitary conditions determined by a specialist to verify whether the LLM's output was correct.
We categorized responses into two types:
Fully Correct Responses (Rc): The model cor-
rectly identified compliance or non-compliance
for all nuclei.
Partially Correct Responses (Rp): The model
provided partially accurate responses, missing or
misinterpreting some details.
The formula used to calculate accuracy was:
\[ \text{Accuracy} = \frac{R_c + (W \times R_p)}{N} \times 100 \]
where \(R_c\) and \(R_p\) are as defined above, \(N\) is the total number of queries in the benchmark, and \(W\) is the weight for partially correct responses (set to 0.5 in our case).
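The metric is straightforward to compute; a short sketch reproducing the values reported below:

# Weighted accuracy as defined above (W = 0.5 for partially correct answers).
def accuracy(rc: int, rp: int, n: int, w: float = 0.5) -> float:
    return (rc + w * rp) / n * 100

print(accuracy(rc=2, rp=20, n=100))   # baseline: 12.0
print(accuracy(rc=40, rp=30, n=100))  # RAG-enhanced: 55.0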
For comparison, we generated responses in two
modes: with RAG, using the IN documents incorpo-
rated into our knowledge base, and without RAG, us-
ing only the base model. This approach allowed us
to assess the impact of the RAG module on response
quality and processing efficiency.
For the baseline model, which does not use the
RAG system, the evaluation metrics were as follows:
N = 100: Total queries related to certification re-
quests.
Rc = 2: Fully aligned responses.
Rp = 20: Partially correct responses.
Out of 100, the remaining 78 responses were completely incorrect; they are not counted as Rc or Rp and therefore do not appear in the parameters above.
Figure 4: Screenshot of the poultry certification analysis from the PDSA-RS.
Applying the formula for the baseline:
\[ \text{Accuracy} = \frac{2 + (0.5 \times 20)}{100} \times 100 = 12\% \]
For the RAG-enhanced system, the evaluation
metrics were:
N = 100: Total queries related to certification re-
quests.
Rc = 40: Fully aligned responses.
Rp = 30: Partially correct responses.
Out of 100, the remaining 30 responses were completely incorrect; they are not counted as Rc or Rp and therefore do not appear in the parameters above.
Applying the formula for the RAG-enhanced sys-
tem:
\[ \text{Accuracy} = \frac{40 + (0.5 \times 30)}{100} \times 100 = 55\% \]
Table 2: Model Performance.
Phase      Response Accuracy   Generation Time (seconds)
Baseline   12%                 5.2
RAG        55%                 8.3
As shown in table 2, baseline generation showed a low accuracy of 12%, which is a result of the LLM's limited capacity to contextualize and apply domain-specific knowledge regarding poultry certification and internal monitoring standards.
In contrast, the RAG demonstrated a significant
improvement in accuracy. The model’s ability to re-
trieve relevant legal texts allowed it to generate re-
sponses that were more informed and contextually ac-
curate.
The generation time is slightly longer when using
RAG, as content retrieval is required before the re-
sponse generation process can begin. This additional
retrieval step, which involves searching for and load-
ing relevant context from the knowledge base, nat-
urally extends the time needed to generate each re-
sponse. However, this trade-off enhances response
accuracy and relevance by incorporating contextual
information.
The results indicate that the integration of a RAG system substantially improves response accuracy, with an increase of 43 percentage points when using the RAG approach.
The findings from this case study suggest that inte-
grating Retrieval-Augmented Generation (RAG) into
a platform like PDSA-RS offers a practical solution
for enhancing the performance of LLMs in domain-
specific tasks, as well as assisting and accelerating the analysis of the standards required for poultry certification.
It is important to emphasize that the objective of
this case study was not to replace human expertise
in the poultry certification process. Rather, the focus
was to assess how an LLM, integrated with a Retrieval-
Augmented Generation (RAG) system, could assist in
streamlining certain tasks by providing relevant le-
gal information and context. While LLMs can offer
valuable support in processing complex data and re-
trieving domain-specific knowledge, they still present
limitations, such as the potential for generating in-
accurate or incomplete responses. Therefore, human
oversight remains crucial to ensure the reliability and
precision of the certification process, particularly in
areas where nuanced judgment and expert knowledge
are required.
5 CONCLUSION
The integration of a Retrieval-Augmented Generation
(RAG) system into the PDSA-RS platform has proven
to be a valuable advancement in improving the accu-
racy and contextual relevance of large language mod-
els (LLMs) within the specialized domain of animal
health regulation. By leveraging domain-specific re-
trieval to supplement the generative capabilities of
LLMs, our system bridges the gap between general-
purpose language models and the unique, nuanced
needs of regulatory environments.
Our case study highlights the potential for RAG
systems to streamline and enhance the way legal and
regulatory queries are handled, especially in complex
sectors like veterinary health. The performance gains
observed through the incorporation of relevant reg-
ulatory texts into the LLM’s output underscore the
value of domain-adapted retrieval processes in in-
creasing both the precision and usefulness of the gen-
erated responses. This approach demonstrates that
RAG not only improves response quality but also pro-
vides a scalable solution to address domain-specific
challenges, where accuracy and legal compliance are
paramount.
Looking ahead, the continuous refinement of both
the retrieval module and the underlying language
model will be critical in ensuring that the PDSA-RS
platform remains capable of delivering accurate and
timely legal guidance.
From a theoretical perspective, this study empha-
sizes the adaptability of RAG in dynamic environ-
ments and its ability to overcome knowledge obso-
lescence—a limitation inherent to fine-tuned models.
Furthermore, the findings contribute to the broader
discourse on hybrid AI systems, where generative
and retrieval capabilities are combined to enhance
domain-specific applications.
For future work, there is the potential to fine-
tune the model with specific documents, which could
further enhance the relevance and accuracy of con-
tent generation on this subject. This additional fine-
tuning would allow the model to better handle com-
plex, domain-specific queries, delivering more pre-
cise responses tailored to the chosen topic. Addi-
tionally, developing a graphical user interface (GUI)
could significantly improve user interaction by allow-
ing users to directly input various documents and in-
teract with them. This interface would enable users
to upload documents, ask questions, and receive con-
textually relevant responses, making the system more
intuitive and accessible for end-users engaging with
diverse content.
In conclusion, while the RAG-enhanced LLM sys-
tem offers significant benefits, it is essential to main-
tain human oversight, particularly in legal and reg-
ulatory contexts where misinterpretation of guide-
lines could have serious consequences. The role
of the LLM is not to replace human expertise but
to augment decision-making by providing informed,
contextually relevant suggestions. This hybrid ap-
proach—leveraging both cutting-edge AI and human
expertise—represents a promising path forward for
regulatory platforms like PDSA-RS, fostering innova-
tion while ensuring the accuracy and integrity of the
certification processes that underpin Brazil’s animal
health sector.
ACKNOWLEDGEMENT
This research is supported by FUNDESA, project
“Combining Process Mapping and Improvement with
BPM and the Application of Data Analytics in
the Context of Animal Health Defense and Inspec-
tion of Animal-Origin Products in the State of RS”
(UFSM/060496) and by MPA (Ministério da Pesca e
Aquicultura), project “Use of Artificial Intelligence
in the Systematization of Hygienic-Sanitary Certifi-
cation Processes for Vessels and Accreditation of Le-
gal Origin of Fish” (UFSM/060642). The research by
Vinícius Maran is partially supported by CNPq grant
306356/2020-1 DT-2, CNPq PIBIC and PIBIT pro-
gram and FAPERGS PROBIC program.
REFERENCES
Bishop, C. (2006). Pattern Recognition and Machine
Learning, volume 16, pages 140–155.
Bommasani, R., Hudson, D., Adeli, E., Altman, R., Arora,
S., Arx, S., Bernstein, M., Bohg, J., Bosselut, A.,
Brunskill, E., Brynjolfsson, E., Buch, S., Card, D.,
Castellon, R., Chatterji, N., Creel, K., Davis, J., Dem-
szky, D., and Liang, P. (2021). On the opportunities
and risks of foundation models.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-
shot learners. In Larochelle, H., Ranzato, M., Hadsell,
R., Balcan, M., and Lin, H., editors, Advances in Neu-
ral Information Processing Systems, volume 33, pages
1877–1901. Curran Associates, Inc.
Descovi, G., Maran, V., Ebling, D., and Machado, A.
(2021). Towards a blockchain architecture for animal
sanitary control. In Proceedings of the 23rd Inter-
national Conference on Enterprise Information Sys-
tems - Volume 1: ICEIS,, pages 305–312. INSTICC,
SciTePress.
Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer,
L. (2024). Qlora: efficient finetuning of quantized
llms. In Proceedings of the 37th International Con-
ference on Neural Information Processing Systems,
NIPS ’23, Red Hook, NY, USA. Curran Associates
Inc.
Ebling, D., Machado, F., Descovi, G., Cardenas, N.,
Machado, G., Maran, V., and Machado, A. (2024). A
distributed processing architecture for disease spread
analysis in the pdsa-rs platform. In Proceedings of the
26th International Conference on Enterprise Informa-
tion Systems - Volume 2: ICEIS, pages 313–320. IN-
STICC, SciTePress.
Esteva, A., Kuprel, B., Novoa, R., Ko, J., Swetter, S., Blau,
H., and Thrun, S. (2017). Dermatologist-level clas-
sification of skin cancer with deep neural networks.
Nature, 542.
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y.,
Dai, Y., Sun, J., Guo, Q., Wang, M., and Wang, H.
(2023). Retrieval-augmented generation for large lan-
guage models: A survey. ArXiv, abs/2312.10997.
Google Cloud (2023). A three-step design pattern for spe-
cializing llms. Accessed: 2024-09-13.
Gururangan, S., Marasović, A., Swayamdipta, S., Lo, K.,
Beltagy, I., Downey, D., and Smith, N. A. (2020).
Don’t stop pretraining: Adapt language models to do-
mains and tasks. In Jurafsky, D., Chai, J., Schluter,
N., and Tetreault, J., editors, Proceedings of the 58th
Annual Meeting of the Association for Computational
Linguistics, pages 8342–8360, Online. Association
for Computational Linguistics.
Izacard, G. and Grave, E. (2021). Leveraging passage re-
trieval with generative models for open domain ques-
tion answering. In Merlo, P., Tiedemann, J., and Tsar-
faty, R., editors, Proceedings of the 16th Conference
of the European Chapter of the Association for Com-
putational Linguistics: Main Volume, pages 874–880,
Online. Association for Computational Linguistics.
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L.,
Edunov, S., Chen, D., and Yih, W.-t. (2020). Dense
passage retrieval for open-domain question answer-
ing. In Webber, B., Cohn, T., He, Y., and Liu,
Y., editors, Proceedings of the 2020 Conference on
Empirical Methods in Natural Language Processing
(EMNLP), pages 6769–6781, Online. Association for
Computational Linguistics.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-
ing. Nature, 521:436–44.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2020). Retrieval-augmented generation for
knowledge-intensive nlp tasks. In Proceedings of the
34th International Conference on Neural Information
Processing Systems, NIPS ’20, Red Hook, NY, USA.
Curran Associates Inc.
Ministério da Agricultura, Pecuária e Abastecimento, S. d. D. A. (2001). Instrução normativa 44, de 23 de agosto de 2001. Accessed: 2024-10-23.
Ministério da Agricultura, Pecuária e Abastecimento, S. d. D. A. (2003). Instrução normativa 78, de 3 de novembro de 2003. Accessed: 2024-10-23.
Radford, A. and Narasimhan, K. (2018). Improving lan-
guage understanding by generative pre-training.
Sari, Y. and Indrabudiman, A. (2024). The role of artificial
intelligence (ai) in financial risk management. For-
mosa Journal of Sustainable Research, 3:2073–2082.
Schneider, R., Machado, F., Trois, C., Descovi, G., Maran,
V., and Machado, A. (2024). Speeding up the simu-
lation animals diseases spread: A study case on r and
python performance in pdsa-rs platform. In Proceed-
ings of the 26th International Conference on Enter-
prise Information Systems - Volume 2: ICEIS, pages
651–658. INSTICC, SciTePress.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro,
E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E.,
and Lample, G. (2023). Llama: Open and efficient
foundation language models.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L. u., and Polosukhin,
I. (2017). Attention is all you need. In Guyon,
I., Luxburg, U. V., Bengio, S., Wallach, H., Fer-
gus, R., Vishwanathan, S., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 30. Curran Associates, Inc.
Zhang, S., Dong, L., Li, X., Zhang, S., Sun, X., Wang, S.,
Li, J., Hu, R., Zhang, T., Wu, F., et al. (2023). In-
struction tuning for large language models: A survey.
arXiv preprint arXiv:2308.10792.