Retrieval-Augmented Generation Solutions for Typical Application
Process Issues
Yudi Zhang
Information and Computing Science, Beijing University of Civil Engineering and Architecture, No. 15 Yongyuan Road, Beijing, China
https://orcid.org/0009-0000-1072-6946
Keywords: Retrieval-Augmented Generation, Large Language Model, Reranking.
Abstract: Challenges such as factually incorrect hallucinations, privacy concerns, and outdated information often hinder the practical deployment of Large Language Models (LLMs). Retrieval-Augmented Generation (RAG), which draws on advanced retrieval technology, is designed to address these issues. RAG uses an embedding model to build an external database from up-to-date and domain-specific information, retrieves the content in the vector store that matches the user's query, and augments the user prompt by adding the retrieved data to the context. Within this process, how to improve retrieval efficiency and quality, and how to improve model robustness, are the focus of the methods discussed in this paper. Gradient Guided Prompt Perturbation (GGPP) manipulates the top-k retrieval results by minimizing the distance between a target passage embedding and the query embedding while maximizing the distance between the originally retrieved passage embedding and the query embedding; studying such perturbations makes it possible to reduce their influence and improve model robustness. Boolean agent RAG setups improve token efficiency by adding a Boolean decision step in which the language model determines, based on the user input, whether to query the vector database at all; this setting saves a substantial number of tokens. GenRT is an algorithm that jointly optimizes reranking and truncation to improve efficiency and accuracy when processing long texts. Finally, a medical question-answering application is reviewed to identify the best combination of retriever and LLM in that field.
1 INTRODUCTION
Large language models (LLMs) are driving a major revolution in global development by unleashing productivity. LLMs show a remarkable talent for semantic understanding and reasoning (Chang, 2023; Kasneci, 2023). When this talent is applied to question-answering systems, however, problems caused by hallucination and bias are inevitable. First, large language models may fabricate answers that contradict reality when generating text. Second, they lack vertical-domain knowledge and suffer from information lag. These two factors reduce the reliability of LLMs and create resistance during commercial deployment (Zhao, 2023; Zhao, 2024).
To reduce hallucination and bias, to address the knowledge scarcity and information lag of large models in specialized knowledge-intensive tasks, and to improve the accuracy of generated answers during question answering, the current solution is Retrieval-Augmented Generation (RAG) (Chen, 2024; Lewis, 2020). A database of domain-specific material is integrated and, with an appropriate embedding model, its text is converted into vectors. The vector representation of each text chunk is then reranked to meet the input requirements of large models (for example, fine-tuning with the BGE Reranker), which strengthens the relevance of the retrieved context within the professional domain so that the text chunks matching the user's question can finally be found in the knowledge base. However, this process is fragile: small perturbations in the user's query text can change the results, and insufficient data expression or
improper text chunking can leave the system unable to find the appropriate context, causing retrieval failures. These are the points of resistance RAG currently encounters in application. How to reasonably combine existing retrievers with large language models is a further question that needs to be considered (Xiong, 2024).
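To make this workflow concrete, the sketch below shows a minimal RAG pipeline: chunks are embedded once, the top-k chunks for a query are retrieved by cosine similarity, and the prompt is augmented with them. The embedding model name and the prompt template are illustrative assumptions, not prescriptions from the cited works.

```python
# A minimal RAG pipeline sketch. The model name and prompt template are
# illustrative assumptions, not taken from the cited works.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # any embedding model

def build_index(chunks):
    """Embed all text chunks once; rows are L2-normalised vectors."""
    return np.asarray(embedder.encode(chunks, normalize_embeddings=True))

def retrieve(query, chunks, index, k=3):
    """Return the k chunks whose embeddings are closest to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                      # cosine similarity on unit vectors
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

def augment_prompt(query, retrieved):
    """Prepend the retrieved context to the user question for the LLM."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}"
```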
To solve the above problems, this paper focuses on the latest techniques and algorithms for search queries, for reranking embedded vector stores, and for truncation strategies (Xu, 2024), as well as evaluation methods in different fields, all built on existing RAG retrieval. The workflow of RAG is shown in Figure 1. Current methods that can improve the retrieval speed, robustness, and generalization of RAG in application are summarized. The types of RAG frameworks, the latest query algorithms, and the generators used with retrievers are also discussed. Combined with large language models, this paper discusses the technical directions of RAG's future applications and explores its potential commercial value in AI applications and question-answering systems.
Figure 1: The workflow of RAG (Photo/Picture credit:
Original).
1.1 Gradient Guided Prompt Perturbation (GGPP)
The retrieval part of RAG may produce erroneous results because of small perturbations in user prompts. Gradient Guided Prompt Perturbation (GGPP) targets exactly this type of problem. The encoder converts the user's text into an embedding vector, and a perturbation prefix is attached to the query. The goal of the prefix optimization algorithm in GGPP is to change the passage ranking in Retrieval-Augmented Generation (RAG)-based Large Language Models (LLMs): it pushes the originally correct passage out of the top-k retrieval results and elevates a target passage into the top-k results. The algorithm achieves this by minimizing the distance between the target passage embedding and the input query embedding while maximizing the distance between the original passage embedding and the query embedding. The prefix optimization adjusts the query's embedding coordinates to increase similarity with the target coordinates; if an adjustment does not move the query embedding closer to the target point, it is reverted (Hu, 2024).
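The optimization loop can be sketched as follows. This is a paraphrase of the idea rather than the authors' implementation: `encode` is assumed to be a differentiable encoder over token embeddings, and a single greedy, gradient-guided token swap per step stands in for the full search.

```python
# Sketch of GGPP-style prefix optimization (illustrative, not the authors' code).
# `encode(inputs_embeds)` is assumed to return the query embedding differentiably.
import torch

def ggpp_prefix_step(prefix_ids, query_ids, encode, embed_matrix,
                     target_emb, orig_emb):
    """One gradient-guided update of the adversarial prefix tokens."""
    prefix_emb = embed_matrix[prefix_ids].clone().requires_grad_(True)
    q = encode(torch.cat([prefix_emb, embed_matrix[query_ids]], dim=0))

    # Pull the query toward the target passage, push it away from the original.
    loss = torch.norm(q - target_emb) - torch.norm(q - orig_emb)
    loss.backward()

    # Gradient-guided candidates: for each prefix position, pick the vocabulary
    # token whose embedding most decreases the loss (first-order estimate).
    scores = -prefix_emb.grad @ embed_matrix.T          # (prefix_len, vocab)
    candidate_ids = scores.argmax(dim=-1)

    # Revert the swap if it does not actually move the query closer to target.
    with torch.no_grad():
        new_q = encode(embed_matrix[torch.cat([candidate_ids, query_ids])])
        if torch.norm(new_q - target_emb) < torch.norm(q - target_emb):
            return candidate_ids
    return prefix_ids
```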
1.2 Boolean Agent RAG Setups
The Boolean Agent RAG (BARAG) configuration aims to improve token efficiency when a language model must decide whether querying a vector database would lead to more precise and relevant responses. This is achieved by integrating a Boolean decision step that exploits the model's built-in knowledge for answering queries, avoiding unnecessary database retrievals. The method can lead to a substantial reduction in token usage, especially in practical applications where database-retrieved text usually consumes most of the tokens (Kenneweg, 2024).
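A minimal sketch of such a decision step is given below; the `llm` and `retrieve` callables and the decision prompt are placeholders, not an API from the cited work.

```python
# Sketch of a Boolean Agent RAG decision step (illustrative; `llm` and
# `retrieve` are placeholder callables, not Kenneweg et al.'s interface).
def boolean_agent_answer(question, llm, retrieve, k=3):
    """Ask the model first whether retrieval is needed; only then pay for it."""
    decision = llm(
        "Can you answer the following question reliably from your own "
        f"knowledge alone? Reply YES or NO.\nQuestion: {question}"
    )
    if decision.strip().upper().startswith("YES"):
        # Built-in knowledge suffices: no database tokens are consumed.
        return llm(question)
    # Otherwise retrieve context, which usually dominates the token cost.
    context = "\n".join(retrieve(question, k=k))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```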
1.3 GenRT
In the retrieval part of RAG, how to handle reranking and truncation effectively is the key issue. These two questions concern whether the document-ranking model fully understands the semantic context of the text, and whether the truncation strategy, which directly affects the efficiency and quality of answer generation on long texts, is chosen well. Single-pass reranking and truncation models, applied separately, often lead to error accumulation.

Therefore, the GenRT algorithm shown in Figure 2 combines dynamic reranking and truncation and executes them simultaneously, introducing two adaptive loss functions as its core: a step-adaptive attention loss, which optimizes the attention score of each document in the reranked list by computing the cross-entropy of the attention distribution, and a step-by-step lambda loss, which builds a scoring matrix at each model step, helping the model find the best truncation point and keep the most relevant documents. Training updates the scoring matrix by adding penalty terms for documents in the sequence whose relevance does not decrease (Xu, 2024).
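At inference time, the joint decoding idea can be sketched as follows; `rank_score` and `stop_score` are placeholders for the learned step-wise scores described above, not the paper's actual interface.

```python
# Sketch of GenRT-style joint reranking + truncation decoding (illustrative).
# `rank_score(doc, selected)` and `stop_score(selected)` stand in for the
# learned step-wise scores; they are assumptions, not the paper's API.
def rerank_and_truncate(docs, rank_score, stop_score, threshold=0.5):
    """Sequentially pick the best next document and decide when to cut off."""
    selected, remaining = [], list(docs)
    while remaining:
        # Reranking step: score every remaining document given the list so far.
        best = max(remaining, key=lambda d: rank_score(d, selected))
        # Truncation step: stop once the model predicts no further gain.
        if stop_score(selected + [best]) < threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because both decisions are made in the same pass, the truncation point always reflects the current reranked order, which is what avoids the error accumulation of separate single-pass models.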
Figure 2: The workflow of GenRT (Photo/Picture credit: Original).
2 INDUSTRIAL APPLICATION
Because of RAG's excellent performance on knowledge-intensive tasks, the biomedical question-answering field has also begun to adopt it. Researchers have found that RAG can effectively address the hallucinations and outdated knowledge that LLMs exhibit in medical question-answering tasks. They therefore used the MIRAGE evaluation benchmark and the MedRAG toolkit to conduct large-scale experiments with different combinations of corpora, retrievers, and LLMs. The MIRAGE benchmark evaluates retrieval-augmented settings, including RAG, on five common medical QA datasets: MMLU-Med, MedQA-US, MedMCQA, PubMedQA*, and BioASQ-Y/N. The MedRAG toolkit consists of three parts: corpora, retrievers, and LLMs. The experiments attempt to find the optimal combination of these components for medical question answering. In the end, MedRAG improved the test-set accuracy of six different LLMs by up to 18% compared with chain-of-thought prompting. Among them, GPT-4 was the best-performing LLM, while GPT-3.5 showed the largest numerical gain in accuracy from RAG. MedRAG has shown great potential in enhancing LLMs' zero-shot ability to answer medical questions, and may be a more effective choice than larger-scale pre-training. Even where supervised fine-tuning may be more suitable, MedRAG remains a more flexible and cost-effective approach to improving medical Q&A (Xiong, 2024).
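The underlying experiment is essentially a grid search over corpus, retriever, and LLM combinations, as in the sketch below; the component lists follow the MedRAG paper loosely and the `evaluate` helper is an assumption, not the toolkit's API.

```python
# Sketch of the corpus/retriever/LLM grid evaluation behind MedRAG-style
# benchmarking (illustrative; `evaluate` is a placeholder, not the toolkit API).
from itertools import product

CORPORA    = ["PubMed", "Textbooks", "StatPearls", "Wikipedia"]
RETRIEVERS = ["BM25", "Contriever", "MedCPT"]
LLMS       = ["gpt-3.5-turbo", "gpt-4"]

def grid_search(benchmark, evaluate):
    """Score every combination on the benchmark and return the best one."""
    results = {}
    for corpus, retriever, llm in product(CORPORA, RETRIEVERS, LLMS):
        results[(corpus, retriever, llm)] = evaluate(
            benchmark, corpus=corpus, retriever=retriever, llm=llm
        )
    best = max(results, key=results.get)
    return best, results
```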
Figure 3: The application workflow of RAG (Photo/Picture
credit: Original).
3 DISCUSSION
RAG has already been applied in knowledge-intensive fields and in complex tasks such as graph understanding and question answering, and code generation. Applications in specialized fields include medical Q&A, disaster summarization, and textbook Q&A. RAG can be expected to further enhance the commercial value of LLMs in knowledge-intensive industries in the near future. Its ability to mitigate model hallucination, to update the model's information base in real time, and to keep that information private all show its potential to improve the accuracy, reliability, and stability of question answering and to make generated text more faithful to reality. These strengths will make it an inseparable part of future private AI deployments, in customized functions such as enterprise document review and assisted bid writing. The great promise of the commercial sector also means that more vertical-specific evaluation methods and test datasets are required. At the same time, fields with dynamic information, such as finance and news media, need to establish regular data-update processes that automatically extract, analyze, and incorporate new data to meet industry needs.
RAG retrieval technology also has limitations. For example, traditional vector retrieval, by the nature of embeddings, cannot represent logical reasoning connections and lacks true relevance and chains of thought. Insufficient context in the knowledge base, and the loss of key information when a passage is compressed into a single vector, inevitably waste knowledge. Whether retrieval or generation should dominate RAG, and whether their differing strengths across fields will shift the emphasis between them, will directly determine whether RAG develops along a search-oriented or an agent-oriented path. The constraints and balance points between these options deserve attention and remain unsolved challenges.
Challenges also bring new technical possibilities. For example, knowledge graph embeddings can make generated results more interpretable and logically consistent, so that logical reasoning becomes more accurate. For the pain point of insufficient context in the knowledge base, more efficient document parsing tools can enrich the context by attaching relevant metadata to each passage of text, as sketched below.
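As a simple illustration of the metadata idea, the following sketch attaches hypothetical source, section, and position fields to each chunk during parsing; the field names are assumptions, and any parsing library could supply them.

```python
# Sketch of attaching metadata to each chunk during document parsing
# (illustrative; the field names are hypothetical).
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_with_metadata(doc_title, sections):
    """Split a parsed document into chunks that carry their own context."""
    chunks = []
    for sec_title, paragraphs in sections:
        for i, para in enumerate(paragraphs):
            chunks.append(Chunk(
                text=para,
                metadata={"source": doc_title,    # which document
                          "section": sec_title,   # which section it came from
                          "position": i},         # order within the section
            ))
    return chunks
```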
4 CONCLUSION
This paper has summarized RAG's recent algorithms for improving retrieval efficiency, the impact of small perturbations in user prompts on generated results, and reranking and truncation strategies for vector data. Current RAG enhancements that strengthen retrieval robustness and the explainability of query methods were introduced. The application of RAG in medical question-answering systems was reviewed, together with the performance differences of the various component combinations on the application side; using an evaluation tool suited to medical question answering, the combination of LLM and retriever best suited to this field was identified. Finally, the commercial prospects and future research directions of RAG were discussed. There is reason to believe that RAG retrieval technology will become an important part of the coming wave of private AI deployment.
REFERENCES
Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K.,
... & Xie, X. 2023. A survey on evaluation of large
language models. ACM Transactions on Intelligent
Systems and Technology.
Chen, J., Lin, H., Han, X., & Sun, L. 2024. Benchmarking
large language models in retrieval-augmented
generation. In Proceedings of the AAAI Conference on
Artificial Intelligence (Vol. 38, No. 16, pp. 17754-
17762).
Hu, Z., Wang, C., Shu, Y., & Zhu, L. 2024. Prompt
perturbation in retrieval-augmented generation based
large language models. arXiv preprint
arXiv:2402.07179.
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M.,
Dementieva, D., Fischer, F., ... & Kasneci, G. 2023.
ChatGPT for good? On opportunities and challenges of
large language models for education. Learning and
individual differences, 103, 102274.
Kenneweg, T., Kenneweg, P., & Hammer, B. 2024.
Retrieval Augmented Generation Systems: Automatic
Dataset Creation, Evaluation and Boolean Agent
Setup. arXiv preprint arXiv:2403.00820.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
Xiong, G., Jin, Q., Lu, Z., & Zhang, A. 2024.
Benchmarking retrieval-augmented generation for
medicine. arXiv preprint arXiv:2402.13178.
Xu, S., Pang, L., Xu, J., Shen, H., & Cheng, X. 2024. List-
aware reranking-truncation joint model for search and
retrieval-augmented generation. arXiv preprint
arXiv:2402.02764.
Zhao, P., Zhang, H., Yu, Q., Wang, Z., Geng, Y., Fu, F., ...
& Cui, B. 2024. Retrieval-Augmented Generation for
AI-Generated Content: A Survey. arXiv preprint
arXiv:2402.19473.
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y.,
... & Wen, J. R. 2023. A survey of large language
models. arXiv preprint arXiv:2303.18223.