Innovation in Geriatric Care: An AI Assistant with LLM Integration

Based on Health Guidelines

Juliana Basulo-Ribeiro

1a,*

, Nádia C. G. Matos

2,3 b

, Sabrina Magalhães Araujo

2,3,4 c

Nuno Capela

2,3 d

, Francisco Bischoff

2,4,5 e

, Leonor Teixeira

and Ricardo Cruz-Correia

2,4,5 g

Department of Economics, Management, Industrial Engineering and Tourism (DEGEIT),

Institute of Electronics and Informatics Engineering of Aveiro (IEETA), Intelligent Systems Associate Laboratory (LASI),

University of Aveiro, Aveiro, Portugal

Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine,

University of Porto, Porto, Portugal

Health Data Science Ph.D. Program, Faculty of Medicine of the University of Porto, Porto, Portugal

CINTESIS - Center for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto,

Portugal

VirtualCare, Portugal

Keywords: Large Language Model, Artificial Intelligence, Healthcare, Geriatric Care.

Abstract: With the advance of artificial intelligence and natural language processing technology, a new tool is standing

out in the field of understanding and generating natural language in a sophisticated way: the Large Language

Model (LLM). According to several authors, LLMs can be used for various types of medical cases, providing

access to different sources of information, and have opened up countless opportunities in the healthcare sector.

This work aims to share the lessons learned during the process of developing an LLM-based assistant aimed

at specific pathologies that are more prevalent in the elderly in order to support caregivers, whether they are

private individuals, home care organizations, nursing homes or others. This study has a significant potential

impact on the community by providing access to detailed information on developing an LLM-based assistant.

1 INTRODUCTION

Language is vital for human communication and self-

expression, as well as being fundamental for

interaction between people and machines (Naveed et

al., 2023). With the advance of artificial intelligence

(AI) and natural language processing technology, a

new tool is coming to the fore in the field of

understanding and generating natural language in a

sophisticated way: the Large Language Model

(LLM). Language modeling is a long-standing

research topic, dating back to the 1950s, and today,

LLMs have evolved from statistical language models

https://orcid.org/0000-0002-3411-3519

https://orcid.org/0009-0007-7841-6574

https://orcid.org/0000-0003-1443-2106

https://orcid.org/0009-0007-7841-6574

https://orcid.org/0000-0002-5301-8672

https://orcid.org/0000-0002-7791-1932

https://orcid.org/0000-0002-3764-5158

to neural language models, pre-trained language

models and finally to large-scale LLMs (Minaee et

al., 2024). Examples of LLMs include OpenAI's GPT

(Generative Pre-trained Transformer), Google's T5

(Text-To-Text Transfer Transformer) and Google's

BERT (Bidirectional Encoder Representations from

Transformers) (Brown et al., 2020; Devlin et al.,

2018; Raffel et al., 2019).

Smith et al. (2024) points out 10 rules for using

LLMs in science, as these AI tools are set to change the

way we operate in many areas: (1) Recognize that

LLMs are powerful, but they can make mistakes; (2)

Confirm the information generated by the models with

Basulo-Ribeiro, J., Matos, N. C. G., Araujo, S. M., Capela, N., Bischoff, F., Teixeira, L. and Cruz-Correia, R.

Innovation in Geriatric Care: An AI Assistant with LLM Integration Based on Health Guidelines.

DOI: 10.5220/0013096100003911

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 361-371

ISBN: 978-989-758-731-3; ISSN: 2184-4305

361

reliable sources; (3) Train the model with reliable data

to ensure that it is representative and free of bias; (4)

Clearly specify the context so that the answers are more

accurate; (5) Be aware of the biases present in the data

used to train the model, and take steps to mitigate them;

(6) Collaborate and share knowledge, experiences and

challenges to achieve best practices for preparing

LLMs; (7) Maintain transparency by disclosing the

methodology, tools and limitations when presenting

work that incorporates LLMs; (8) Consider the ethical

implications of using language models; (9) Use models

as a support tool, but not as a substitute for critical

thinking and rigorous scientific methods; (10) Actively

collaborate to improve language models and promote

responsible and ethical AI practices.

Today, we can already imagine a future world

where health care is monitored not only by physicians

and nurses but also by solutions that integrate AI,

capable of understanding, interpreting, and even

predicting health needs. This is the world where

LLMs can play an extremely vital role in healthcare.

Behind the scenes of the digital revolution, these

models are shaping a new era of healthcare, where

communication between patients and healthcare

professionals reaches unprecedented levels of

personalization and efficiency. This is a topic

discussed in the literature, where Mehandru et al.

(2024) mention in their study that: "Recent

developments in large language models (LLMs) have

unlocked opportunities for healthcare, from

information synthesis to clinical decision support.".

LLM can help to improve patient care and medical

diagnosis (Nassiri & Akhloufi, 2024)

According to several authors, LLMs can be used

for various types of medical cases, giving access to

different sources of information and tools, including

clinical guidelines, databases with electronic health

records, among others, which shows the possibility

and importance of developing LLMs for health

based on guidelines (Mehandru et al., 2024; Park et

al., 2023; Yang et al., 2023). Smith et al. (2024)

point out that LLMs have the potential to transform

science, but they also present significant challenges

and risks. Thus, evaluating LLMs becomes crucial

in all contexts; however, it is in the medical context

that evaluation is most critical (Nassiri & Akhloufi,

2024). The accuracy, reliability and effectiveness of

LLMs are essential to ensure that the information

provided can be used safely and effectively in

healthcare, and in an environment where decisions

can have a significant impact on people's lives,

rigorous evaluation of LLMs becomes not only

necessary but indispensable (Mehandru et al., 2024;

Yang et al., 2023). Nassiri and Akhloufi (2024)

mention the importance of training the model using

specific health documents, all to reduce the risk of

producing incorrect information. A significant

ethical concern with the use of these models is the

risk of perpetuating biases and inaccuracies in

medical data and information, which can severely

impact patient care (Nassiri and Akhloufi, 2024). As

Nassiri and Akhloufi (2024) point out, there are

many "technical and ethical issues that need to be

resolved before LLMs can be used extensively in the

medical field".

It is crucial to emphasize the need for a

collaborative and multidisciplinary approach to this

type of AI-based technology, with interaction

between different specialists, such as: technology

professionals, healthcare professionals, and health

regulators. This collaboration not only promotes

innovation in healthcare but also ensures that these

technologies are implemented responsibly,

maximizing patient benefits and minimizing potential

risks. (Cascella et al., 2024; Piñeiro-Martín et al.,

2023; Thapa & Adhikari, 2023)

Population aging is one of the most significant

demographic phenomena of the 21st century, and as

the elderly population grows, so do the challenges

associated with managing health conditions specific

to this age group. Among these conditions, non-

oncological pathologies are particularly prevalent and

require special attention from both caregivers and

healthcare professionals. Often, a lack of knowledge

about these diseases and the best practices for

preventing or managing them results in complications

and a deterioration in the quality of life (Maresova et

al., 2019; Sun & Li, 2023; World Health

Organization, 2015, 2022).

This paper aims to present a project to develop an

assistant based on LLMs, which uses health

guidelines as a basis for supporting the care of the

elderly. This initiative follows the recommendations

proposed by Nassiri and Akhloufi (2024) in their

study, mentioned earlier in this section.

Consequently, this article is structured as follows:

section 2 describes the practical case carried out and

the discussion; and at the end, section 3 presents the

final remarks, which include the conclusion, the

contributions of this work, the limitations and future

work for the advancement of LLMs.

2 PRACTICAL STUDY

2.1 Goals and Methods

This article analyses the knowledge acquired

HEALTHINF 2025 - 18th International Conference on Health Informatics

362

during the development of an assistant designed to

deal with specific non-oncological pathologies that

commonly affect the elderly. The main objective is to

provide adequate support to carers, whether they are

private individuals, home care organisations, nursing

homes or others. In addition, this paper explores the

challenges encountered in developing a prototype of

a large language model (LLM) adapted to the care of

the elderly. It also seeks to understand how the

development process can reveal wider obstacles that

any research can face when working with scientific

documentation, including the specificities of LLMs

and the training techniques required for their effective

application.

To analyse the lessons learnt during the

development of the assistant, a qualitative method

was adopted, based on a documentary review and

critical reflection on the team's experiences

throughout the project.

The methodology was divided into three main

stages:

i. Data collection: Review of documents and

records related to the LLM development

project, including planning documents and

reports/documents of findings over time. In

addition, informal interviews were conducted

with team members involved in developing

the assistant to gain insights into the

challenges faced, strategies adopted, and

lessons learned.

ii. Data Analysis: Organization and

categorization of the data collected,

highlighting the main areas of learning

identified during the development process.

iii. Summary and Discussion: A summary of the

analysis results, highlighting the main lessons

learned throughout the development process.

In addition, a critical discussion was held on

the implications of these learnings for future

projects of this scope and practice in general.

Health professionals and technology developers were

involved in this process to promote this project's

effective and responsible development.

2.2 Results

2.2.1 Description of the Assistant

The work developed, an assistant based on an LLM,

is an innovative tool designed to support caregivers

of the elderly by providing detailed information on

prevalent diseases and medical guidelines. It uses

different datasets from the Transparency Portal and

INE Pordata, which can be updated every six months

and every year. The model acts as a conversational

assistant, able to answer questions and offer specific

educational material aimed at the most common

pathologies and care among the elderly. Among the

main beneficiaries are nursing homes, parish

councils, pharmacies, patient associations and home

care providers.

To guarantee the accuracy and relevance of the

information, this research was based on reliable

sources such as the World Health Organization

(WHO), PubMed and Elsevier, which ensures the

project's high technical feasibility. The tool's

innovation is evident, not only in the way it

transforms complex data into useful answers but also

in its ability to offer caregivers continuous support

24/7.

Figure 1: LLM "infoassistant" desktop mock-up.

Figure 2: LLM "infoassistant" mobile mock-up.

The tool's relevance to external entities is also

highlighted, as it facilitates the dissemination of

knowledge and good practices, contributing to

improving care provided to the elderly. This

combination of innovation, technical feasibility and

positive impact makes this project a valuable solution

Innovation in Geriatric Care: An AI Assistant with LLM Integration Based on Health Guidelines

363

in the field of geriatric care. Figure 1 and Figure 2

show a prototype of the natural language conversation

assistant that we intend to make available at the end

of the project in desktop and mobile versions,

respectively.

2.2.2 Approach to Preparing the Assistant

This subsection aims to present the action plan

conducted from the decision to develop this project to

its preparation and availability. The plan for this

project was divided into several work packages (WP)

to make it easier to manage and execute them,

allocating the different resources to each one.

WP1: Data Collection (Search and Download

of Guidelines) The first WP in the initial phase,

focused on searching for and downloading health

guidelines related to non-oncological pathologies

prevalent in the elderly population, using specialized

websites such as PubMed. These documents, which

are generally in PDF format, constitute the database

for the project. At a later stage, this task was

automated using Python technology, allowing

relevant documents to be identified, downloaded and

stored periodically. This systematic process, in line

with the requirements identified, reduces the

likelihood of errors and does not require human

intervention. The methodology involved creating

specific queries, processing the results, extracting

relevant data and storing the information. Figure 3

illustrates the workflow of this WP, with the 'Search'

stage representing the process of searching for

guidelines using MeSH (Medical Subject Headings)

terms; the 'Access to Research Databases' stage

returns the results of the previous task and identifies

the most relevant documents. The public API

(Application Programming Interface ) of the NCBI

Entrez system (https://www.ncbi.nlm.nih.gov/

books/NBK25501/ ) was utilized, providing access to

a wide range of Entrez databases, including PubMed,

PMC, Gene, Nuccore and Protein. The third stage,

Figure 3: Schematic representation of WP1.

'Download and Store', represents the process of

downloading and storing the guidelines, as well as

extracting their metadata. Finally, the ‘Structuring

Metadata’ stage aims to organize the metadata

according to the structure defined for the inputs of the

next work package, WP2.

WP2: Processing the Guidelines The second

WP aims to analyse and convert the PDF documents

collected (during the WP1 process) into text

(questions and answers) to extract specific

information from these documents. Group the

guidelines by disease, considering the International

Statistical Classification of Diseases and Related

Health Problems, 10th Revision (ICD-10) system and

for each disease coding, symptoms, signs, abnormal

findings, complaints, social circumstances and

external causes of injury or disease. With this set of

information, a range of questions and answers was

identified, which would serve as the basis for the

following WP.

WP3: Develop and Refine the AI Assistant

Considering that the assistant was designed to answer

questions about non-oncological pathologies

prevalent in the elderly population, based on the

guidelines collected and processed in the previous

work packages, this stage involved setting up a

workspace in the AnythingLLM (AnythingLLM,

2024) tool, a framework to facilitate designed to

facilitate interaction with natural language models,

delivering precise answers. Various approaches were

explored and combined during this stage, such as

fine-tuning, transfer learning, RAG (Retriever-

Augmented Generation), or even a combination of

these techniques. These methods, defined in detail

later in this document, allowed for the refinement and

adjustment of the assistant to ensure it could provide

useful and accurate answers to questions related to

non-oncological pathologies in the elderly population

based on the guidelines collected and processed. The

outcome of this WP is an assistant capable of

delivering relevant and precise responses. Another

output of this WP is an API that allows other systems

to access the developed assistant.

WP4: Making the Assistant Available The

last WP of the project involved developing an

interface that makes the assistant available, allowing

users to interact directly with it. This interface would

invoke the API developed in the previous WP to

allow the assistant to be incorporated into another

system, making it possible to clarify doubts, always

based on the health guidelines collected, thus

providing a useful tool for both caregivers,

professionals in the field and even the elderly

themselves.

HEALTHINF 2025 - 18th International Conference on Health Informatics

364

Table 1 summarizes this information, and Figure

4 illustrates it, considering the sequence of events

over the 4 WPs.

Table 1: Summary of each work package.

Work

Packages

Objective

Considerations/

Outputs

WP0: Project

Management

WP1: Search

and download

guidelines

i. find/identify the

guidelines;

ii. download the

guidelines;

iii. Put the guidelines

into a paste;

iv.Generate a list of

documents in the

folder with

information about

the guidelines;

v.Extra task:

Identify the main

illnesses and

needs of the

elderly within the

project's scope.

• Folder with

guidelines;

• List with

Timestamp, DOI,

Name, URL

• Various formats:

.pdf, .doc, .txt

WP2: Edit the

guidelines

i. Define the

variables to

extract;

ii. Extract the data

(the result is a

table);

iii. Translate the data

into a

question/answer

format (table).

• List of data by

guideline - 1st

phase in EN

• Context:

Population; gender,

ethnicity, age, etc...

• Decision

algorithm: what the

guideline proposes

as a solution

• Question/Answer

format - 1st phase

in EN

WP3:

Develop and

refine the AI

assistant

Use the WP2 data

and refine LLM with

the

questions/answers.

The output/result is a

service that, when

invoked, establishes

the dialog.

• API to send a

question and

receive a reply

• API for dialog

Figure 4: Schematic illustration of the LLM development

project.

2.2.3 Lessons Learned

This section aims to share lessons learned throughout

the development process of this project. By 'lessons

learned,' we mean the challenges identified during the

execution of each WP of the project and the

methodologies that were applied to address them,

including both the strategies that proved effective in

solving the problem and those that did not achieve the

expected results.

WP1: Search and Download Guidelines

• Efficient Research

Creating specific queries using MeSH terms ensures

that all relevant documents are identified, but failure

to improve them can jeopardize the quality and

relevance of the results returned.

• Data Processing

The conversion of content in .pdf format to text (i.e.,

encoding, special characters), tables, flowcharts and

figures sometimes leads to a lack of precision in the

information extracted, which can be accentuated by

different document formats and structures. This lack

of precision, although not completely unknown, takes

on greater relevance when we talk about the health

sector since inaccurate information can lead to advice

and/or decision-making that jeopardizes the health

and safety of patients. To mitigate this situation, it is

important to develop efficient algorithms to extract

and organize the necessary metadata automatically,

minimizing human intervention and errors. In

addition, it is necessary to define an effective method

for segmenting the data, such as the Chunking method

(Dozza et al., 2013). The method must divide the

content in such a way as to maintain the connection

between data that shares the same context; for

example, when extracting information from a health

document, the algorithm must automatically identify

and group together sections relating to the same

patient, diagnosis or treatment, ensuring that the

information extracted is accurate and contextualized.

• Storage and Organization

Storing the guidelines as well as their metadata in an

organized and accessible way proved to be a major

challenge, due to the periodic updating of the data

without redundancies, maintaining the integrity and

timeliness of the information.

• Tokens

Another challenge faced was the fact that many

research databases use tokens in the process of

downloading guidelines, and access tokens usually

have a limited lifetime, expiring after a certain period.

Automating this flow becomes complicated as this

Innovation in Geriatric Care: An AI Assistant with LLM Integration Based on Health Guidelines

365

requires tokens to be obtained and renewed

frequently. In addition, there may also be usage

limitations, i.e., limitations on the number of requests

that can be made within a certain period, which

requires careful management to ensure that the

defined limits are not exceeded. Thus, automating

tasks that depend on tokens adds a layer of

complexity to the data collection process. Managing

the validity and security of tokens, automating

renewal and preparing the system for potential

changes to third-party systems are essential strategies

for mitigating these types of challenges.

WP2: Edit the Guidelines

• Additional Section in the Guidelines

Since health guidelines do not have a standard

structure, we propose adding a section with specific

fields to improve their usefulness and make them

easier for algorithms to understand. By incorporating

these additional fields, the models could more easily

interpret the health guidelines, allowing them to be

better trained. This would ensure access to reliable

information in a clear and effective way. These fields

could include:

o Summary: A summary of the main

recommendations and conclusions of the guideline,

allowing a quick understanding of its content.

o Content Overview: A more detailed description

of the topics covered in the guideline, highlighting the

areas of focus and the main information provided.

o Purpose and Usefulness: A clear explanation of

the objectives of the guideline and how it can be used

in clinical practice or even in the reformulation of

health policies.

o Who is it for: An identification of the health

professionals, managers, researchers or other

stakeholders for whom the guideline has been

developed, helping to ensure that it is aimed at the

right audience.

o Structure of the Guideline: A description of how

the guideline is organized, including main sections,

subsections and how to navigate through the

document.

o Frequently Asked Questions: A list of common

questions about the guideline and its answers,

addressing concerns that users may have when

implementing its recommendations.

• Versions in the Guidelines

In addition, these guidelines should have clearly

identified versions so that algorithms and models can

easily check whether they are using the latest version

in order to ensure maximum effectiveness and

relevance of the content provided. Including a

versioning system would help ensure that the

information and recommendations are always up to

date, reflecting the most up-to-date information. This

approach would also make it possible to verify

changes over time, making it easier to understand

evolutions and updates in the guidelines. To do this,

the guidelines could have a section or be

accompanied by a document where it is possible to

have a summary of what has been

added/changed/removed, a "release note". This would

not only make it easier to compare the different

versions but would also help answer questions about

changes in procedures.

WP3: Train the AI Assistant

• Versions of LLM Models and Software

Version management is crucial in software and LLM

models to maintain a clear timeline of its

development and the functionalities that make it up.

Each version should record changes in functionality,

performance and bugs fixed. This makes it possible to

compare different versions to assess performance

improvements directly. This practice helps maintain

and update systems, provides transparency for end

users, and facilitates the reproduction of results in

scientific research.

• Tool Used to Develop the Wizard

For the project's development, the AnythingLLM

(AnythingLLM, 2024) tool was chosen because it is

an open-source platform offering significant

flexibility for customization and integration into

several projects. This solution supports various

language models, including proprietary models, such

as GPT-4, and open-source models, such as Llama

and Mistral (https://useanything.com/). It includes a

graphical user interface that facilitates interaction

with the language model, integration with various

data sources used to train and refine the models, and

customization based on the specific needs of your

application, all easily and intuitively. With

AnythingLLM, there is no need to worry about

privacy issues, a relevant point given that we are in

the health sector, as it already implements measures

to ensure that user data is treated securely and

privately.

• Settings at AnythingLLM

Within the AnythingLLM tool, there are several

optional parameters that, while not mandatory, should

be adjusted or defined, such as the model to be used

(explained in the next few topics), to customize the

model's behavior based on the specific needs of each

project. Below are some of the settings available in

the desktop version that can impact the model's

HEALTHINF 2025 - 18th International Conference on Health Informatics

366

behavior. Figure 3 shows a screenshot of the

workspace settings.

o Chat Mode: The "Chat" option, when active,

sends the model the entire context of the

conversation, i.e., the history of the conversation

in the window. The "Query" option sends only

the question asked by the user. Although context

is important, it also contributes to performance

degradation, so evaluating the objective and

finding a balance is important.

o Chat History: This is the number of

questions/answers that will be sent as a history to

the model whenever we are in "Chat" mode.

o Prompt: Here, we should put relevant

information and the instructions, in order to

contribute to improving the performance of our

assistant. For example, give a summary of the

content of the documents that will serve as a

knowledge base for the model we are using.

o LLM Temperature: This parameter relates to the

creativity of the LLM. It is set to a value between

0 and 1. The higher the value, the more creative

the model's responses will be. By default, the

software uses a value of 0.7, which is

recommended by most models. In the test phase,

we used lower values since we are in the health

field. However, the answers provided by the

model were very literal, lacking the

personalization and naturalness of natural

language. For this reason, we opted to restore the

default value.

Figure 5: Workspace settings for the desktop version of

AnythingLLM.

• Testing the Different Existing Models

The ability to quickly test different LLM models is

essential for iterating and improving the quality of

proposed solutions. Tools such as LM Studio

(Element Labs, 2023) make it possible to conduct

these tests quickly and intuitively, offering user-

friendly interfaces and detailed analysis of model

performance locally (on the machine). These tools

can be configured to run a series of tests, such as

assessing language comprehension, generating text

and the ability to answer questions, which provides

important metrics on the usefulness and quality of

each model tested. All of this makes it possible to

make better choices about which model(s) to use,

depending on the purpose you want to achieve since

the models are built for specific purposes, which

makes them more accurate for the subjects they were

developed for.

Another issue with models is that you must bear

in mind that there are models that only perform well

and efficiently in a specific language (e.g., EN)

because they have been built and trained in that

language and are, therefore, not multilingual. That is

why it is important to test different models so that you

can choose based on your use case.

• Choosing a Model

At an early stage of the project, a document was

drawn up with the pros and cons of the different

models available, such as Falcon - Technology

Innovation Institute (TII) in Abu Dhabi; Ihama 2 -

Meta; Bloom; MPT-7B - MosaicML; Vicunha-13B -

LMSYS ORG; GPT-3 - OpenAI; BERT; among

others. However, after choosing the tool, an

exhaustive analysis was also carried out of all the

existing models in the Desktop version of

AnythingLLM up to the date of development (May

2024), such as: LLaMA3 (Uncensored); LLaMA2

13B; LLaMA 3; CodeLlama 7B; Mistral 7B; Mistral

8x7B; Gemma 2B; Gemma 7B; Phi-2; Orca-Mini;

Orca-Mini 7B; Orca-Mini 13B; among others.

When choosing the model to use in LLM to

generate answers based on health guidelines

regarding the care of the elderly and the most

common pathologies in that age group, several factors

need to be taken into account, such as: the size of the

model, the ability to handle domain-specific

information and whether or not the model has been

optimized for health applications, as they may not

have the controls or training necessary to handle

sensitive and accurate medical/health information. In

addition, it is important to ensure that any model used

complies with local and international health data

privacy regulations, such as GDPR and HIPAA,

where applicable. Given the information found, it was

decided to choose the Orca-Mini 13B model.

• Orchestrating the Solution

Orchestration in LLM solutions involves managing

and automating various tasks and services that

comprise the LLM infrastructure, such as model

training, testing, and model deployment. Using

orchestration software like Flowise (Flowise, 2024)

allows you to manage these tasks efficiently. In this

case, Flowise was used for this purpose. This type of

Innovation in Geriatric Care: An AI Assistant with LLM Integration Based on Health Guidelines

367

solution also allows different models to be used in the

same solution, with questions being directed to the

model with the best performance for the specific task.

This ensures that the solution takes full advantage of

the different capabilities of each model, maximizing

the accuracy and efficiency of the answers provided.

• Model Training

An unclear concept in this type of project is the issue

of model training. In the context of this project, the

term "training" will be used in a broad sense, referring

both to the adjustment of model parameters (as in

Fine-Tuning) and to the adaptation of the model

through other techniques and approaches, which will

be detailed later in this document. Since a pre-trained

model was used, it was necessary to understand which

techniques and/or approaches were used to adapt the

model to the dataset, the guidelines.

Fine-Tuning is a technique that can be applied in

these situations where pre-trained models are used. It

involves further training the model using a specific

dataset to adjust and optimize its performance for a

particular task. This is achieved by adjusting the

model's parameters with an additional specific

dataset, the guidelines. Fine-tuning the base model

with specific data from the collected health guidelines

ensures that the model understands and responds to

the nuances and specificities of these guidelines. To

implement this technique, machine learning

frameworks such as PyTorch or TensorFlow are

recommended for adjusting the model. Libraries such

as Hugging Face Transformers are also very helpful

in this process.

Another technique is Transfer Learning, where

the model pre-trained on a specific task is adapted to

another related task. Instead of training the model

from scratch, the knowledge gained during the initial

training is reused, saving time and computational

resources.

Retrieval-Augmented Generation (RAG),

although not a training technique in the traditional

sense, is a powerful approach to improving the quality

of answers generated by language models during

inference, especially in question-and-answer systems.

It combines information retrieval and text generation.

Rather than relying solely on the language model to

generate answers, this approach first retrieves

relevant documents from a database and then uses this

information to generate more accurate and

contextually relevant answers. This enhances the

accuracy and relevance of responses, as the model

does not just rely on its prior training but also uses up-

to-date, context-specific information, ensuring that

answers are always based on the most relevant

documents by retrieving these guidelines before

generating an answer. When combined with training

techniques such as Fine-Tuning and Transfer

Learning, RAG can contribute to a highly effective

and accurate assistant, capable of providing informed

and relevant answers based on specific guidelines.

It is therefore important to understand that

“training” an LLM in this project involves several key

considerations, and the choice of the appropriate

techniques and approaches depends on various

factors, including the project's objectives, available

resources and the type of data. Below is a table

summarizing the techniques and their applicability.

Table 2: Summary of training techniques and their

applicability.

Technique Description Application

Transfer

Learning

Use a pre-trained

model as a starting

point and adjust it for

the specific task.

• When you have a

limited set of

specific data.

• To save time and

computing resources

by using pre-trained

models.

Fine-Tuning

Further train a pre-

trained model with a

specific data set to

improve performance

on a specific task.

• When you have

domain-specific data

and want to adapt a

pre-trained model to

improve accuracy

and relevance.

Supervised

Learning

Training a model

with labeled data,

with the goal of

learning a function

for mapping inputs to

outputs.

• When you have a

large set of labeled

data.

• For tasks such as

text classification,

answering specific

questions, or

sentiment analysis.

Unsupervise

d Learning

Training a model

with unlabeled data

to find hidden

patterns or structures.

• When the data is not

labeled.

• For tasks such as

clustering or

dimensionality

reduction.

Semi-

Supervised

Learning

Combining labeled

and unlabeled data to

improve model

performance.

• When you have a

limited amount of

labeled data but a

large amount of

unlabeled data.

• To increase model

accuracy with less

labeling effort.

Reinforceme

nt Learning

Training a model

where they learn to

make decisions

through rewards and

punishments.

• For tasks that

involve a sequence

of decisions.

WP4: Make Assistant Available

• Summoning the Assistant

Considering the specific characteristics of the

AnythingLLM framework, it is important to note that

invoking the wizard is only possible when using the

HEALTHINF 2025 - 18th International Conference on Health Informatics

368

contained version of the tool. Generally, the desktop

version is used by a single user or in an individual

development environment, where there is no room for

the scalability and management of multiple instances

required in a productive environment. So, given that

the aim was to enable multi-user system execution

and to "publish" the workspace on websites or other

platforms, we decided to replicate the development

process described in WP3 with the Docker version of

the tool. Throughout the process, it was possible to

see some differences, both in terms of the settings

available and in terms of customization, with greater

flexibility without significant impacts in terms of

complexity for the user.

Another point is that the tool incorporates a chat

widget that allows the workspace and its built-in

knowledge base to be readily displayed on a website.

Figure 6: An example of a script tag embeds.

2.3 Discussions

This article aims to report on work in progress to

create a health assistant based on guidelines that

support an LLM. This effort corroborates the idea put

forward by several authors that it is possible to create

virtual health assistants based on reliable information

(Mehandru et al., 2024; Park et al., 2023; Yang et al.,

2023).

Advances in innovative LLM technology have

shown significant potential to transform medical

practice by providing quality and accessible care to

healthcare professionals and patients, as noted by

several authors in the literature (Mehandru et al.,

2024; Nassiri & Akhloufi, 2024). However, due to the

sensitive and critical nature of the healthcare sector in

which they operate, it is imperative that these

assistants are rigorously assessed at a clinical level. In

their study, Nassiri and Akhloufi (2024) highlight the

importance of this clinical assessment for any AI-

based healthcare tool. In line with this thinking, our

project recognizes the need to carry out this

assessment to validate the development of this

healthcare assistant.

Another relevant point to highlight is the need to

continually train the model with new data and

updated guidelines to maintain the assistant's

relevance and accuracy, a critical component for the

long-term success of any AI tool in healthcare.

Finally, it is worth noting the importance of

multidisciplinary collaboration between technology

developers, healthcare professionals and regulatory

authorities to ensure that the development and

implementation of LLM-based healthcare assistants

are carried out in a responsible manner, as mentioned

by several authors in the literature (Cascella et al.,

2024; Piñeiro-Martín et al., 2023; Thapa & Adhikari,

2023). This collaborative approach safeguards patient

safety and well-being.

3 FINAL REMARKS

3.1 Conclusion

Today, significant progress has been made in the field

of LLMs. This progress results from the growing

demand for innovative solutions to combat many

daily challenges. These advances aim not only to

improve the performance of LLMs in challenging

tasks but also to explore new possibilities for

previously unexplored applications, which will have

a transformative impact on various areas of

knowledge.

The final product resulting from this work, which

will be presented to users, is more like a movie than a

static image due to the dynamic nature of LLMs.

Unlike an image, which remains the same over time,

LLMs are designed to learn and adapt through use and

interaction with users. This allows for continuous

improvement of the responses it provides.

In developing the assistant presented in this study,

the processes were initially conducted manually,

reflecting a more exploratory initial phase. As the

project progressed, there was a natural evolution

towards automation. This "maturation" was

evidenced by the progressive implementation of

automated systems in each WPs. This transition made

it possible to: optimize efficiency, reduce the time and

effort required, and increase the accuracy and

consistency of the results achieved.

3.2 Contributions and Implications

This study has a significant potential impact on the

community by providing access to detailed

information on how to prepare an assistant based on

LLMs. This resource is extremely useful today, not

only in the health area but in all sectors of activity.

This LLM is an important tool to help people in their

Innovation in Geriatric Care: An AI Assistant with LLM Integration Based on Health Guidelines

369

daily lives, facilitating access to relevant health

information quickly and effectively.

3.3 Limitations and Future Work

One of the limitations of this study is that it has not

been clinically validated. Therefore, in future work,

we would like to evaluate LLM clinically. The current

stage of development serves as an opportunity to

conduct tests and consequent improvements, which

will guarantee the effectiveness of the assistant when

it is introduced to the market.

ACKNOWLEDGEMENTS

This work was supported by Portuguese funds

through the Institute of Electronics and Informatics

Engineering of Aveiro (IEETA) (UIDB/00127/2020)

funded by national funds through FCT - Foundation

for Science and Technology. This work has also been

done under the scope of – and funded by – the Health

Data Science Ph.D. Program of the Faculty of

Medicine of the University of Porto, Portugal –

heads.med.up.pt. The authors are thankful to the

Program and its faculty for investing in their students

and funding the open-access publication of their

research. Additionally, this article was developed

under the Mobilizing Agenda No. 41 - HfPT –

HEALTH FROM PORTUGAL, co-funded by the

PRR and the European Union through the Next

Generation EU mechanism.

REFERENCES

AnythingLLM. (2024). AnythingLLM.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan,

J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,

Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G.,

Henighan, T., Child, R., Ramesh, A., Ziegler, D. M.,

Wu, J., Winter, C., … Amodei, D. (2020). Language

Models are Few-Shot Learners. http://arxiv.org/abs/

2005.14165

Cascella, M., Semeraro, F., Montomoli, J., Bellini, V.,

Piazza, O., & Bignami, E. (2024). The Breakthrough of

Large Language Models Release for Medical

Applications: 1-Year Timeline and Perspectives. In

Journal of Medical Systems (Vol. 48, Issue 1). Springer.

https://doi.org/10.1007/s10916-024-02045-3

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018).

BERT: Pre-training of Deep Bidirectional

Transformers for Language Understanding.

http://arxiv.org/abs/1810.04805

Dozza, M., Bärgman, J., & Lee, J. D. (2013). Chunking: A

procedure to improve naturalistic data analysis.

Accident Analysis and Prevention, 58, 309–317.

https://doi.org/10.1016/j.aap.2012.03.020

Element Labs, Inc. (2023). LM Studio (4).

Flowise. (2024). Flowise.

Maresova, P., Javanmardi, E., Barakovic, S., Barakovic

Husic, J., Tomsone, S., Krejcar, O., & Kuca, K. (2019).

Consequences of chronic diseases and other limitations

associated with old age - A scoping review. BMC

Public Health, 19(1). https://doi.org/10.1186/s12889-

019-7762-5

Mehandru, N., Miao, B. Y., Almaraz, E. R., Sushil, M.,

Butte, A. J., & Alaa, A. (2024). Evaluating large

language models as agents in the clinic. In npj Digital

Medicine (Vol. 7, Issue 1). Nature Research.

https://doi.org/10.1038/s41746-024-01083-y

Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M.,

Socher, R., Amatriain, X., & Gao, J. (2024). Large

Language Models: A Survey. http://arxiv.org/

abs/2402.06196

Nassiri, K., & Akhloufi, M. A. (2024). Recent Advances in

Large Language Models for Healthcare.

BioMedInformatics, 4(2), 1097–1143. https://doi.org/

10.3390/biomedinformatics4020062

Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S.,

Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2023).

A Comprehensive Overview of Large Language

Models. http://arxiv.org/abs/2307.06435

Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P.,

& Bernstein, M. S. (2023, October 29). Generative

Agents: Interactive Simulacra of Human Behavior.

UIST 2023 - Proceedings of the 36th Annual ACM

Symposium on User Interface Software and

Technology. https://doi.org/10.1145/3586183.3606763

Piñeiro-Martín, A., García-Mateo, C., Docío-Fernández,

L., & López-Pérez, M. del C. (2023). Ethical

Challenges in the Development of Virtual Assistants

(Switzerland), 12(14). https://doi.org/10.3390/

electronics12143170

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S.,

Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2019).

Exploring the Limits of Transfer Learning with a

Unified Text-to-Text Transformer

. http://arxiv.org/

abs/1910.10683

Smith, G. R., Bello, C., Bialic-Murphy, L., Clark, E.,

Delavaux, C. S., Lauriere, C. F. de, Hoogen, J. van den,

Lauber, T., Ma, H., Maynard, D. S., Mirman, M.,

Lidong, M., Rebindaine, D., Reek, J. E., Werden, L. K.,

Wu, Z., Yang, G., Zhao, Q., Zohner, C. M., &

Crowther, T. W. (2024). Ten simple rules for using

large language models in science, version 1.0. PLoS

Computational Biology, 20(1). https://doi.org/10.

1371/journal.pcbi.1011767

Sun, X., & Li, X. (2023). Editorial: Aging and chronic

disease: public health challenge and education reform.

Front Public Health, 16(2), 107–118.

https://doi.org/10.3389/fpubh.2023.1175898

HEALTHINF 2025 - 18th International Conference on Health Informatics

370

Thapa, S., & Adhikari, S. (2023). ChatGPT, Bard, and

Large Language Models for Biomedical Research:

Opportunities and Pitfalls. In Annals of Biomedical

Engineering (Vol. 51, Issue 12, pp. 2647–2651).

Springer. https://doi.org/10.1007/s10439-023-03284-0

World Health Organization. (2015). World report on

ageing and health.

World Health Organization. (2022). Ageing and health.

Yang, H., Yue, S., & He, Y. (2023). Auto-GPT for Online

Decision Making: Benchmarks and Additional

Opinions. http://arxiv.org/abs/2306.02224.

Innovation in Geriatric Care: An AI Assistant with LLM Integration Based on Health Guidelines

371