Multi-Agent AI System for Adaptive Cognitive Training in Elderly Care

Isabel Ferri-Molla

, Jordi Linares-Pellicer

, Carlos Aliaga-Torro

and Juan Izquierdo-Domenech

Valencian Research Institute for Artiﬁcial Intelligence (VRAIN), Universitat Polit

ecnica de Val

encia (UPV) Cam

ı de Vera,

{isfermol, jorlipel, calitor, juaizdom}@upv.es

Keywords:

Multi-Agent System, Artiﬁcial Intelligence, Large Language Models, Cognitive Training, Elderly Care.

Abstract:

The accelerated ageing of the global population presents signiﬁcant societal and healthcare challenges, par-

ticularly concerning cognitive decline in older adults. This paper introduces a multi-agent system designed

to stimulate and preserve cognitive abilities in elderly users through personalized exercises tailored to their

needs. The proposed system integrates a suite of specialized AI agents: Teacher, Critic, Conciliator, Per-

formance Evaluator, and Psychologist, each fulﬁlling speciﬁc roles to generate, validate, and adapt cognitive

exercises collaboratively. The system establishes a self-correcting feedback loop that mitigates errors and re-

duces hallucinations through multi-agent consensus mechanisms by employing specialized LLM-based agents

for generation, critique, evaluation, and emotional assessment. This approach enhances inference depth and

ensures the generation of reliable exercises and dynamic feedback. Two interaction modes, voice-based and

text-based, are implemented using state-of-the-art speech recognition and synthesis technologies, enhancing

accessibility for users with varying preferences and abilities. A user study evaluated the system’s usability and

effectiveness. Results indicate that the multi-agent architecture enhances cognitive engagement and provides

a personalized user experience. The system demonstrated efﬁcacy in addressing diverse cognitive needs, high-

lighting its potential as an adaptable tool for cognitive training in elderly care.

1 INTRODUCTION

Global ageing represents one of society’s most sig-

niﬁcant challenges in the coming decades. As life

expectancy increases worldwide, the demographic

shift towards an older population introduces signif-

icant economic, social, and healthcare implications.

This growing population segment requires innovative

approaches to address age-related challenges, such

as physical decline, cognitive deterioration, and in-

creased dependency while fostering well-being and

active participation in society. Among these chal-

lenges, preserving independence, enhancing qual-

ity of life, and maintaining cognitive functionality

emerge as top priorities.

In response to these needs, technology has be-

come crucial in developing solutions to support older

adults. Advances in digital tools, assistive technolo-

gies, and artiﬁcial intelligence (AI) enable new ap-

https://orcid.org/0009-0008-3608-9891

https://orcid.org/0000-0002-3315-1716

https://orcid.org/0009-0001-0426-886X

https://orcid.org/0000-0003-0076-7001

proaches to care, rehabilitation, and the prevention

of age-related decline. Technology not only facil-

itates better care for dependent individuals but also

empowers older adults to lead healthier and more au-

tonomous lives. Solutions such as telehealth plat-

forms, wearable devices, and smart home systems are

transforming traditional care models by offering re-

mote monitoring, personalised support, and early de-

tection of health-related issues (Wu et al., 2024), (Fer-

guson et al., 2021).

A key area of focus in this technological evo-

lution is cognitive health. Cognitive decline is one

of the most signiﬁcant concerns associated with age-

ing, as it directly impacts memory, decision-making,

and overall brain functionality. Cognitive exercises

are a proven strategy for maintaining mental sharp-

ness and mitigating the effects of ageing on the brain.

However, traditional approaches to cognitive stimula-

tion often need more personalisation and adaptability,

limiting their efﬁcacy and user engagement. This is

where AI emerges as a transformative tool.

With the capabilities of AI, it is now possi-

ble to design and implement cognitive exercises tai-

Ferri-Molla, I., Linares-Pellicer, J., Aliaga-Torro, C. and Izquierdo-Domenech, J.

Multi-Agent AI System for Adaptive Cognitive Training in Elderly Care.

DOI: 10.5220/0013397000003890

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 17th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2025) - Volume 1, pages 937-947

ISBN: 978-989-758-737-5; ISSN: 2184-433X

937

lored to the unique needs of individuals. AI sys-

tems can assess users’ cognitive abilities, adapt exer-

cises in real-time, and provide personalised feedback

that enhances engagement and effectiveness (Garcia-

Betances et al., 2015). AI systems can also offer a

deeper level of user interaction by incorporating emo-

tional intelligence. These systems can also analyse

the emotional tone of users, adjusting interactions dy-

namically to prevent frustration and foster a support-

ive, positive experience (Picard, 2010).

Despite the promising potential of AI-driven so-

lutions, challenges remain. Large language models

(LLMs) and foundational AI models, integral to many

of these tools, have limitations. These models are

prone to errors, such as generating inaccurate or ir-

relevant information, a phenomenon commonly re-

ferred to as ”hallucinations.” In the context of health-

care and cognitive support for older adults, such er-

rors could undermine trust, effectiveness, and safety.

Addressing these challenges requires robust solutions

that enhance the reliability and accuracy of AI sys-

tems (Moor et al., 2023).

Agentic systems offer a promising approach to

overcoming these limitations. By leveraging a collab-

orative framework of specialised AI agents, tasks can

be distributed across domains such as medicine, psy-

chology, education, and system coordination. Each

agent contributes expertise to a speciﬁc aspect of the

user experience, ensuring that outputs are accurate,

coherent, and highly personalised. This distributed ar-

chitecture not only mitigates the risk of errors but also

enhances the overall effectiveness of the intervention.

For example, psychological agents can determine the

user’s emotional states during exercises and adjust

the interface accordingly, while multi-agent system

components ensure integration and coordination of all

functionalities.

2 LITERATURE REVIEW

The fast growth of the global elderly population

has led to an increased prevalence of dementia and

age-related cognitive decline, posing signiﬁcant soci-

etal and healthcare challenges (Gates and Valenzuela,

2010). Cognitive decline often manifests in deterio-

ration of memory, attention, language, and executive

functions, undermining independence and quality of

life. Consequently, interventions that slow or mitigate

cognitive deterioration have gained substantial impor-

tance. Traditional cognitive training programs, rang-

ing from paper-and-pencil exercises to computerized

brain games, have shown promise in improving or

maintaining cognitive functions in older adults (Kelly

et al., 2014) (Kueider et al., 2012). However, the efﬁ-

cacy of these interventions remains a topic of debate,

as evidence for their long-term cognitive beneﬁts is

mixed and often limited by methodological concerns

(Simons et al., 2016). Moreover, these conventional

approaches typically offer limited adaptability, per-

sonalization, and scalability, reducing their long-term

engagement and overall efﬁcacy.

Recently, AI has emerged as a powerful tool to en-

hance dementia care and cognitive training method-

ologies. AI-driven analytics have signiﬁcantly im-

proved early detection and diagnosis of cognitive im-

pairments. Machine learning (ML) and Deep learning

(DL) models can integrate socio-demographic data,

neuroimaging results, speech patterns, and longitudi-

nal cognitive assessments to identify early markers of

dementia, thus enabling timely interventions (Graham

et al., 2020) (Li et al., 2015). These predictive mod-

els enhance diagnostic accuracy and reﬁne treatment

planning, guiding clinicians toward tailored strategies

that may delay disease progression and improve pa-

tient outcomes.

Beyond detection, AI-driven personalization has

revolutionized the delivery of cognitive interventions.

Adaptive systems that learn from user performance

and preferences over time are gradually replacing tra-

ditional one-size-ﬁts-all methods. This adaptability is

particularly evident in multi-agent AI systems, where

multiple specialized agents collaborate to design, val-

idate, and adapt cognitive exercises (Faraziani and

Eken, 2024). Such frameworks employ agents for

teaching, criticizing, evaluating, and providing psy-

chological support—each contributing distinct exper-

tise to produce a cohesive, user-centric intervention.

The real-time analysis of user behaviour and progress

allows dynamic adjustments to difﬁculty levels, con-

tent type, and emotional tone, ensuring that training

remains consistently challenging yet accessible.

A critical innovation in this area is the integration

of LLMs and advanced speech technologies into these

systems. Voice-based interfaces and Natural language

understanding (NLU) facilitate more inclusive en-

gagement, especially for individuals with sensory or

motor limitations (Aghajani et al., 2021). By simpli-

fying user-system interactions, voice-based platforms

can reduce barriers to participation and improve ad-

herence, allowing elderly users to access and beneﬁt

from cognitive exercises. As AI advances, these mul-

timodal interfaces, combining voice, text, and visual

cues, enhance user autonomy and foster greater long-

term participation in cognitive maintenance programs

(Faraziani and Eken, 2024).

Multi-agent AI systems also increasingly incorpo-

rate emotional and affective computing techniques.

EAA 2025 - Special Session on Emotions and Affective Agents

938

Emotional well-being is tightly linked to cognitive

health, and interventions that address affective states

may offer more holistic beneﬁts. AI-enabled robots

and embodied virtual agents, for instance, integrate

cognitive training tasks with simulated social interac-

tions to meet both the cognitive and emotional needs

of older adults (Gochoo et al., 2020). These socially

assistive robots can detect and respond to changes in

the user’s mood or frustration levels, providing en-

couragement, empathy, and motivational support. By

attending to the user’s emotional state, AI systems

can sustain user engagement over extended periods,

potentially enhancing intervention adherence and out-

comes.

Moreover, AI is expediting research advance-

ments in elderly cognitive care. Automated litera-

ture review agents, leveraging Natural language pro-

cessing (NLP) and ML-based summarization tech-

niques, can efﬁciently synthesize vast amounts of re-

search ﬁndings (Sami et al., 2024). These systems

help researchers stay current with the rapidly evolv-

ing evidence base and accelerate the development

of new, more effective cognitive intervention strate-

gies. Meta-analyses and systematic reviews generated

with AI support ensure practitioners and policymak-

ers can access the latest, high-quality evidence, guid-

ing better-informed decision-making in clinical and

community settings.

Despite these technological strides, current solu-

tions often target isolated aspects of cognitive sup-

port. For example, some systems focus on exercise

generation without adequately addressing emotional

or motivational support, while others emphasize de-

tection and monitoring without delivering actionable

training content (Castro et al., 2023). Integrating

multiple AI capabilities into a single cohesive frame-

work remains a key challenge, including personali-

sation, emotional adaptation, and multimodal inter-

action. Additionally, best practices for orchestrating

agent collaboration and mitigating conﬂicting outputs

are still evolving. There is a clear need for compre-

hensive multi-agent architectures that can seamlessly

synchronize these functionalities, thereby improving

user experience, accessibility, and training efﬁcacy.

This paper addresses these gaps by presenting a

novel multi-agent system for adaptive cognitive exer-

cise design in elderly care. Building on the advances

discussed in the literature, our system incorporates

specialized agents, such as the Teacher, the Critic, the

Conciliator, the Performance evaluator, and the Psy-

chologist, within the CrewAI (crewAI, 2024) frame-

work to deliver holistic, user-tailored interventions.

The system enables sophisticated agent collaboration

by leveraging LLM-based interfaces, speech recog-

nition, and emotional adaptation strategies while re-

maining accessible and engaging. This integrated ap-

proach is poised to advance the state of the art in AI-

driven cognitive training, offering a more comprehen-

sive, responsive, and human-centered solution for an

increasingly urgent public health challenge.

3 SYSTEM STRUCTURE

The proposed multi-agent system is designed to as-

sist older adults in stimulating their cognitive abilities

through exercises that focus on, but are not limited to,

six distinct areas: attention, memory, mental calcu-

lation, language, logical reasoning, and general prac-

tice.

Our approach prioritizes multi-agent orchestration

and LLMs over explicit symbolic or hybrid methods.

This enables greater adaptability and scalability, re-

ducing reliance on predeﬁned rules or rigid knowl-

edge representations.

The system leverages a multi-layered team of ded-

icated agents, each tasked with speciﬁc roles such

as exercise generation, critique, evaluation, and emo-

tional assessment, to continuously reﬁne both con-

tent and user interactions. This approach simulates a

”System 2”-like reasoning process, enabling the gen-

eration of high-quality exercises, real-time evaluation

of user performance, and emotional monitoring to

adapt future exercises to the user’s needs.

Crew AI, a Python-based open-source framework

designed to orchestrate multi-agent AI systems, has

been employed to orchestrate the multi-agent sys-

tem. It facilitates the creation of teams of autonomous

agents, each with speciﬁc roles, tools, and objectives,

collaborating efﬁciently to perform complex tasks.

Each agent assumes a speciﬁc role, fostering com-

munication and coordination between agents to share

information and delegate tasks efﬁciently. Its mod-

ular design facilitates the integration of customised

tools and LLMs, enabling adaptation to diverse ap-

plications and environments, including the dynamic

generation of exercises tailored to the user’s proﬁle.

Furthermore, it automates workﬂows by managing

task allocation and agent interactions, optimising pro-

cesses and enhancing operational efﬁciency.

Using LLMs to generate cognitive exercises offers

multiple beneﬁts supported by recent research (Kas-

neci et al., 2023). These models enable personalisa-

tion tailored to each user’s proﬁle and cognitive level,

enhancing learning effectiveness. Additionally, their

generative capabilities allow the creation of a wide va-

riety of exercises, avoiding repetition and maintaining

the user’s engagement (Ichien et al., 2024). LLMs can

Multi-Agent AI System for Adaptive Cognitive Training in Elderly Care

939

also evaluate responses and provide immediate feed-

back, fostering autonomous learning and real-time er-

ror correction (Letteri and Vittorini, 2024). Moreover,

they can simulate human cognitive processes (Aher

et al., 2023) and design exercises that stimulate spe-

ciﬁc functions such as memory, attention, and plan-

ning. Lastly, LLMs can generate high-quality educa-

tional content, offering detailed explanations and so-

lutions for complex exercises, facilitating understand-

ing challenging concepts, and enriching educational

material (Sarsa et al., 2022; Moore et al., 2023).

However, despite their increasing precision,

LLMs may exhibit hallucinations or provide unreli-

able information that aligns differently from the spe-

ciﬁc context sought. Hallucinations refer to generat-

ing content without factual grounding in the training

data. This phenomenon occurs when the model, with-

out sufﬁcient information, attempts to probabilisti-

cally complete patterns without a genuine understand-

ing of the content, resulting in fabricated or incorrect

responses.

The lack of adaptation to speciﬁc contexts arises

from the limitations of some models in adequately

interpreting the environment or the user’s particular

needs. This constraint may lead to irrelevant, inco-

herent, or poorly adjusted responses that do not meet

the requirements of the given scenario.

In the realm of creating and evaluating cognitive

exercises, it is essential to implement mechanisms

that control the outputs of language models to miti-

gate inaccuracies before they reach the end user. Such

inaccuracies can compromise the quality of the ex-

ercises generated, the interpretation of user perfor-

mance data, and, consequently, the system’s utility in

adapting future exercises or enhancing the accuracy

and feedback provided.

In this context, multi-agent systems emerge as a

highly recommended solution (Shinn et al., 2024).

These systems enable the development of architec-

tures involving agents with speciﬁc roles working in

a coordinated manner. Furthermore, it is possible to

establish groups of agents sharing the same role, in-

teracting with one another to reach a consensus before

presenting the information to the user. For instance, in

generating a cognitive exercise, multiple agents pow-

ered by language models can collaborate, reﬁne the

content, and validate its quality, ensuring that the ﬁ-

nal output is optimised to meet the user’s needs.

This multi-agent system approach offers two pri-

mary beneﬁts. Firstly, it enhances the robustness of

the generated content by reducing errors and ensuring

accuracy through a consensus-driven process. Sec-

ondly, it provides signiﬁcant ﬂexibility to adapt to

each user’s speciﬁc cognitive and individual needs,

delivering a personalised and adequate experience.

At the start of execution, the system gathers basic

information about the user, such as their name, age,

cognitive state, and speciﬁc needs. This information

forms the user’s speciﬁc context and is shared with

the various agents to be considered when performing

their respective tasks.

Based on the structure provided by the CrewAI

technology, an agent in our system will possess a role,

a goal, a backstory, and a language model to generate

appropriate responses.

The agent’s role deﬁnes its function within the

group, while its goal represents the individual ob-

jective that guides its decision-making process. The

backstory provides contextual information about the

agent, shaping its interactions and behaviour.

Each agent will execute tasks, which are speciﬁc

assignments the agent is responsible for completing.

These tasks will be deﬁned with a detailed descrip-

tion, the agent assigned to execute them, the expected

output upon task completion, and the relevant context.

The crews will also be established once the agents

and tasks have been deﬁned. A crew is a group of

agents collaborating to solve a set of tasks. Each crew

will have its own parameters, such as the agents in-

volved in the collaboration, the tasks to be addressed,

and the type of process to be followed.

The decision-making processes of the presented

multi-agent system are structured around sequential

task execution, leveraging consensus-driven mecha-

nisms and precise coordination protocols. Each task

is broken into smaller, manageable subtasks, which

are assigned to agents sequentially to ensure an efﬁ-

cient ﬂow of operations. At every step, agents inter-

act iteratively to validate and reﬁne outputs through

consensus mechanisms. For example, the critic agent

evaluates the outputs from the teacher agent against

established benchmarks. If conﬂicts arise, the con-

ciliator agent mediates to synthesize improvements,

ensuring robust and coherent results. Tasks are exe-

cuted in a predeﬁned sequence, where the output of

one agent becomes the input for the next. This struc-

ture minimizes redundancy and ensures a linear pro-

gression, enhancing both speed and accuracy. For in-

stance, after the teacher agent generates an exercise,

it is passed to the critic agent for review and then to

the performance evaluator agent for assessment.

3.1 System Agents

The system comprises various types of agents, each

with distinct functionalities and roles, which are enu-

merated as follows:

EAA 2025 - Special Session on Emotions and Affective Agents

940

3.1.1 Teacher Agent

The Teacher agent (TA) is responsible for generating

exercises tailored to the user. These exercises are de-

signed through prompting to be short, dynamic, and

speciﬁc, focusing on stimulating key cognitive skills

such as attention, memory, mental arithmetic, lan-

guage, and logical reasoning. Each exercise will be

adaptable to the user’s proﬁle and needs, prioritising

an appropriate level of difﬁculty to ensure completion

within one minute. Tasks will vary, including iden-

tifying distinct elements in a sequence, performing

simple calculations, completing logical patterns, or-

dering words or lists, and quick memory exercises.

3.1.2 Performance Evaluator Agent

The primary goal of the Performance evaluator agent

(PEA) is to assess the user’s responses to the exercises

objectively. This agent compares the user’s response

with the correct solution generated internally to deter-

mine whether the response is correct. Additionally, it

provides speciﬁc and concise feedback based on the

user’s context, the response given, and the character-

istics of the exercise.

3.1.3 Critic Agent

The Critic agent (CA) evaluates the responses gener-

ated by other agents to ensure they are appropriate and

fulﬁll the requested task. Its main objective is to guar-

antee the quality and relevance of responses within

the system by employing an analytical and critical ap-

proach. In this case, it validates the exercises created

by the TA to ensure the ﬁnal output is coherent and

adapted to the speciﬁc user context. Furthermore, it

reviews the corrections made by the agent to verify

that the feedback is accurate and suitably tailored to

the user’s speciﬁc characteristics.

3.1.4 Conciliator Agent

The Conciliator agent (CoA) is responsible for creat-

ing an improved version of a cognitive exercise, using

the original one and the suggested improvements as

a foundation. This agent mediates between the initial

exercise design and the critiques received, ensuring

adjustments are made to enhance quality, effective-

ness, and suitability. Its role is crucial in preventing

inﬁnite loops between the CA and TA. During the cre-

ation of exercises, the debate between these agents

may prolong the process; therefore, after several in-

teractions, the CoA is introduced to provide a resolu-

tion based on the exercises created by the TA and the

critiques made by the CA.

3.1.5 Psychologist Agent

The Psychologist agent (PsA) analyses the user’s

emotional state based on their exercise performance

and the information provided. This agent acts as

an empathetic observer, identifying emotions such as

frustration, interest, satisfaction, or others that may

arise during the user’s interaction with the system.

Feedback on these emotions is stored in a document.

4 METHODOLOGY

The system begins by creating an initial context for

the user based on key information such as their name,

age, and speciﬁc cognitive characteristics. Using this

context, the TA is tasked with designing an exercise

to train and enhance the user’s cognitive abilities.

The integration of AI in cognitive training for el-

derly users raises critical ethical concerns. Privacy is

a key issue, as the sensitive nature of personal and

health-related data makes it susceptible to misuse if

not properly safeguarded. To address this, the sys-

tem prioritizes data privacy and security by collecting

only essential user information, such as name, age,

and cognitive characteristics. Importantly, the name is

not transmitted externally and is used only within the

system to personalize the experience. Age and cogni-

tive information are anonymized before being sent to

the model, ensuring data protection.

Once the exercise has been generated, an interac-

tion is initiated between two agents: the TA and the

CA. The CA receives the designed exercise and the

user’s context, conducting a detailed evaluation to de-

termine whether the exercise is coherent and adheres

to the established criteria. At this stage, the CA may

either approve or reject the exercise. If the exercise is

rejected, the TA must create a new one, considering

the observations and suggestions provided by the CA.

If the exercise is approved following the initial in-

teraction between the TA and CA, the system will

present it to the user for completion. Conversely, if

the exercise is rejected again after the TA generates

a revised exercise and undergoes a new critique by

the CA, the CoA is introduced into the process. This

agent analyses both the revised exercise created by the

TA and the observations made by the CA, ultimately

producing a ﬁnal proposal that harmonises both per-

spectives while still considering the user’s speciﬁc

characteristics.

Through this iterative cycle, the three agents col-

laborate to achieve consensus, ensuring the exercise is

high quality, coherent, and tailored to the user’s needs.

After completing this process, the system will deliver

Multi-Agent AI System for Adaptive Cognitive Training in Elderly Care

941

the ﬁnal exercise to the user, allowing them to proceed

and complete it.

Once the user provides their response to the

exercise, the PEA takes action. As primary in-

puts, this agent receives the exercise prompt, the

user’s response, and the previously deﬁned user con-

text. Based on this information, the PEA determines

whether the user’s response is correct or incorrect.

Additionally, it tailors its explanation, the justiﬁca-

tion of its evaluation, and its manner of expression

to align with the user’s speciﬁc needs, ensuring that

the feedback is clear, understandable, and relevant to

their proﬁle.

Once the PEA delivers its verdict, it will share it

with the CA. The CA is provided with all the pertinent

information: the exercise prompt, the user’s context,

the user’s response, and the evaluation generated by

the PEA. Using this data, the CA conducts a thorough

analysis to determine whether the evaluation provided

by the PEA is appropriate, well-founded, and in line

with the previously established criteria.

Following the consensus reached between both

agents, a ﬁnal evaluation of the exercise is generated

and presented to the user. This evaluation not only

aims to inform the user about the correctness of their

response but also to provide a clear, well-justiﬁed, and

personalised explanation to help them better under-

stand the exercise and reﬂect on their performance.

Finally, the system interacts with the user once

again to gather their emotional feedback regarding

the exercise. At this stage, the user is asked how

they felt while completing the exercise, both in terms

of its composition and its level of difﬁculty. This

emotional feedback, combined with the user’s con-

text and their response to the exercise, is used to

generate a detailed analysis of their emotional state

based on their performance, which can subsequently

be used to adapt future exercises, ensuring they are

better aligned with the user’s cognitive abilities and

emotional well-being. The ﬁgure 1 presents a dia-

gram illustrating the system architecture and internal

interactions.

5 EXPERIMENTAL APPROACH

In terms of user interaction with the system, two ver-

sions have been developed: one where the interaction

is conducted through text and another where it is con-

ducted through voice.

In the voice interaction version, advanced speech

recognition and synthesis technologies enable a natu-

ral and accessible voice-based interaction. The Deep-

gram API powers both speech recognition and syn-

thesis functionalities (Deepgram, 2024), ensuring a

robust and efﬁcient implementation. These tools al-

low the user’s voice to be transformed into text and,

in turn, generate responses in auditory format, creat-

ing a continuous ﬂow of communication between the

user and the system. This approach enables interac-

tion for individuals who may have reading or writing

difﬁculties, which is common among older users.

Speech recognition is crucial in capturing the

user’s spoken words and converting them into text.

This process leverages DL models to achieve high

accuracy even under challenging acoustic conditions,

such as noisy environments or interacting with users

with varied accents. The ability to process speech in

real-time ensures that interactions remain immediate

and dynamic. Moreover, these models transcribe in-

dividual words and interpret complete phrases, taking

into account the context and intent behind the mes-

sage.

On the other hand, speech synthesis transforms

the text generated by the system into clear and nat-

ural audio. This not only facilitates the comprehen-

sion of instructions and feedback but also enhances

the user experience by incorporating human-like in-

tonations and rhythms.

The interaction ﬂow begins by informing the user

that the exercise is being generated. The exer-

cise is then read aloud, and the system awaits the

user’s input, which is spoken directly into the sys-

tem. The user’s voice is converted into text and pro-

cessed by intelligent agents to generate a response.

This response is subsequently transformed into audio

through speech synthesis, providing clear and person-

alised communication to the user. This process en-

ables the user to engage in an interactive experience

while enhancing the system’s usability and accessi-

bility, allowing individuals with varying abilities to

interact effectively with the technology.

Moreover, both voice and text interaction mod-

els have utilised language models to enhance the

agents’ ability to comprehend and generate appropri-

ate responses. For this purpose, GPT-4o was selected

(Achiam et al., 2023), as it is one of the state-of-the-

art LLMs. To ensure that the performance of the LLM

did not compromise the system’s effectiveness, we

consulted an LLM leaderboard, speciﬁcally the Chat-

bot Arena LLM Leaderboard (Chiang et al., 2024).

This leaderboard is based on an open platform that

evaluates and ranks AI models through user votes,

using the Bradley-Terry model to generate real-time

rankings. Consequently, we selected the top-ranked

LLM to evaluate the multi-agent system architecture

properly.

EAA 2025 - Special Session on Emotions and Affective Agents

942

Figure 1: Overview of the multi-agent system architecture.

6 SYSTEM EVALUATION

The practical experimentation of the system was con-

ducted with a group of 20 users aged between 54 and

83 years, most of whom were members of a senior

university program. These users had a certain level of

familiarity with technology, although not speciﬁcally

with systems like the one being tested. During the

tests, users were tasked with completing an entire in-

teraction cycle designed to evaluate the system’s per-

formance.

The group of 20 users was divided into two sub-

groups of 10 participants each. One subgroup per-

formed the tests using voice interaction, while the

other used text-based interaction. Efforts were made

to ensure that the average age of participants in both

groups was similar. The objective of this approach

was not only to test the multi-agent system but also

to assess how the different forms of interaction could

inﬂuence users’ perceptions of the system.

The experimental process followed a similar

structure for both groups.

Firstly, the system generated a personalised ex-

ercise tailored to each user’s speciﬁc characteristics,

which had been previously introduced. This level of

personalisation ensured that the generated tasks were

relevant and appropriately suited to the user’s proﬁle.

Once the exercise was presented, participants

completed the task using either text or voice inter-

action, depending on their assigned modality. Upon

task completion, the system automatically generated

a correction of the user’s responses and provided de-

tailed feedback on their performance. Subsequently,

users were asked to share their thoughts on the ex-

ercise. The system produced a psychological proﬁle

based on the exercise outcomes, user feedback, and

observed interaction patterns. This proﬁle included

an analysis of the emotional state experienced during

the process, identifying emotions such as satisfaction,

frustration, or motivation.

Following the experimentation phase, both groups

were given a questionnaire, with questions assessed

using a 5-point Likert scale, where one indicated

“strongly disagree” and ﬁve indicated “strongly

agree.”

The results for the subgroup that tested the sys-

tem using text-based interaction are presented in Ta-

ble 1, showing the mean and standard deviation for

each question. Table 2 presents the results for the

subgroup that used the system through voice-based

interaction, measuring the participants’ opinions and

responses.

As the analysis presented in the preceding tables

shows, the average score for text-based interaction is

4.63 out of 5, indicating a very positive overall opin-

ion. In contrast, this value decreases to 4.10 out of

5 for voice-based interaction. While this still reﬂects

a good predisposition towards using the system, it is

evident that users who interacted with the system via

text provided higher ratings.

Nevertheless, regarding whether users would

utilise the system frequently, the average response in

both cases exceeds 4 out of 5 on the Likert scale.

This demonstrates a strong general acceptance of the

system, regardless of the mode of interaction em-

ployed. It suggests that, although text-based interac-

tion is rated slightly higher than voice-based interac-

tion, both modes reveal a favourable user inclination.

In particular, the fact that the average responses

for anticipated frequency of use remain above 4 in

both cases highlights a positive perception and con-

ﬁdence in the system as a valuable and viable tool.

These ﬁndings imply that the system possesses

signiﬁcant potential for frequent and sustained adop-

Multi-Agent AI System for Adaptive Cognitive Training in Elderly Care

943

Table 1: Descriptive results of the Likert scale questionnaire for the text interaction system.

Item Mean Standard Deviation

I ﬁnd the system useful 4.83 0.41

I would use the system frequently 4.50 0.83

The system instructions were clear and easy to understand 4.67 0.52

I felt that the activities matched my abilities 4.50 0.83

I felt secure while using this system 4.00 0.63

I believe that with some practice, I could use this system on my own 4.83 0.41

I found it entertaining and pleasant to use this system 4.67 0.52

I believe the system can help keep my mind active 4.83 0.41

Table 2: Descriptive results of the Likert scale questionnaire for the voice interaction system.

Item Mean Standard Deviation

I ﬁnd the system useful 4.67 0.52

I would use the system frequently 4.17 0.41

The system instructions were clear and easy to understand 4.00 0.89

I felt that the activities matched my abilities 3.83 0.75

I felt secure while using this system 3.83 0.75

I believe that with some practice, I could use this system on my own 3.83 0.75

I found it entertaining and pleasant to use this system 4.00 0.63

I believe the system can help keep my mind active 4.33 0.52

tion despite the differences in user experience be-

tween the interaction modes. However, further reﬁne-

ments should be made to enhance the user experience,

particularly for voice-based interaction.

Furthermore, following user feedback, partici-

pants were asked open-ended questions regarding

their overall experience and suggestions for system

improvement. Among users of the text-based interac-

tion, the majority highlighted that the initial difﬁculty

of the exercises was somewhat high. This suggests

the need for further adjustments to ensure a smoother

learning curve, especially for older users who may re-

quire a more gradual progression in task complexity.

Additionally, some participants mentioned that the

correction feedback provided by the system was occa-

sionally too lengthy, which could hinder their ability

to process the information effectively.

Several notable challenges were identiﬁed for

users who interacted with the system via voice. The

exercises relied solely on auditory input, which cre-

ated speciﬁc difﬁculties for users. In memory-based

tasks, such as recalling words, participants found

it challenging to retain information presented only

through voice, as they lacked visual references to sup-

port their memory. While the voice interface was

designed to be fully auditory, participants expressed

that having to rely exclusively on spoken instruc-

tions made it more difﬁcult to process and remem-

ber multiple steps or complex exercise requirements.

A recurring suggestion was to complement the voice-

based interface with a visual display of the text on the

screen, as this dual-modality approach would provide

users with both auditory and visual cues, improving

comprehension and accessibility.

To evaluate whether signiﬁcant differences exist in

the perception of interaction between two modes (text

and voice), a Student’s t-test was applied to compare

the satisfaction means. The results showed that inter-

action via text received a signiﬁcantly higher rating

than voice. The analysis yielded a t-statistic of 3.19

and a p-value of 0.014, which allows us to afﬁrm, with

a 95% conﬁdence level, that the observed difference is

unlikely to have occurred by chance. These ﬁndings

suggest that participants perceive interaction through

text as a more positive and efﬁcient experience.

Additionally, the coefﬁcient of variation (CV) was

calculated to analyse the responses’ consistency and

assess the perceptions’ homogeneity. The results re-

vealed that voice interaction exhibits more signiﬁcant

variability in user opinions than text mode. For in-

stance, in the question “The system instructions were

clear and easy to understand”, the CV for text mode

was 11.13%, whereas for voice it reached 22.25%.

This difference suggests that users’ experience with

voice interaction is less consistent, possibly due to

factors such as difﬁculties in auditory comprehension,

the absence of visual support, or individual differ-

ences in the use of technology.

EAA 2025 - Special Session on Emotions and Affective Agents

944

7 SYSTEM LIMITATIONS

During the testing conducted throughout the exper-

imental process, the system, leveraging LLMs for

content creation and language comprehension along

with agentic systems to fulﬁll various roles, showed

promise according to most users. Multi-agent interac-

tions were especially beneﬁcial in regulating the out-

puts of LLMs, thereby reducing inaccuracies and hal-

lucinations. Despite these advantages, our evaluation

highlighted several limitations and proposed areas for

improvement that could enhance system functional-

ity and user experience. It was observed that the ini-

tial difﬁculty of the cognitive exercises could beneﬁt

from more precise adjustments. Implementing a pro-

gressive approach, starting with simpler exercises and

gradually increasing complexity based on user perfor-

mance and feedback, might be more effective. Such

a strategy would provide a more user-friendly intro-

duction to the system, especially for users with lim-

ited cognitive abilities, and support a more natural

and adaptive learning curve. This would help mitigate

frustration associated with complex initial exercises,

facilitating a smoother transition to more challenging

levels. Although initially designed to operate through

verbal commands for integration with home assis-

tants, there is a signiﬁcant opportunity to enhance user

interaction by adding a visual interface. This interface

would display exercise text, allowing users to both

listen to instructions and read them, thus improving

comprehension and performance. This feature would

be particularly beneﬁcial for users who require vi-

sual reinforcement to process information more effec-

tively.

The current evaluation involved a limited number

of participants due to resource constraints. Despite

these limitations, the insights gleaned provide a valu-

able foundation for understanding the system’s initial

usability and effectiveness. Moving forward, it would

be beneﬁcial to undertake a more comprehensive eval-

uation involving a broader user base to further vali-

date and reﬁne the system’s capabilities, thus enhanc-

ing the robustness of our ﬁndings.

Additionally, considering ethical and normative

aspects, it would be intriguing to develop an ethical

agent in future iterations of this project. This agent

will be designed to process regulatory documents, en-

suring that the outcomes of interactions within the

system, speciﬁcally, the ﬁnal exercise generated by

the TA after discussions with the CA and CoA, as well

as the responses from the PEA and PsA, are in strict

adherence to the applicable ethical standards and reg-

ulations. This approach not only reinforces the in-

tegrity of the system but also safeguards the interests

and rights of all participants involved.

The system is designed with a strong emphasis on

user autonomy and trust. Users are fully informed

about its purpose and capabilities and are explicitly

asked for their consent before engaging with it, pro-

moting transparency and conﬁdence in the system.

Furthermore, it is intended as a supportive tool for

cognitive training rather than a replacement for pro-

fessional medical judgment. Healthcare professionals

retain ultimate responsibility for interpreting the sys-

tem’s output and making treatment decisions, recom-

mending its use when it aligns with the speciﬁc needs

of their patients.

These improvements and considerations are in-

tended to reﬁne the system’s capabilities and ensure

a high-quality user experience in future deployments.

8 CONCLUSIONS

The system presented in this article demonstrates how

technology can assist and provide solutions with the

potential to mitigate some of the problems that our so-

ciety will inevitably face in the coming years, such as

the ageing of the population and the cognitive decline

experienced by individuals at advanced ages.

As shown in this article, the combination of lan-

guage models with multi-agent systems allows us

to harness the creative capabilities of these systems

while mitigating issues such as hallucination through

the interaction between agents.

The presented system demonstrates how integrat-

ing specialized agents with clearly deﬁned roles, such

as the TA, the CA, the CoA, the PEA, and the PsA,

plays a fundamental role in creating precise, user-

oriented cognitive exercises. This multi-agent, LLM-

driven strategy combines generation, critique, evalua-

tion, and emotional insights to form a self-correcting,

layered process. By advancing toward more System

2-like reasoning, the system reduces hallucinations

and ensures a structured, adaptive approach where

user needs and emotional states take center stage. The

additional computational effort is justiﬁed by consis-

tently producing reliable, well-considered outcomes

tailored to enhance cognitive engagement and effec-

tiveness.

On the one hand, the TA serves as the primary

guide and is responsible for generating the exercises.

At the same time, the CA acts as a constructive eval-

uator of the exercise created, identifying areas for im-

provement and providing detailed feedback, which ul-

timately ensures higher quality in the proposed exer-

cise.

The role of the CoA is crucial in integrating the

Multi-Agent AI System for Adaptive Cognitive Training in Elderly Care

945

responses of both the TA and the CA, ensuring that

the content remains comprehensible and appropri-

ately tailored to the user’s current level. In parallel,

the PEA provides real-time correction of the user’s re-

sponses, monitoring their performance and highlight-

ing potential errors.

Finally, the PsA adds a unique value to the sys-

tem by conducting an emotional analysis of the user

during the completion of the exercises. This function-

ality allows for detecting the user’s emotional state in

response to the exercises, which can be highly useful

for adapting both the difﬁculty level and the type of

exercises presented.

This holistic approach, which integrates cogni-

tive performance with emotional well-being, ensures

a more inclusive, effective, and personalised user ex-

perience.

The interaction modes, voice and text, are

thoughtfully designed to provide ﬂexibility and adapt-

ability, allowing users to engage with the system in

the way that suits them best. Voice-based interfaces

offer a hands-free, intuitive option, making them par-

ticularly helpful for individuals with limited literacy

or motor challenges, such as older adults or those

with physical impairments. Meanwhile, text-based

interactions ensure clarity and precision, appealing to

users who feel comfortable reading and typing, and

offering a straightforward and efﬁcient experience.

By accommodating diverse abilities and preferences,

these dual modes reﬂect a strong commitment to ac-

cessibility and inclusivity, creating opportunities for a

wide range of users to interact with the system effec-

tively.

Both systems received positive user evaluations,

with average scores exceeding 4 out of 5 on the Lik-

ert scale questions. These results reﬂect a high level

of acceptance of the system and a positive inclination

towards its use, regardless of the interaction modality

chosen.

In conclusion, this system represents a signiﬁ-

cant step forward in leveraging AI-driven multi-agent

frameworks to deliver personalised, adaptive, and ac-

cessible cognitive training solutions, potentially en-

hancing cognitive engagement and emotional well-

being in elderly users.

ACKNOWLEDGEMENTS

This work is partially supported by Generalitat

Valenciana, FPI grant CIACIF/2022/098 and CI-

PROM/2021/077

REFERENCES

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I.,

Aleman, F. L., Almeida, D., Altenschmidt, J., Altman,

S., Anadkat, S., et al. (2023). Gpt-4 technical report.

arXiv preprint arXiv:2303.08774.

Aghajani, M., Ben Abdessalem, H., and Frasson, C. (2021).

Voice emotion recognition in real time applications. In

Intelligent Tutoring Systems: 17th International Con-

ference, ITS 2021, Virtual Event, June 7–11, 2021,

Proceedings 17, pages 490–496. Springer.

Aher, G. V., Arriaga, R. I., and Kalai, A. T. (2023). Us-

ing large language models to simulate multiple hu-

mans and replicate human subject studies. In Interna-

tional Conference on Machine Learning, pages 337–

371. PMLR.

Castro, C. B., Costa, L., Dias, C. B., Chen, J., Hillebrandt,

H., Gardener, S. L., Brown, B. M., Loo, R., Garg, M.,

Rainey-Smith, S. R., et al. (2023). Multi-domain in-

terventions for dementia prevention–a systematic re-

view. The Journal of nutrition, health and aging,

27(12):1271–1280.

Chiang, W.-L., Zheng, L., Sheng, Y., Angelopoulos, A. N.,

Li, T., Li, D., Zhang, H., Zhu, B., Jordan, M., Gonza-

lez, J. E., et al. (2024). Chatbot arena: An open plat-

form for evaluating llms by human preference. arXiv

preprint arXiv:2403.04132.

crewAI (2024). Crewai: Framework for orchestrating role-

playing, autonomous ai agents.

Deepgram (2024). Deepgram speech recognition platform.

Faraziani, F. and Eken,

O. (2024). Enhancing cognitive

abilities and delaying cognitive decline in the elderly

through exercise-based health management systems.

International Journal of Sport Studies for Health,

7(2):13–22.

Ferguson, C., Hickman, L. D., Turkmani, S., Breen, P.,

Gargiulo, G., and Inglis, S. C. (2021). “wearables only

work on patients that wear them”: Barriers and facili-

tators to the adoption of wearable cardiac monitoring

technologies. Cardiovascular Digital Health Journal,

2(2):137–147.

Garcia-Betances, R. I., Jim

enez-Mixco, V., Arredondo,

M. T., and Cabrera-Umpi

errez, M. F. (2015). Us-

ing virtual reality for cognitive training of the elderly.

American Journal of Alzheimer’s Disease & Other

Dementias®, 30(1):49–54.

Gates, N. and Valenzuela, M. (2010). Cognitive exercise

and its role in cognitive function in older adults. Cur-

rent psychiatry reports, 12:20–27.

Gochoo, M., Vogan, A. A., Khalid, S., and Alnajjar, F.

(2020). Ai and robotics-based cognitive training for

elderly: A systematic review. In 2020 IEEE/ITU In-

ternational Conference on Artiﬁcial Intelligence for

Good (AI4G), pages 129–134. IEEE.

Graham, S. A., Lee, E. E., Jeste, D. V., Van Patten, R.,

Twamley, E. W., Nebeker, C., Yamada, Y., Kim, H.-

C., and Depp, C. A. (2020). Artiﬁcial intelligence

approaches to predicting and detecting cognitive de-

cline in older adults: A conceptual review. Psychiatry

research, 284:112732.

EAA 2025 - Special Session on Emotions and Affective Agents

946

Ichien, N., Bhatia, S., Ivanova, A., Webb, T., Grifﬁths, T.,

and Binz, M. (2024). Higher cognition in large lan-

guage models. In Proceedings of the Annual Meeting

of the Cognitive Science Society, volume 46.

Kasneci, E., Seßler, K., K

uchemann, S., Bannert, M.,

Dementieva, D., Fischer, F., Gasser, U., Groh, G.,

unnemann, S., H

ullermeier, E., et al. (2023). Chat-

gpt for good? on opportunities and challenges of large

language models for education. Learning and individ-

ual differences, 103:102274.

Kelly, M. E., Loughrey, D., Lawlor, B. A., Robertson, I. H.,

Walsh, C., and Brennan, S. (2014). The impact of

cognitive training and mental stimulation on cognitive

and everyday functioning of healthy older adults: a

systematic review and meta-analysis. Ageing research

reviews, 15:28–43.

Kueider, A. M., Parisi, J. M., Gross, A. L., and Re-

bok, G. W. (2012). Computerized cognitive training

with older adults: a systematic review. PloS one,

7(7):e40588.

Letteri, I. and Vittorini, P. (2024). Exploring the impact

of llm-generated feedback: Evaluation from profes-

sors and students in data science courses. In Inter-

national Conference in Methodologies and intelligent

Systems for Techhnology Enhanced Learning, pages

11–20. Springer.

Li, F., Tran, L., Thung, K.-H., Ji, S., Shen, D., and Li, J.

(2015). A robust deep model for improved classiﬁca-

tion of ad/mci patients. IEEE journal of biomedical

and health informatics, 19(5):1610–1616.

Moor, M., Banerjee, O., Abad, Z. S. H., Krumholz, H. M.,

Leskovec, J., Topol, E. J., and Rajpurkar, P. (2023).

Foundation models for generalist medical artiﬁcial in-

telligence. Nature, 616(7956):259–265.

Moore, S., Tong, R., Singh, A., Liu, Z., Hu, X., Lu, Y.,

Liang, J., Cao, C., Khosravi, H., Denny, P., et al.

(2023). Empowering education with llms-the next-

gen interface and content generation. In Interna-

tional Conference on Artiﬁcial Intelligence in Educa-

tion, pages 32–37. Springer.

Picard, R. W. (2010). Affective computing: from laugh-

ter to ieee. IEEE transactions on affective computing,

1(1):11–17.

Sami, A. M., Rasheed, Z., Kemell, K.-K., Waseem, M., Kil-

amo, T., Saari, M., Duc, A. N., Syst

a, K., and Abra-

hamsson, P. (2024). System for systematic literature

review using multiple ai agents: Concept and an em-

pirical evaluation. arXiv preprint arXiv:2403.08399.

Sarsa, S., Denny, P., Hellas, A., and Leinonen, J. (2022).

Automatic generation of programming exercises and

code explanations using large language models. In

Proceedings of the 2022 ACM Conference on Inter-

national Computing Education Research-Volume 1,

pages 27–43.

Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and

Yao, S. (2024). Reﬂexion: Language agents with ver-

bal reinforcement learning. Advances in Neural Infor-

mation Processing Systems, 36.

Simons, D. J., Boot, W. R., Charness, N., Gathercole,

S. E., Chabris, C. F., Hambrick, D. Z., and Stine-

Morrow, E. A. (2016). Do “brain-training” programs

work? Psychological science in the public interest,

17(3):103–186.

Wu, X., Freeman, S., Miyagi, M., Park, U., Nomura, K., and

Ebihara, S. (2024). Comprehensive geriatric assess-

ment in the era of telemedicine. Geriatrics & Geron-

tology International, 24:67–73.

Multi-Agent AI System for Adaptive Cognitive Training in Elderly Care

947