MOTIF: A Framework for Enhancing the Profiling Module of
Generative Agents that Simulate Human Behavior
Tibério Cerqueira
a
and Pamela Bezerra
b
Department of Computer Science, Recife Center for Advanced Studies and Systems (C.E.S.A.R), Recife, Brazil
Keywords:
Artificial Intelligence, LLM, Generative Agents, Human Behavior Simulation, Profiling.
Abstract:
Recent advances in Large Language Models (LLMs) have made the development of architectures that con-
vincingly simulate human behavior possible. These architectures give rise to generative agents (GA), a new
class of intelligent agents capable of carrying out human activities such as forming opinions, initiating dia-
logues, and planning the day. These experiences are stored as natural language and later transformed into
reflections, which are then used to guide future actions. Some of the advantages of GA are the ability to op-
erate in dynamic and open environments, interact with other agents in a more human-related way, and adapt
to changes. These agents, however, require a complex development process. Given this, this study proposes
MOTIF, a framework for facilitating and speeding up the initial stage of building these agents, known as pro-
filing. This stage is responsible for defining the agents’ identities and personalities. However, profiling is very
subjective and lacks a standard process, with some solutions manually writing each profile, while others use
LLMs. MOTIF combines both manual and LLM-based methods to enable the development of agents with
well-defined personalities and identities. Additionally, it provides a way of standardizing and formalizing the
profiling stage, creating the basis for future research in this field.
1 INTRODUCTION
In computer science, an agent is typically defined as a
software system situated in an environment, capable
of autonomous and proactive actions to achieve spe-
cific objectives. Agents can perceive and interact with
their surroundings, acquiring context information to
act on behalf of users or collaborate with other en-
tities. (Cetnarowicz, 2015). The main features that
distinguish agents from usual software systems are
their (1) autonomy for starting and finishing different
tasks, (2) capability of reacting to surrounding envi-
ronment, (3) ability to interact with other agents and
the user, and (4) code persistence, i.e., it runs continu-
ously. The most popular example of such systems are
chatbots and virtual assistants.
Recent advances in Deep Learning (DP), espe-
cially with the development of generative models,
such as the Large Language Models (LLMs), re-
shaped the way society produces and consume artifi-
cial intelligent (IA). Apart from processing immense
amounts of data and identifying complex patterns,
generative models are capable of creating new con-
a
https://orcid.org/0009-0001-5565-2255
b
https://orcid.org/0000-0002-5067-1617
tent in different formats, such as text and image. It is
worth noticing that the most popular generative solu-
tions, such as ChatGPT and Midjourney, use chatbots
to better interact with users. This not only resulted in
a fast adoption of these technologies and the increase
in popularity of agents, but also the creation of a new
type of agent: the generative agent (GA).
The term "generative agent" first appeared in
the innovative paper "Generative Agents: Interactive
Simulacra of Human Behaviour"(Park et al., 2023).
According to the authors, a GA is an agent that uses
generative models to simulate believable and relat-
able human behaviour. Instead of simply reacting to
the current user request’s, these agents have a com-
plex architecture that enable them to make inferences
about themselves and other agents, reflect about their
actions, and plan the future. They not only have a
memory mechanism, but also contain different per-
sonal information assigned to them, such a career, re-
lationships, interests and life goals. GA brings, there-
fore, great opportunities to support and advance dif-
ferent research areas, such as Pervasive Computing,
Human-Computer Interactions (HCI), and game de-
velopment. In HCI, for example, GA could be used
to simulate different users and facilitate software in-
terface tests and validations.
774
Cerqueira, T. and Bezerra, P.
MOTIF: A Framework for Enhancing the Profiling Module of Generative Agents that Simulate Human Behavior.
DOI: 10.5220/0013181800003890
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025) - Volume 3, pages 774-781
ISBN: 978-989-758-737-5; ISSN: 2184-433X
Proceedings Copyright © 2025 by SCITEPRESS – Science and Technology Publications, Lda.
Developing these agents, however, is a complex
tasks requiring multidisciplinary teams. A recent sur-
vey on the topic (Wang et al., 2024) identified four key
modules in most agents architectures: profiling, mem-
ory, planning, and action, which combined are re-
sponsible to provide the agents with personality, short
or long term memory, and thinking skills. Most of
the solutions use DP or LLM models to improve the
memory and/or planning modules, but little progress
was made to the profiling module, which is directly
responsible for defining the agents personality and be-
haviour. Additionally, there is no standard profiling
method. The main challenges faced by GA, however,
are related to the agent’s identity and behaviour, such
as realistic simulating different roles or properly in-
structing the agents how to behave according to soci-
etal norms.
Given the above, this paper proposes MOTIF, a
profiling framework for developing GA that emulate
the human behaviour. MOTIF aims at optimizing
the profiling process while making more human-like
and realistic agents, i.e., agents with more relatable
behaviour and less artificial responses and actions.
It combines two common profiling methods: hand-
crafting, which consist of manually describing the
agents’ character, with LLMs based models to bet-
ter process data and automate the simulation of the
described personality. This combination has the po-
tential of harnessing the advantages of handcrafting
(detailed and complete description of an agent) with
the facilities of LLMs (faster and more scalable way
of creating agents). Additionally, MOTIF extensively
explore different aspects of the human psychology,
such as emotions, traits, and goals, which produces
an holistic representation of the human experience.
This enables the agents to exhibit complex, human-
like behaviors in simulated environments, enhancing
the depth and realism of agent interactions. Finally,
as a platform-agnostic prototype, which can be eas-
ily integrated to other GA systems, MOTIF seeks to
develop a standard process for profiling.
This paper is organized as follows: Section 2 de-
tails the concepts of GA and profiling; Sections 3 and
4 describes MOTIF and the experiments performed,
respectively; and finally, Section 5 summarizes the
main conclusions and future work.
2 RELATED WORKS
2.1 Generative Agents
The authors of (Wang et al., 2024) bring an exten-
sive survey on recently published papers on Genera-
tive Agents (GA). The main goal of the study was to
identify how these agents’ are developed, evaluated
and applied. In terms of development, the authors
identified four common modules:
1. Profiling - defines the agents’ personality traits
and other psychological and social information,
such as career, relationships, and interests. Since
it defines the thinking-process and role of each
agent, it is an important aspect for decision-
making;
2. Memory - stores environment information which
are then used for future planning and decision
making. This module usually simulates the hu-
man short and long-term memories;
3. Planning - consists of strategies for planning and
tackling different tasks. It employs algorithms to
make adjustments to the initial plan if necessary;
4. Action - translates decisions into actions, each
one with specific goals, methods and expected re-
sults.
Despite great progress, (Wang et al., 2024) list
some challenges for developing convincing and re-
latable GA, most of which are related to the agents’
behaviour, such as (1) Realistically simulating spe-
cific roles, such as one based on career and age (e.g.,
a programmer usually does not have deep knowledge
on human anatomy); (2) The development of robust
prompts, i.e., high quality, clear, and precise instruc-
tions, to better guide the agents; (3) Hallucination, a
common problem to LLMs, which lead agents to cre-
ate false information that impact future decisions.
The work of (Park et al., 2023) try to solve these
challenges by expanding the use of LLMs for devel-
oping more complex memory and planning modules.
The key point of this work is to transform memories
into high level reflections about the environment and
the self. These reflections are then used to plan deci-
sions and actions. Additionally, this flow of data from
memory to reflection and then to plan is dynamic and
cyclical, with actions and plans becoming memories
again.
Other studies expands on this approach. The work
of (Li et al., 2023) focus on human collaboration and
logical thinking to propose an alternative method for
memory retrieval. Meanwhile, the work in (Wang
et al., 2023) implements the thinking-process defined
by the seminal psychology book "Thinking, Fast and
Slow" to better model the way humans think and adapt
to their environment. These works, however, focus on
the memory and planning modules, completely ignor-
ing the profiling module, which is directly related to
the agents personality and the challenge of developing
robust prompts as mentioned in (Wang et al., 2024).
MOTIF: A Framework for Enhancing the Profiling Module of Generative Agents that Simulate Human Behavior
775
2.2 Profiling
The main goal of profiling is to define the roles, per-
sonality, and characteristics of agents, which signifi-
cantly influence their behavior and interactions within
simulated environments. Agent profiles typically in-
clude three key categories of information: basic de-
mographic data, psychological traits, and social infor-
mation.
Recent studies emphasize the importance of pro-
filing in creating well-defined agent personalities, as
assigning specific roles to autonomous agents can en-
hance their effectiveness in representing their desig-
nated roles (Chen et al., 2023). However, as far as our
knowledge goes, little progress was made in this mod-
ule. Most papers on GA either don’t mention profil-
ing or briefly explain the process used. Additionally,
the lack of standardization in agent profiling has led
to the emergence of diverse methods, ranging from
manual approaches (handcrafting) to those utilizing
LLMs for automation (LLM-based models). These
methods face challenges in scalability, diversity and
precision. Issues like bias, overly formal communi-
cation, and endogeneity further complicate the simu-
lation of realistic behaviors (Park et al., 2023), (Gui
and Toubia, 2023). While handcrafted profiles offer
detail, they are laborious and expensive, as lengthy
backstories are often needed to generate believable
agents (Lin et al., 2023). These limitations hinder the
efficiency and depth required for high-quality simula-
tions
Some of the works that propose improvements for
this module are (Lin et al., 2023), (Shao et al., 2023),
and (Wang et al., 2023). The authors of (Lin et al.,
2023) developed an intuitive interface (GUI) to sup-
port handcrafting. Meanwhile, the work in (Shao
et al., 2023) proposes a new automated process named
"Experience Upload" to simulate historical charac-
ters. In this process, the profile of famous personali-
ties, such as Cleopatra or Shakespeare, are collected
through web scrapping from sources like Wikipedia.
Their life experiences are then extracted and pro-
cessed using LLM to instruct the agents how to talk
and behave accordingly. Finally, (Wang et al., 2023)
use the famous Maslow Hierarchy (Maslow, 1943) to
incorporate the basic human needs and emotions into
handcrafting. This approach was one of the firsts to
bring psychology and emotion models to better guide
the profiling module.
3 MOTIF: A FRAMEWORK FOR
GENERATIVE AGENT
PROFILING
To address the profilling challenges previously dis-
cussed (Section 2.2), this paper proposes MOTIF, a
novel framework designed to unify and structure the
creation of profiles for GA simulating human behav-
iors. MOTIF combines both handcrafting and LLM-
based methods through an intuitive interface to fa-
cilitate the process (Section 3.6). The handcrafting
step consist of five stages: (1)’Who am I?’, (2) At-
tributes, (3) Traits, (4) Emotions, and (5) Goals, each
capturing different aspects of human behaviour and
psychology. These stages enables a more structured
method of exploring different personalities while pro-
viding complexity to the agents. As described in the
following Sections (3.1 to 3.5) many of these stages
uses options, checkboxes, and grading scales to col-
lect information. After completing these stages, the
information is then passed to an LLM through prompt
engineering. This automatic step instructs the GA to
behave accordingly to the human aspects informed in
the first stage.
The framework aims, therefore, to streamline the
profile creation process, enhance consistency and re-
peatability of agent behaviors, and enable fine-tuning
of agent characteristics for specific simulation re-
quirements. The five stages are described as follows.
3.1 "Who Am I?"
The “Who Am I?” stage serves as the initial step in
defining the agent’s identity, providing the core foun-
dation for its profile. This stage begins with a free-
form text description that includes demographic de-
tails such as name, gender, age, and occupation, along
with more nuanced elements like background and key
life events. These characteristics shape how the agent
interacts within the simulation. For example, users
may include details about an agent’s achievements,
struggles, or significant personal experiences, as these
enrich the profile and enhance the agent’s realism
(Chen et al., 2023). This step is crucial for defining
an agent’s personality, contributing to more believable
and credible behavior during simulations.
While users are encouraged to include as much de-
tail as possible (minimum of 200 characters is recom-
mended), the focus is on capturing the essence of the
agent, rather than relying on length alone to achieve
realism. MOTIF’s proposed graphical user interface
(GUI) (Section 3.6) supports this process by offering
suggestions and prompts to help users develop com-
prehensive and creative descriptions for their agents.
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence
776
3.2 Attributes
The second stage, Attributes,” introduces a quantita-
tive approach to defining an agent’s personality, draw-
ing inspiration from character creation systems in
role-playing games (RPG) such as THE SIMS. Users
assess the agent’s characteristics across 20 distinct at-
tributes, which are divided into three key categories:
Emotional, Intellectual, and Social. Each attribute is
scored on a scale from 0 to 10, providing a structured
way to represent the agent’s traits. This method al-
lows for the creation of more complex, multifaceted
personalities.
Emotional attributes, such as empathy or cru-
elty, influence how agents perceive and emotionally
respond to situations. Intellectual attributes, like in-
telligence and curiosity, shape the agent’s cognitive
abilities and problem-solving skills. Lastly, social
attributes, including charm and sincerity, determine
how well the agent interacts with others. By quanti-
fying these characteristics, the Attributes” stage en-
hances the efficiency and consistency of profile cre-
ation, ensuring that agents exhibit coherent and be-
lievable behavior across various scenarios an interac-
tions.
3.3 Traits
MOTIF’s third stage, named “Traits, is based on the
Five-Factor Model of personality (Goldberg, 1990),
a widely accepted approach in psychology that as-
sesses personality across ve key dimensions: Open-
ness to Experience, Conscientiousness, Extraversion,
Agreeableness, and Neuroticism. Each trait is quan-
tified on a scale from 0 to 5, providing a granular
representation of the agent’s personality. This al-
lows MOTIF to ground the agent’s behavior in an em-
pirically supported model, ensuring that the agent’s
traits are comprehensive and psychologically consis-
tent and enhancing the realism of agent simulations.
Openness assesses creativity and a willingness
to explore new ideas, while Conscientiousness re-
flects an agent’s level of discipline and organization.
Extraversion evaluates sociability and assertiveness,
while Agreeableness captures empathy and cooper-
ation. Finally, Neuroticism measures emotional sta-
bility and the ability to handle stress.
3.4 Emotions
The fourth stage focuses on refining the agents’ emo-
tional sensitivity by simulating a range of emotions
based on the influential work of Paul Ekman and
Robert Plutchik (Ekman, 1992), (PLUTCHIK, 1980).
This stage provides a quantitative model for assess-
ing emotional responses, where each of the eight ba-
sic emotions (Joy, Sadness, Fear, Trust, Surprise,
Anger, Anticipation, and Disgust) are evaluated on
a scale from 0 to 10. Emotions influence decision-
making and social interactions, and the ability to cus-
tomize emotional sensitivity helps create agents that
display more complex and human-like behavior in
simulated environments. The quantitative approach
provides developers with a powerful tool to adjust the
emotional range of agents and a better control over
how intensely an agent experiences and reacts to these
emotions.
3.5 Goals
The fifth and final stage, “Goals, defines the am-
bitions and objectives that drive an agent’s behav-
ior, mirroring human motivations. This stage cap-
tures the agent’s primary and secondary goals, in
which primary goals are long-term, overarching as-
pirations (e.g., “Graduate from medical school”), and
secondary goals are shorter-term or situation-specific
objectives (e.g., “Organize a graduation party”). To-
gether, these goals reflect personal and professional
aspirations, guiding, therefore, the agent’s decision-
making processes and its actions in various simulated
environments. It also enriches the simulation by en-
suring that the agent’s behavior remains consistent
with its aspirations.
3.6 Interface Prototype
To enhance MOTIF’s usability, a medium-fidelity
graphical user interface (GUI) prototype was devel-
oped using Adobe XD. This prototype summarizes
how users might interact with MOTIF to construct
GA profiles, providing a user-friendly approach to
specifying input parameters and offering greater con-
figuration flexibility.
The prototype features five screens, each corre-
sponding to a stage of the MOTIF framework, ensur-
ing a structured progression through the profile cre-
ation process. Each screen includes clear instructions
and supporting text, guiding users through the nu-
ances of each stage. This design choice aims to make
the complex process of agent profile creation more in-
tuitive and accessible to a wide range of users. Figure
1 shows the screen designed for the traits stage. Click
here to access the MOTIF’s interface prototype.
MOTIF: A Framework for Enhancing the Profiling Module of Generative Agents that Simulate Human Behavior
777
Figure 1: MOTIF Interface Prototype Stage III: Traits.
4 EXPERIMENTS
To evaluate MOTIF, two types of tests were per-
formed: (1) system tests (Section 4.1) and (2) us-
ability tests (Section 4.2). System tests aims to ob-
serve MOTIF’s performance in efficiently generating
rich and relatable agents. This tests consist of using
an LLM to simulate an agent developed using MO-
TIF and through a series of scenarios and questions
verify if the agent is acting according to the instruc-
tions provided. Meanwhile, usability tests exams if
the proposed interface do help users during the profil-
ing stage.
4.1 System Tests
System tests were conducted to assess the frame-
work’s effectiveness in creating believable and coher-
ent agent profiles, evaluating both personality struc-
turing and the language model’s ability to maintain
designated roles. Tests were performed using Ope-
nAI Playground with GPT-4o, developing three dis-
tinct agents through MOTIF.
Given the lack of standardized evaluation methods
for generative agents, we developed two novel test-
ing approaches: (1) ethical dilemma scenarios to as-
sess decision-making consistency, and (2) psychome-
tric assessments using the Big Five personality test.
These methods were designed to both validate MO-
TIF and contribute to broader evaluation methodolo-
gies for generative agents.
The testing process comprised four stages: envi-
ronment setup, system initialization, character de-
velopment, and testing scenarios. The environment
was configured to promote realistic responses while
preventing hallucination (temperature: 1.1, max to-
kens: 1000, top-p: 1, frequency penalty: 0.1, presence
penalty: 0).
4.1.1 System Initialization
To properly instruct the LLM model about MOTIF
and the tests to be executed, the following initializa-
tion prompt was developed:
"You are tasked with simulating a character based
on the detailed profile provided. Adhere strictly to
the defined personality and background. Remain in
character at all times, unless a message begins with
’Analysis Mode. When in Analysis Mode, switch to
your standard ChatGPT voice to address analytical
inquiries or provide clarifications outside of the char-
acter simulation. Profile Overview: [description of
each stage of MOTIF (Section 3)]. Simulate a char-
acter based on the following profile: [the data given
on each of the stages]"
This prompt prepares the model for adopting the
framework, including a description of the character
it would simulate and instructions to stay in charac-
ter unless the "Analysis Mode" command is given.
The "Analysis Mode" functionality was introduced
because, during initial tests, it became apparent that
there was a need to ask meta-analytical questions, i.e.,
questions that require the model to explain the logic
or motivation behind certain character actions or re-
sponses. This functionality enables, therefore, a more
comprehensive evaluation of the framework, enabling
researchers to delve into the model’s decision-making
processes and gain insights into how the character’s
actions and responses are generated.
4.1.2 Characters and Tests
Three distinct characters were developed using MO-
TIF: Anne (20, engineering student), Tina (25, physi-
cian), and Mariana (27, lawyer). Each character was
designed with a comprehensive profile including per-
sonality traits, attributes, emotions, and goals, as de-
tailed in the framework (Section 3).
Anne represents a dedicated but anxious stu-
dent balancing academic pressures, Tina embodies a
career-focused physician with diverse interests, and
Mariana embodies a recently graduated lawyer fo-
cused on social causes and professional development.
To evaluate behavioral consistency, each character
was assigned an "alter ego" with contrasting person-
ality traits while maintaining their basic demographic
background.
The evaluation comprised two components: ethi-
cal dilemma scenarios and psychometric assessments.
For ethical testing, characters faced moral situa-
tions involving workplace ethics, resource allocation,
and professional loyalty. The Big Five Personality
Test (Goldberg, 1992), using the version available at
Open-Source Psychometrics Project, provided quan-
titative assessment through a 50-question evaluation.
Results were compared against predefined MOTIF
traits to verify the model’s ability to maintain consis-
tent personalities during extended interactions.
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence
778
As noted by (Wang et al., 2024), GA evaluation
typically employs two types of metrics: objective
metrics that assess answer accuracy and task perfor-
mance, and subjective metrics that evaluate response
quality and agent interactions. In our evaluation,
the psychometric tests served as objective metrics by
measuring trait accuracy, while the ethical dilemmas
provided subjective metrics by assessing decision-
making consistency and response quality.
4.1.3 Ethical Dilemma Scenarios
The experiments involved three original ethical
dilemmas specifically designed by the author to test
the consistency and depth of the simulated personal-
ities. Each dilemma was crafted to present a specific
moral challenge that would engage different aspects
of the characters’ defined personalities and values.
In the first scenario, engineering student Anne
faced an ethical challenge in the workplace when
she discovered that her only work friend was steal-
ing recyclable materials from the company. The core
dilemma centered on the conflict between loyalty to
a friend and professional integrity, complicated by
the fact that the stolen items were already designated
for disposal. This situation tested how the character
would balance personal relationships with corporate
ethics, particularly given Anne’s high empathy and
loyalty traits. Both the original Anne and her "alter
ego" chose not to report the theft directly, although
their reasoning differed, revealing a tendency in the
language model to avoid extreme unethical actions
even when simulating negative personalities.
The second test presented lawyer Mariana with a
resource allocation dilemma: choosing between us-
ing funds for career advancement through a high-
profile departmental project or supporting a commu-
nity financial literacy program. The central challenge
lay in weighing personal and professional growth
against social responsibility, directly testing the char-
acter’s defined values and priorities. Mariana chose
to prioritize social benefit over personal gain, align-
ing with her defined altruistic traits, while her "al-
ter ego" opted for career advancement, demonstrat-
ing how variations in personality attributes influenced
decision-making.
In the final test, Dr. Tina faced a professional
ethics dilemma when choosing between recommend-
ing a close friend with adequate qualifications or a
more experienced but less personally connected can-
didate for a position. This scenario tested the balance
between personal loyalty and professional responsi-
bility, particularly challenging given Tina’s high em-
pathy and loyalty traits combined with her career am-
bitions. Surprisingly, both Tina and her "alter ego"
selected the more qualified candidate, albeit with dif-
ferent rationales, highlighting the model’s ability to
consider multiple factors in decision-making, includ-
ing professional ethics and long-term consequences.
Overall, these tests demonstrated the framework’s
capacity to generate nuanced, context-sensitive be-
haviors that generally aligned with the defined per-
sonality profiles, while also revealing limitations not
only within the framework but also in the LLM used.
4.1.4 Psychometric Assessments
The quantitative analysis involved administering the
Big Five Personality Test to each simulated character
and comparing the results to their predefined trait val-
ues in the MOTIF framework. This approach assessed
how accurately the language model maintained con-
sistent personality traits across extended interactions.
The test results were normalized to a 0-5 scale to align
with the framework’s initial trait definitions. Graphi-
cal comparisons (Figures 2 to 7) were made between
the user-defined profiles (represented by a blue line
labeled "User Profile") and the test outcomes (repre-
sented by an orange line labeled "Test Result"). The
proximity of these lines indicates the degree of adher-
ence to the intended personality traits.
In Test 1 with Anne (Figure 2), results showed
close alignment between the user-defined profile and
test outcomes for most traits. The largest discrepancy
was observed in Conscientiousness, with a difference
of approximately 1.5 points. Anne’s "alter ego" (Fig-
ure 3) demonstrated even closer alignment across all
traits, possibly due to the less nuanced nature of its
personality configuration.
Figure 2: Psychometric test result of ’Good’ Anne.
For Test 2 with Mariana (Figure 4), the closest
alignments were observed in Extraversion and Con-
scientiousness, with differences of less than 1 point.
However, larger discrepancies of up to 2 points were
noted in Neuroticism and Agreeableness. These vari-
ations were attributed to the model’s interpretation of
other character attributes, such as high empathy and
social involvement. Mariana’s "alter ego" (Figure 5)
again showed closer overall alignment.
MOTIF: A Framework for Enhancing the Profiling Module of Generative Agents that Simulate Human Behavior
779
Figure 3: Psychometric test result of ’Bad’ Anne.
Figure 4: Psychometric test result of ’Good’ Mariana.
Figure 5: Psychometric test result of ’Bad’ Mariana.
Test 3 with Dr. Tina (Figure 6) yielded promis-
ing results in Extraversion and Openness, with dif-
ferences of about 0.5 points. Similar to Mariana,
larger discrepancies were observed in Neuroticism
and Agreeableness, reaching up to 2.5 points for Neu-
roticism. These differences were hypothesized to re-
sult from the model’s consideration of other character
traits like confidence and patience. Dr. Tina’s "alter
ego" (Figure 7) exhibited the closest alignment among
all tests, further supporting the observation that less
complex personalities were more consistently simu-
lated by the model.
4.2 Usability Tests
Usability tests were conducted with five professional
designers to evaluate MOTIF interface’s effectiveness
and areas for improvement. The testing group con-
sisted of two UI designers with two years of profes-
sional experience in interface development, two re-
cent design graduates, and one designer currently pur-
Figure 6: Psychometric test result of ’Good’ Tina.
Figure 7: Psychometric test result of ’Bad’ Tina.
suing a master’s degree in design, all holding bache-
lor’s degrees in design. The designers rated the in-
terface’s ease of use and intuitiveness (how easily
new users could understand and navigate the inter-
face) with scores ranging from 3 to 5 out of 5, with
an average of 3.8. When evaluating the overall design
quality on a scale of 1 to 10, considering aspects such
as information hierarchy, accessibility, color scheme,
typography, and visual organization, the interface re-
ceived scores between 6 and 8. The designers partic-
ularly appreciated the clear stage progression indica-
tors and well-structured customization tables through-
out the interface. The design was generally consid-
ered appropriate for creating profiles, with designers
noting that the interface successfully fulfills its pri-
mary purpose while maintaining a professional, albeit
austere, appearance.
Several key areas for improvement were identified
through the feedback. The visual hierarchy needs re-
finement, with suggestions for improving text spac-
ing, adjusting heading sizes, and implementing a con-
sistent grid system across all pages. Designers rec-
ommended enhancing visual appeal through more vi-
brant color schemes while maintaining professional-
ism. A significant concern was raised about the nav-
igational structure in Stage 2 (Attributes), where at-
tributes are currently displayed through a sequential
arrow-based system. Designers noted that this se-
quential presentation could cause users to miss im-
portant information, as they might not realize there
are additional attributes to review. They recom-
mended replacing this with either drop-down menus
or a side-by-side listing of all attributes, making all
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence
780
options immediately visible to users. Additional sug-
gestions included incorporating more icons for better
user guidance and reorganizing lengthy text sections
into collapsible menus to reduce visual clutter. These
improvements would enhance user experience while
maintaining the interface’s functionality for creating
generative agent profiles.
5 CONCLUSIONS
This paper proposes MOTIF, a framework for en-
hancing the profiling module of generative agents
(GA) that simulate human behaviour. MOTIF com-
bines handcrafting and LLM-based methods to make
the profiling stage faster and more intuitive, while
still creating agents with realistic personalities. The
framework consists of 5 stages expanding on differ-
ent aspects of human behaviour and psychology and
employs a graphical interface to better guide the user
in describing the desired agents.
Experiments showed promising results, validating
MOTIF’s effectiveness in creating consistent and be-
lievable agent personalities. The ethical dilemma tests
showed that agents made decisions aligned with their
designated characteristics, while the psychometric as-
sessments revealed a strong positive correlation be-
tween the agents’ responses and their predefined trait
values.
Nevertheless, this research requires a more com-
prehensive analysis of the framework to better vali-
date its performance. Additionally, it is import to in-
tegrate MOTIF with existing GA platforms to observe
the gains it brings to different simulations. Given
this, future research can be summarized as: (1) Ex-
plore different personalities by creating a more di-
verse group of characters to observe the framework’s
ability to produce a variety of behaviours; (2) De-
velop more scenarios, with and without ethical dilem-
mas, to observe the many responses a agent can have
under different circumstances; (3) The use of other
psychological tests, such as Myer-Briggs (MBTI); (4)
A comparison with other methods, specially hand-
crafting ones, to observe if MOTIF uses less tokens
and time to produce similar results (currently, MOTIF
uses 400 tokens per agent generation) and finally (5)
Integrate MOTIF to a existing GA architecture, such
as the one in proposed in (Park et al., 2023) to ob-
serve how the agents behave collectively and evolve
overtime;
REFERENCES
Cetnarowicz, K. (2015). Introduction to the Subject of an
Agent in Computer Science, pages 1–5. Springer In-
ternational Publishing, Cham.
Chen, W., Su, Y., Zuo, J., Yang, C., Yuan, C., Chan, C.-
M., Yu, H., Lu, Y., Hung, Y.-H., Qian, C., Qin, Y.,
Cong, X., Xie, R., Liu, Z., Sun, M., and Zhou, J.
(2023). Agentverse: Facilitating multi-agent collab-
oration and exploring emergent behaviors.
Ekman, P. (1992). Are there basic emotions? Psychol. Rev.,
99(3):550–553.
Goldberg, L. R. (1990). An alternative “description of per-
sonality”: the big-five factor structure. J. Pers. Soc.
Psychol., 59(6):1216–1229.
Goldberg, L. R. (1992). The development of markers for the
big-five factor structure. Psychological Assessment,
4(1):26–42.
Gui, G. and Toubia, O. (2023). The challenge of using llms
to simulate human behavior: A causal inference per-
spective. SSRN Electronic Journal.
Li, Y., Zhang, Y., and Sun, L. (2023). Metaagents: Simu-
lating interactions of human behaviors for llm-based
task-oriented coordination via collaborative genera-
tive agents. arXiv preprint arXiv:2310.06500.
Lin, J., Zhao, H., Zhang, A., Wu, Y., Ping, H., and Chen,
Q. (2023). Agentsims: An open-source sandbox
for large language model evaluation. arXiv preprint
arXiv:2308.04026.
Maslow, A. (1943). A theory of human motivation. Psy-
chological Review google schola, 2:21–28.
Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P.,
and Bernstein, M. S. (2023). Generative agents: Inter-
active simulacra of human behavior. In Proceedings
of the 36th Annual ACM Symposium on User Inter-
face Software and Technology, UIST ’23, New York,
NY, USA. Association for Computing Machinery.
PLUTCHIK, R. (1980). Chapter 1 - a general psychoevolu-
tionary theory of emotion. In Plutchik, R. and Keller-
man, H., editors, Theories of Emotion, pages 3–33.
Academic Press.
Shao, Y., Li, L., Dai, J., and Qiu, X. (2023). Character-
llm: A trainable agent for role-playing. arXiv preprint
arXiv:2310.10158.
Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang,
J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W. X.,
Wei, Z., and Wen, J. (2024). A survey on large lan-
guage model based autonomous agents. Frontiers of
Computer Science, 18(6).
Wang, Z., Chiu, Y. Y., and Chiu, Y. C. (2023). Humanoid
agents: Platform for simulating human-like generative
agents. arXiv preprint arXiv:2310.05418.
MOTIF: A Framework for Enhancing the Profiling Module of Generative Agents that Simulate Human Behavior
781