MOTIF: A Framework for Enhancing the Proﬁling Module of

Generative Agents that Simulate Human Behavior

Tibério Cerqueira

and Pamela Bezerra

Department of Computer Science, Recife Center for Advanced Studies and Systems (C.E.S.A.R), Recife, Brazil

Keywords:

Artiﬁcial Intelligence, LLM, Generative Agents, Human Behavior Simulation, Proﬁling.

Abstract:

Recent advances in Large Language Models (LLMs) have made the development of architectures that con-

vincingly simulate human behavior possible. These architectures give rise to generative agents (GA), a new

class of intelligent agents capable of carrying out human activities such as forming opinions, initiating dia-

logues, and planning the day. These experiences are stored as natural language and later transformed into

reﬂections, which are then used to guide future actions. Some of the advantages of GA are the ability to op-

erate in dynamic and open environments, interact with other agents in a more human-related way, and adapt

to changes. These agents, however, require a complex development process. Given this, this study proposes

MOTIF, a framework for facilitating and speeding up the initial stage of building these agents, known as pro-

ﬁling. This stage is responsible for deﬁning the agents’ identities and personalities. However, proﬁling is very

subjective and lacks a standard process, with some solutions manually writing each proﬁle, while others use

LLMs. MOTIF combines both manual and LLM-based methods to enable the development of agents with

well-deﬁned personalities and identities. Additionally, it provides a way of standardizing and formalizing the

proﬁling stage, creating the basis for future research in this ﬁeld.

1 INTRODUCTION

In computer science, an agent is typically deﬁned as a

software system situated in an environment, capable

of autonomous and proactive actions to achieve spe-

ciﬁc objectives. Agents can perceive and interact with

their surroundings, acquiring context information to

act on behalf of users or collaborate with other en-

tities. (Cetnarowicz, 2015). The main features that

distinguish agents from usual software systems are

their (1) autonomy for starting and ﬁnishing different

tasks, (2) capability of reacting to surrounding envi-

ronment, (3) ability to interact with other agents and

the user, and (4) code persistence, i.e., it runs continu-

ously. The most popular example of such systems are

chatbots and virtual assistants.

Recent advances in Deep Learning (DP), espe-

cially with the development of generative models,

such as the Large Language Models (LLMs), re-

shaped the way society produces and consume artiﬁ-

cial intelligent (IA). Apart from processing immense

amounts of data and identifying complex patterns,

generative models are capable of creating new con-

https://orcid.org/0009-0001-5565-2255

https://orcid.org/0000-0002-5067-1617

tent in different formats, such as text and image. It is

worth noticing that the most popular generative solu-

tions, such as ChatGPT and Midjourney, use chatbots

to better interact with users. This not only resulted in

a fast adoption of these technologies and the increase

in popularity of agents, but also the creation of a new

type of agent: the generative agent (GA).

The term "generative agent" ﬁrst appeared in

the innovative paper "Generative Agents: Interactive

Simulacra of Human Behaviour"(Park et al., 2023).

According to the authors, a GA is an agent that uses

generative models to simulate believable and relat-

able human behaviour. Instead of simply reacting to

the current user request’s, these agents have a com-

plex architecture that enable them to make inferences

about themselves and other agents, reﬂect about their

actions, and plan the future. They not only have a

memory mechanism, but also contain different per-

sonal information assigned to them, such a career, re-

lationships, interests and life goals. GA brings, there-

fore, great opportunities to support and advance dif-

ferent research areas, such as Pervasive Computing,

Human-Computer Interactions (HCI), and game de-

velopment. In HCI, for example, GA could be used

to simulate different users and facilitate software in-

terface tests and validations.

774

Cerqueira, T. and Bezerra, P.

MOTIF: A Framework for Enhancing the Proﬁling Module of Generative Agents that Simulate Human Behavior.

DOI: 10.5220/0013181800003890

In Proceedings of the 17th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2025) - Volume 3, pages 774-781

ISBN: 978-989-758-737-5; ISSN: 2184-433X

Developing these agents, however, is a complex

tasks requiring multidisciplinary teams. A recent sur-

vey on the topic (Wang et al., 2024) identiﬁed four key

modules in most agents architectures: proﬁling, mem-

ory, planning, and action, which combined are re-

sponsible to provide the agents with personality, short

or long term memory, and thinking skills. Most of

the solutions use DP or LLM models to improve the

memory and/or planning modules, but little progress

was made to the proﬁling module, which is directly

responsible for deﬁning the agents personality and be-

haviour. Additionally, there is no standard proﬁling

method. The main challenges faced by GA, however,

are related to the agent’s identity and behaviour, such

as realistic simulating different roles or properly in-

structing the agents how to behave according to soci-

etal norms.

Given the above, this paper proposes MOTIF, a

proﬁling framework for developing GA that emulate

the human behaviour. MOTIF aims at optimizing

the proﬁling process while making more human-like

and realistic agents, i.e., agents with more relatable

behaviour and less artiﬁcial responses and actions.

It combines two common proﬁling methods: hand-

crafting, which consist of manually describing the

agents’ character, with LLMs based models to bet-

ter process data and automate the simulation of the

described personality. This combination has the po-

tential of harnessing the advantages of handcrafting

(detailed and complete description of an agent) with

the facilities of LLMs (faster and more scalable way

of creating agents). Additionally, MOTIF extensively

explore different aspects of the human psychology,

such as emotions, traits, and goals, which produces

an holistic representation of the human experience.

This enables the agents to exhibit complex, human-

like behaviors in simulated environments, enhancing

the depth and realism of agent interactions. Finally,

as a platform-agnostic prototype, which can be eas-

ily integrated to other GA systems, MOTIF seeks to

develop a standard process for proﬁling.

This paper is organized as follows: Section 2 de-

tails the concepts of GA and proﬁling; Sections 3 and

4 describes MOTIF and the experiments performed,

respectively; and ﬁnally, Section 5 summarizes the

main conclusions and future work.

2 RELATED WORKS

2.1 Generative Agents

The authors of (Wang et al., 2024) bring an exten-

sive survey on recently published papers on Genera-

tive Agents (GA). The main goal of the study was to

identify how these agents’ are developed, evaluated

and applied. In terms of development, the authors

identiﬁed four common modules:

1. Proﬁling - deﬁnes the agents’ personality traits

and other psychological and social information,

such as career, relationships, and interests. Since

it deﬁnes the thinking-process and role of each

agent, it is an important aspect for decision-

making;

2. Memory - stores environment information which

are then used for future planning and decision

making. This module usually simulates the hu-

man short and long-term memories;

3. Planning - consists of strategies for planning and

tackling different tasks. It employs algorithms to

make adjustments to the initial plan if necessary;

4. Action - translates decisions into actions, each

one with speciﬁc goals, methods and expected re-

sults.

Despite great progress, (Wang et al., 2024) list

some challenges for developing convincing and re-

latable GA, most of which are related to the agents’

behaviour, such as (1) Realistically simulating spe-

ciﬁc roles, such as one based on career and age (e.g.,

a programmer usually does not have deep knowledge

on human anatomy); (2) The development of robust

prompts, i.e., high quality, clear, and precise instruc-

tions, to better guide the agents; (3) Hallucination, a

common problem to LLMs, which lead agents to cre-

ate false information that impact future decisions.

The work of (Park et al., 2023) try to solve these

challenges by expanding the use of LLMs for devel-

oping more complex memory and planning modules.

The key point of this work is to transform memories

into high level reﬂections about the environment and

the self. These reﬂections are then used to plan deci-

sions and actions. Additionally, this ﬂow of data from

memory to reﬂection and then to plan is dynamic and

cyclical, with actions and plans becoming memories

again.

Other studies expands on this approach. The work

of (Li et al., 2023) focus on human collaboration and

logical thinking to propose an alternative method for

memory retrieval. Meanwhile, the work in (Wang

et al., 2023) implements the thinking-process deﬁned

by the seminal psychology book "Thinking, Fast and

Slow" to better model the way humans think and adapt

to their environment. These works, however, focus on

the memory and planning modules, completely ignor-

ing the proﬁling module, which is directly related to

the agents personality and the challenge of developing

robust prompts as mentioned in (Wang et al., 2024).

MOTIF: A Framework for Enhancing the Proﬁling Module of Generative Agents that Simulate Human Behavior

775

2.2 Proﬁling

The main goal of proﬁling is to deﬁne the roles, per-

sonality, and characteristics of agents, which signiﬁ-

cantly inﬂuence their behavior and interactions within

simulated environments. Agent proﬁles typically in-

clude three key categories of information: basic de-

mographic data, psychological traits, and social infor-

mation.

Recent studies emphasize the importance of pro-

ﬁling in creating well-deﬁned agent personalities, as

assigning speciﬁc roles to autonomous agents can en-

hance their effectiveness in representing their desig-

nated roles (Chen et al., 2023). However, as far as our

knowledge goes, little progress was made in this mod-

ule. Most papers on GA either don’t mention proﬁl-

ing or brieﬂy explain the process used. Additionally,

the lack of standardization in agent proﬁling has led

to the emergence of diverse methods, ranging from

manual approaches (handcrafting) to those utilizing

LLMs for automation (LLM-based models). These

methods face challenges in scalability, diversity and

precision. Issues like bias, overly formal communi-

cation, and endogeneity further complicate the simu-

lation of realistic behaviors (Park et al., 2023), (Gui

and Toubia, 2023). While handcrafted proﬁles offer

detail, they are laborious and expensive, as lengthy

backstories are often needed to generate believable

agents (Lin et al., 2023). These limitations hinder the

efﬁciency and depth required for high-quality simula-

tions

Some of the works that propose improvements for

this module are (Lin et al., 2023), (Shao et al., 2023),

and (Wang et al., 2023). The authors of (Lin et al.,

2023) developed an intuitive interface (GUI) to sup-

port handcrafting. Meanwhile, the work in (Shao

et al., 2023) proposes a new automated process named

"Experience Upload" to simulate historical charac-

ters. In this process, the proﬁle of famous personali-

ties, such as Cleopatra or Shakespeare, are collected

through web scrapping from sources like Wikipedia.

Their life experiences are then extracted and pro-

cessed using LLM to instruct the agents how to talk

and behave accordingly. Finally, (Wang et al., 2023)

use the famous Maslow Hierarchy (Maslow, 1943) to

incorporate the basic human needs and emotions into

handcrafting. This approach was one of the ﬁrsts to

bring psychology and emotion models to better guide

the proﬁling module.

3 MOTIF: A FRAMEWORK FOR

GENERATIVE AGENT

PROFILING

To address the proﬁlling challenges previously dis-

cussed (Section 2.2), this paper proposes MOTIF, a

novel framework designed to unify and structure the

creation of proﬁles for GA simulating human behav-

iors. MOTIF combines both handcrafting and LLM-

based methods through an intuitive interface to fa-

cilitate the process (Section 3.6). The handcrafting

step consist of ﬁve stages: (1)’Who am I?’, (2) At-

tributes, (3) Traits, (4) Emotions, and (5) Goals, each

capturing different aspects of human behaviour and

psychology. These stages enables a more structured

method of exploring different personalities while pro-

viding complexity to the agents. As described in the

following Sections (3.1 to 3.5) many of these stages

uses options, checkboxes, and grading scales to col-

lect information. After completing these stages, the

information is then passed to an LLM through prompt

engineering. This automatic step instructs the GA to

behave accordingly to the human aspects informed in

the ﬁrst stage.

The framework aims, therefore, to streamline the

proﬁle creation process, enhance consistency and re-

peatability of agent behaviors, and enable ﬁne-tuning

of agent characteristics for speciﬁc simulation re-

quirements. The ﬁve stages are described as follows.

3.1 "Who Am I?"

The “Who Am I?” stage serves as the initial step in

deﬁning the agent’s identity, providing the core foun-

dation for its proﬁle. This stage begins with a free-

form text description that includes demographic de-

tails such as name, gender, age, and occupation, along

with more nuanced elements like background and key

life events. These characteristics shape how the agent

interacts within the simulation. For example, users

may include details about an agent’s achievements,

struggles, or signiﬁcant personal experiences, as these

enrich the proﬁle and enhance the agent’s realism

(Chen et al., 2023). This step is crucial for deﬁning

an agent’s personality, contributing to more believable

and credible behavior during simulations.

While users are encouraged to include as much de-

tail as possible (minimum of 200 characters is recom-

mended), the focus is on capturing the essence of the

agent, rather than relying on length alone to achieve

realism. MOTIF’s proposed graphical user interface

(GUI) (Section 3.6) supports this process by offering

suggestions and prompts to help users develop com-

prehensive and creative descriptions for their agents.

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

776

3.2 Attributes

The second stage, “Attributes,” introduces a quantita-

tive approach to deﬁning an agent’s personality, draw-

ing inspiration from character creation systems in

role-playing games (RPG) such as THE SIMS. Users

assess the agent’s characteristics across 20 distinct at-

tributes, which are divided into three key categories:

Emotional, Intellectual, and Social. Each attribute is

scored on a scale from 0 to 10, providing a structured

way to represent the agent’s traits. This method al-

lows for the creation of more complex, multifaceted

personalities.

Emotional attributes, such as empathy or cru-

elty, inﬂuence how agents perceive and emotionally

respond to situations. Intellectual attributes, like in-

telligence and curiosity, shape the agent’s cognitive

abilities and problem-solving skills. Lastly, social

attributes, including charm and sincerity, determine

how well the agent interacts with others. By quanti-

fying these characteristics, the “Attributes” stage en-

hances the efﬁciency and consistency of proﬁle cre-

ation, ensuring that agents exhibit coherent and be-

lievable behavior across various scenarios an interac-

tions.

3.3 Traits

MOTIF’s third stage, named “Traits,” is based on the

Five-Factor Model of personality (Goldberg, 1990),

a widely accepted approach in psychology that as-

sesses personality across ﬁve key dimensions: Open-

ness to Experience, Conscientiousness, Extraversion,

Agreeableness, and Neuroticism. Each trait is quan-

tiﬁed on a scale from 0 to 5, providing a granular

representation of the agent’s personality. This al-

lows MOTIF to ground the agent’s behavior in an em-

pirically supported model, ensuring that the agent’s

traits are comprehensive and psychologically consis-

tent and enhancing the realism of agent simulations.

Openness assesses creativity and a willingness

to explore new ideas, while Conscientiousness re-

ﬂects an agent’s level of discipline and organization.

Extraversion evaluates sociability and assertiveness,

while Agreeableness captures empathy and cooper-

ation. Finally, Neuroticism measures emotional sta-

bility and the ability to handle stress.

3.4 Emotions

The fourth stage focuses on reﬁning the agents’ emo-

tional sensitivity by simulating a range of emotions

based on the inﬂuential work of Paul Ekman and

Robert Plutchik (Ekman, 1992), (PLUTCHIK, 1980).

This stage provides a quantitative model for assess-

ing emotional responses, where each of the eight ba-

sic emotions (Joy, Sadness, Fear, Trust, Surprise,

Anger, Anticipation, and Disgust) are evaluated on

a scale from 0 to 10. Emotions inﬂuence decision-

making and social interactions, and the ability to cus-

tomize emotional sensitivity helps create agents that

display more complex and human-like behavior in

simulated environments. The quantitative approach

provides developers with a powerful tool to adjust the

emotional range of agents and a better control over

how intensely an agent experiences and reacts to these

emotions.

3.5 Goals

The ﬁfth and ﬁnal stage, “Goals,” deﬁnes the am-

bitions and objectives that drive an agent’s behav-

ior, mirroring human motivations. This stage cap-

tures the agent’s primary and secondary goals, in

which primary goals are long-term, overarching as-

pirations (e.g., “Graduate from medical school”), and

secondary goals are shorter-term or situation-speciﬁc

objectives (e.g., “Organize a graduation party”). To-

gether, these goals reﬂect personal and professional

aspirations, guiding, therefore, the agent’s decision-

making processes and its actions in various simulated

environments. It also enriches the simulation by en-

suring that the agent’s behavior remains consistent

with its aspirations.

3.6 Interface Prototype

To enhance MOTIF’s usability, a medium-ﬁdelity

graphical user interface (GUI) prototype was devel-

oped using Adobe XD. This prototype summarizes

how users might interact with MOTIF to construct

GA proﬁles, providing a user-friendly approach to

specifying input parameters and offering greater con-

ﬁguration ﬂexibility.

The prototype features ﬁve screens, each corre-

sponding to a stage of the MOTIF framework, ensur-

ing a structured progression through the proﬁle cre-

ation process. Each screen includes clear instructions

and supporting text, guiding users through the nu-

ances of each stage. This design choice aims to make

the complex process of agent proﬁle creation more in-

tuitive and accessible to a wide range of users. Figure

1 shows the screen designed for the traits stage. Click

here to access the MOTIF’s interface prototype.

MOTIF: A Framework for Enhancing the Proﬁling Module of Generative Agents that Simulate Human Behavior

777

Figure 1: MOTIF Interface Prototype Stage III: Traits.

4 EXPERIMENTS

To evaluate MOTIF, two types of tests were per-

formed: (1) system tests (Section 4.1) and (2) us-

ability tests (Section 4.2). System tests aims to ob-

serve MOTIF’s performance in efﬁciently generating

rich and relatable agents. This tests consist of using

an LLM to simulate an agent developed using MO-

TIF and through a series of scenarios and questions

verify if the agent is acting according to the instruc-

tions provided. Meanwhile, usability tests exams if

the proposed interface do help users during the proﬁl-

ing stage.

4.1 System Tests

System tests were conducted to assess the frame-

work’s effectiveness in creating believable and coher-

ent agent proﬁles, evaluating both personality struc-

turing and the language model’s ability to maintain

designated roles. Tests were performed using Ope-

nAI Playground with GPT-4o, developing three dis-

tinct agents through MOTIF.

Given the lack of standardized evaluation methods

for generative agents, we developed two novel test-

ing approaches: (1) ethical dilemma scenarios to as-

sess decision-making consistency, and (2) psychome-

tric assessments using the Big Five personality test.

These methods were designed to both validate MO-

TIF and contribute to broader evaluation methodolo-

gies for generative agents.

The testing process comprised four stages: envi-

ronment setup, system initialization, character de-

velopment, and testing scenarios. The environment

was conﬁgured to promote realistic responses while

preventing hallucination (temperature: 1.1, max to-

kens: 1000, top-p: 1, frequency penalty: 0.1, presence

penalty: 0).

4.1.1 System Initialization

To properly instruct the LLM model about MOTIF

and the tests to be executed, the following initializa-

tion prompt was developed:

"You are tasked with simulating a character based

on the detailed proﬁle provided. Adhere strictly to

the deﬁned personality and background. Remain in

character at all times, unless a message begins with

’Analysis Mode.’ When in Analysis Mode, switch to

your standard ChatGPT voice to address analytical

inquiries or provide clariﬁcations outside of the char-

acter simulation. Proﬁle Overview: [description of

each stage of MOTIF (Section 3)]. Simulate a char-

acter based on the following proﬁle: [the data given

on each of the stages]"

This prompt prepares the model for adopting the

framework, including a description of the character

it would simulate and instructions to stay in charac-

ter unless the "Analysis Mode" command is given.

The "Analysis Mode" functionality was introduced

because, during initial tests, it became apparent that

there was a need to ask meta-analytical questions, i.e.,

questions that require the model to explain the logic

or motivation behind certain character actions or re-

sponses. This functionality enables, therefore, a more

comprehensive evaluation of the framework, enabling

researchers to delve into the model’s decision-making

processes and gain insights into how the character’s

actions and responses are generated.

4.1.2 Characters and Tests

Three distinct characters were developed using MO-

TIF: Anne (20, engineering student), Tina (25, physi-

cian), and Mariana (27, lawyer). Each character was

designed with a comprehensive proﬁle including per-

sonality traits, attributes, emotions, and goals, as de-

tailed in the framework (Section 3).

Anne represents a dedicated but anxious stu-

dent balancing academic pressures, Tina embodies a

career-focused physician with diverse interests, and

Mariana embodies a recently graduated lawyer fo-

cused on social causes and professional development.

To evaluate behavioral consistency, each character

was assigned an "alter ego" with contrasting person-

ality traits while maintaining their basic demographic

background.

The evaluation comprised two components: ethi-

cal dilemma scenarios and psychometric assessments.

For ethical testing, characters faced moral situa-

tions involving workplace ethics, resource allocation,

and professional loyalty. The Big Five Personality

Test (Goldberg, 1992), using the version available at

Open-Source Psychometrics Project, provided quan-

titative assessment through a 50-question evaluation.

Results were compared against predeﬁned MOTIF

traits to verify the model’s ability to maintain consis-

tent personalities during extended interactions.

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

778

As noted by (Wang et al., 2024), GA evaluation

typically employs two types of metrics: objective

metrics that assess answer accuracy and task perfor-

mance, and subjective metrics that evaluate response

quality and agent interactions. In our evaluation,

the psychometric tests served as objective metrics by

measuring trait accuracy, while the ethical dilemmas

provided subjective metrics by assessing decision-

making consistency and response quality.

4.1.3 Ethical Dilemma Scenarios

The experiments involved three original ethical

dilemmas speciﬁcally designed by the author to test

the consistency and depth of the simulated personal-

ities. Each dilemma was crafted to present a speciﬁc

moral challenge that would engage different aspects

of the characters’ deﬁned personalities and values.

In the ﬁrst scenario, engineering student Anne

faced an ethical challenge in the workplace when

she discovered that her only work friend was steal-

ing recyclable materials from the company. The core

dilemma centered on the conﬂict between loyalty to

a friend and professional integrity, complicated by

the fact that the stolen items were already designated

for disposal. This situation tested how the character

would balance personal relationships with corporate

ethics, particularly given Anne’s high empathy and

loyalty traits. Both the original Anne and her "alter

ego" chose not to report the theft directly, although

their reasoning differed, revealing a tendency in the

language model to avoid extreme unethical actions

even when simulating negative personalities.

The second test presented lawyer Mariana with a

resource allocation dilemma: choosing between us-

ing funds for career advancement through a high-

proﬁle departmental project or supporting a commu-

nity ﬁnancial literacy program. The central challenge

lay in weighing personal and professional growth

against social responsibility, directly testing the char-

acter’s deﬁned values and priorities. Mariana chose

to prioritize social beneﬁt over personal gain, align-

ing with her deﬁned altruistic traits, while her "al-

ter ego" opted for career advancement, demonstrat-

ing how variations in personality attributes inﬂuenced

decision-making.

In the ﬁnal test, Dr. Tina faced a professional

ethics dilemma when choosing between recommend-

ing a close friend with adequate qualiﬁcations or a

more experienced but less personally connected can-

didate for a position. This scenario tested the balance

between personal loyalty and professional responsi-

bility, particularly challenging given Tina’s high em-

pathy and loyalty traits combined with her career am-

bitions. Surprisingly, both Tina and her "alter ego"

selected the more qualiﬁed candidate, albeit with dif-

ferent rationales, highlighting the model’s ability to

consider multiple factors in decision-making, includ-

ing professional ethics and long-term consequences.

Overall, these tests demonstrated the framework’s

capacity to generate nuanced, context-sensitive be-

haviors that generally aligned with the deﬁned per-

sonality proﬁles, while also revealing limitations not

only within the framework but also in the LLM used.

4.1.4 Psychometric Assessments

The quantitative analysis involved administering the

Big Five Personality Test to each simulated character

and comparing the results to their predeﬁned trait val-

ues in the MOTIF framework. This approach assessed

how accurately the language model maintained con-

sistent personality traits across extended interactions.

The test results were normalized to a 0-5 scale to align

with the framework’s initial trait deﬁnitions. Graphi-

cal comparisons (Figures 2 to 7) were made between

the user-deﬁned proﬁles (represented by a blue line

labeled "User Proﬁle") and the test outcomes (repre-

sented by an orange line labeled "Test Result"). The

proximity of these lines indicates the degree of adher-

ence to the intended personality traits.

In Test 1 with Anne (Figure 2), results showed

close alignment between the user-deﬁned proﬁle and

test outcomes for most traits. The largest discrepancy

was observed in Conscientiousness, with a difference

of approximately 1.5 points. Anne’s "alter ego" (Fig-

ure 3) demonstrated even closer alignment across all

traits, possibly due to the less nuanced nature of its

personality conﬁguration.

Figure 2: Psychometric test result of ’Good’ Anne.

For Test 2 with Mariana (Figure 4), the closest

alignments were observed in Extraversion and Con-

scientiousness, with differences of less than 1 point.

However, larger discrepancies of up to 2 points were

noted in Neuroticism and Agreeableness. These vari-

ations were attributed to the model’s interpretation of

other character attributes, such as high empathy and

social involvement. Mariana’s "alter ego" (Figure 5)

again showed closer overall alignment.

MOTIF: A Framework for Enhancing the Proﬁling Module of Generative Agents that Simulate Human Behavior

779

Figure 3: Psychometric test result of ’Bad’ Anne.

Figure 4: Psychometric test result of ’Good’ Mariana.

Figure 5: Psychometric test result of ’Bad’ Mariana.

Test 3 with Dr. Tina (Figure 6) yielded promis-

ing results in Extraversion and Openness, with dif-

ferences of about 0.5 points. Similar to Mariana,

larger discrepancies were observed in Neuroticism

and Agreeableness, reaching up to 2.5 points for Neu-

roticism. These differences were hypothesized to re-

sult from the model’s consideration of other character

traits like conﬁdence and patience. Dr. Tina’s "alter

ego" (Figure 7) exhibited the closest alignment among

all tests, further supporting the observation that less

complex personalities were more consistently simu-

lated by the model.

4.2 Usability Tests

Usability tests were conducted with ﬁve professional

designers to evaluate MOTIF interface’s effectiveness

and areas for improvement. The testing group con-

sisted of two UI designers with two years of profes-

sional experience in interface development, two re-

cent design graduates, and one designer currently pur-

Figure 6: Psychometric test result of ’Good’ Tina.

Figure 7: Psychometric test result of ’Bad’ Tina.

suing a master’s degree in design, all holding bache-

lor’s degrees in design. The designers rated the in-

terface’s ease of use and intuitiveness (how easily

new users could understand and navigate the inter-

face) with scores ranging from 3 to 5 out of 5, with

an average of 3.8. When evaluating the overall design

quality on a scale of 1 to 10, considering aspects such

as information hierarchy, accessibility, color scheme,

typography, and visual organization, the interface re-

ceived scores between 6 and 8. The designers partic-

ularly appreciated the clear stage progression indica-

tors and well-structured customization tables through-

out the interface. The design was generally consid-

ered appropriate for creating proﬁles, with designers

noting that the interface successfully fulﬁlls its pri-

mary purpose while maintaining a professional, albeit

austere, appearance.

Several key areas for improvement were identiﬁed

through the feedback. The visual hierarchy needs re-

ﬁnement, with suggestions for improving text spac-

ing, adjusting heading sizes, and implementing a con-

sistent grid system across all pages. Designers rec-

ommended enhancing visual appeal through more vi-

brant color schemes while maintaining professional-

ism. A signiﬁcant concern was raised about the nav-

igational structure in Stage 2 (Attributes), where at-

tributes are currently displayed through a sequential

arrow-based system. Designers noted that this se-

quential presentation could cause users to miss im-

portant information, as they might not realize there

are additional attributes to review. They recom-

mended replacing this with either drop-down menus

or a side-by-side listing of all attributes, making all

ICAART 2025 - 17th International Conference on Agents and Artiﬁcial Intelligence

780

options immediately visible to users. Additional sug-

gestions included incorporating more icons for better

user guidance and reorganizing lengthy text sections

into collapsible menus to reduce visual clutter. These

improvements would enhance user experience while

maintaining the interface’s functionality for creating

generative agent proﬁles.

5 CONCLUSIONS

This paper proposes MOTIF, a framework for en-

hancing the proﬁling module of generative agents

(GA) that simulate human behaviour. MOTIF com-

bines handcrafting and LLM-based methods to make

the proﬁling stage faster and more intuitive, while

still creating agents with realistic personalities. The

framework consists of 5 stages expanding on differ-

ent aspects of human behaviour and psychology and

employs a graphical interface to better guide the user

in describing the desired agents.

Experiments showed promising results, validating

MOTIF’s effectiveness in creating consistent and be-

lievable agent personalities. The ethical dilemma tests

showed that agents made decisions aligned with their

designated characteristics, while the psychometric as-

sessments revealed a strong positive correlation be-

tween the agents’ responses and their predeﬁned trait

values.

Nevertheless, this research requires a more com-

prehensive analysis of the framework to better vali-

date its performance. Additionally, it is import to in-

tegrate MOTIF with existing GA platforms to observe

the gains it brings to different simulations. Given

this, future research can be summarized as: (1) Ex-

plore different personalities by creating a more di-

verse group of characters to observe the framework’s

ability to produce a variety of behaviours; (2) De-

velop more scenarios, with and without ethical dilem-

mas, to observe the many responses a agent can have

under different circumstances; (3) The use of other

psychological tests, such as Myer-Briggs (MBTI); (4)

A comparison with other methods, specially hand-

crafting ones, to observe if MOTIF uses less tokens

and time to produce similar results (currently, MOTIF

uses 400 tokens per agent generation) and ﬁnally (5)

Integrate MOTIF to a existing GA architecture, such

as the one in proposed in (Park et al., 2023) to ob-

serve how the agents behave collectively and evolve

overtime;

REFERENCES

Cetnarowicz, K. (2015). Introduction to the Subject of an

Agent in Computer Science, pages 1–5. Springer In-

ternational Publishing, Cham.

Chen, W., Su, Y., Zuo, J., Yang, C., Yuan, C., Chan, C.-

M., Yu, H., Lu, Y., Hung, Y.-H., Qian, C., Qin, Y.,

Cong, X., Xie, R., Liu, Z., Sun, M., and Zhou, J.

(2023). Agentverse: Facilitating multi-agent collab-

oration and exploring emergent behaviors.

Ekman, P. (1992). Are there basic emotions? Psychol. Rev.,

99(3):550–553.

Goldberg, L. R. (1990). An alternative “description of per-

sonality”: the big-ﬁve factor structure. J. Pers. Soc.

Psychol., 59(6):1216–1229.

Goldberg, L. R. (1992). The development of markers for the

big-ﬁve factor structure. Psychological Assessment,

4(1):26–42.

Gui, G. and Toubia, O. (2023). The challenge of using llms

to simulate human behavior: A causal inference per-

spective. SSRN Electronic Journal.

Li, Y., Zhang, Y., and Sun, L. (2023). Metaagents: Simu-

lating interactions of human behaviors for llm-based

task-oriented coordination via collaborative genera-

tive agents. arXiv preprint arXiv:2310.06500.

Lin, J., Zhao, H., Zhang, A., Wu, Y., Ping, H., and Chen,

Q. (2023). Agentsims: An open-source sandbox

for large language model evaluation. arXiv preprint

arXiv:2308.04026.

Maslow, A. (1943). A theory of human motivation. Psy-

chological Review google schola, 2:21–28.

Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P.,

and Bernstein, M. S. (2023). Generative agents: Inter-

active simulacra of human behavior. In Proceedings

of the 36th Annual ACM Symposium on User Inter-

face Software and Technology, UIST ’23, New York,

NY, USA. Association for Computing Machinery.

PLUTCHIK, R. (1980). Chapter 1 - a general psychoevolu-

tionary theory of emotion. In Plutchik, R. and Keller-

man, H., editors, Theories of Emotion, pages 3–33.

Academic Press.

Shao, Y., Li, L., Dai, J., and Qiu, X. (2023). Character-

llm: A trainable agent for role-playing. arXiv preprint

arXiv:2310.10158.

Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang,

J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W. X.,

Wei, Z., and Wen, J. (2024). A survey on large lan-

guage model based autonomous agents. Frontiers of

Computer Science, 18(6).

Wang, Z., Chiu, Y. Y., and Chiu, Y. C. (2023). Humanoid

agents: Platform for simulating human-like generative

agents. arXiv preprint arXiv:2310.05418.

MOTIF: A Framework for Enhancing the Proﬁling Module of Generative Agents that Simulate Human Behavior

781