Impact of Team Models in Hierarchical Human-Agent Decision-Making Teams

Ranjani Narayanan and Karen M. Feigh
School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, U.S.A.
{rnarayanan35, karen.feigh}@gatech.edu
https://orcid.org/0000-0003-4351-6501, https://orcid.org/0000-0002-0281-7634
Keywords: Mental Models, Decision Support, Human-AI Teaming, Decision-Making.
Abstract:
With increasing opportunities for the integration of AI-based decision support tools, human interactions with
AI systems must be studied under different contexts. The quality of interactions is said to improve when
humans have a mental model (MM) of their AI teammates. This study tests this hypothesis for a hierarchical
human-AI triad in a decision-making setting. The impact of humans’ MM of AI on various performance
metrics is studied in the presence and absence of mutually compatible MMs between the two agents. Mutually
incompatible models lead to dissonance between the agents, causing an overall degradation in their joint
activity. Results showed that operators exercised increased vigilance when they did not have a MM of their
AI teammates, but having an accurate MM improved decision utility over time without reducing speed or
increasing participants’ task load. It also led to greater acceptance of the AI’s suggestions without inducing
biases towards the AI. Additionally, operators reported less effort and mental demand and made more accurate judgments of the relative competence of the two agents during dissonance. These findings motivate further research on how different levels of MMs between humans and their AI teammates lead to different outcomes in complex collaborative settings.
1 INTRODUCTION
Literature suggests that the inclusion of AI as a team-
mate, as opposed to a mere tool, would enhance per-
formance, the quality of interactions, and humans’
satisfaction in working with the AI, since a teammate would be able to adapt to its human counterparts
(Andrews et al., 2023; O’Neill et al., 2022). Team
members must have “clearly defined differentiated
roles and responsibilities, hold task-relevant knowl-
edge, and be interdependent (i.e., must rely on one
another to accomplish goals). Teams can be distin-
guished from groups, in which members have homo-
geneous expertise, roles, and responsibilities” (Con-
verse et al., 1993; Orasanu and Salas, 1993). Per this
definition, the portrayal of humans and AI collectives
as teams is dubious (Groom and Nass, 2007) as their
interactions have largely been studied as groups.
A team decision process “involves gathering, pro-
cessing, integrating, and communicating information
in support of arriving at a task-relevant decision”
(Converse et al., 1993). They state that this activity
does not always require consensus between individ-
uals, nor are all individuals involved in all aspects of
the process. The role of members is to filter out irrele-
vant information, apply their knowledge and skills to
solve role-specific problems, communicate the rele-
vant information with other team members, and make
recommendations. With this definition, a simple hier-
archy would entail assigning authority of the final de-
cision to a single individual, while the team supports
the decision-maker with assessments and information
as per the given situation. Due to a host of legal, ethi-
cal, moral, safety, and trust-related issues surrounding
entrusting authority to automation (Awad et al., 2018;
Kingston, 2018; Bartneck et al., 2021), we posit that
humans will likely continue to be placed at the apex
of the decision-making ladder for safety-critical situ-
ations. Human-AI or AI-only teams would function
to support the decision-maker.
Mental Models have been used as a basis for
understanding human-human and human-automation
interactions. Johnson-Laird originally described a
MM as the mechanism by which humans understand
the world, creating working models in their minds.
These are abstract, long-term knowledge structures
used to describe, explain, and predict the world
around them (Johnson-Laird, 1983). Shared Mental Mod-
els (SMMs) extend the MM concept to teams. The
SMM theory hypothesizes that if individuals in a team
possess similar MMs of the shared task and of each
other, they can effectively establish mutual awareness
that helps in inter-agent information sharing and ex-
pectation setting. It also fosters trust and accountabil-
ity in teammates and facilitates anticipatory behavior
during difficult situations without the need for explicit
communication (Kleinman and Serfaty, 1989). These
capabilities are crucial for effective collaboration be-
tween teammates, especially when they are funda-
mentally different, such as humans and AI agents.
(Converse et al., 1993) laid out four components
that constitute a MM: 1) Equipment Model, 2) Task
Model, 3) Team Interaction Model, and 4) Team
Member Model. This taxonomy has since been reduced to two content-specific domains: 1) the Task Model and 2) the Team Model (Scheutz et al., 2017). The former consists of task procedures, possible outcomes and how to handle them, and the technical systems involved in performing the task and how to operate them. The Team Model, on the other hand, comprises the tendencies, beliefs, and personalities of team members, how the team is structured, its roles, and the modes and frequency of communication. SMMs
benefit the team by enabling members to form accu-
rate explanations and expectations of the task, coor-
dinate their actions, and adapt their behavior to ac-
commodate the demands of the task and other team
members (Mathieu et al., 2000). However, the com-
plexity and stability of these models are not the same.
Team Models are less stable because they are dynamic
and not just dependent on the situation but also on
the team members involved. Individuals have high
variability in their prior MMs, past experiences, ed-
ucation, socio-economic backgrounds, and individual
personalities, all of which influence their MM forma-
tion (Converse et al., 1993). Given the necessity to
establish a SMM between humans and AI agents for
successful teamwork, their Team Models make an in-
teresting object of study and warrant attention.
Studies in human-AI teams primarily focus on
three factors: 1) the AI’s MM of the task and team-
mate (human), 2) the human’s MM of the task and
teammate (AI), and 3) the joint (shared) MMs. Within
this framework, we aim to study the utility of ac-
curate Team Models among decision-makers, espe-
cially during inter-agent dissonance. Dissonance be-
tween agents may occur due to inadequate calibration,
faulty information sources, different policies for train-
ing (preferences), improper distribution of informa-
tion, etc. In real-life settings, with growing team sizes
and complicated decision workflows, where only key
information may be shared, lack of communication
and effective information exchange may lead to mutu-
ally incompatible MMs between members of a team.
This creates 'dissonance' between them. The operator's role is then to identify and overcome the undesirable effects of such dissonance. Team Models in human-AI dyads have been shown to reduce operator bias and workload and to improve performance and trust, as the operator understands the circumstances under which the AI is reliable (Kulesza et al., 2012; Walsh et al., 2024; Yang and Dorneich, 2018). This
enables complementary behavior and augments per-
formance that neither humans nor AI can achieve
alone (Kamar, 2016; Kamar et al., 2012).
A recent study (Walsh et al., 2024) provided em-
pirical evidence that mutual understanding between
humans and AI teammates positively influences per-
formance and workload for decision-making when
humans’ Task Models are imperfect. The work also
identified the benefits and trade-offs associated with
the human or the AI possessing an accurate Team
Model of the other. However, as with most of these
studies, this work focused on human-AI dyads with
members acting as 'groupmates', thereby simulating
relatively simple aspects of collaboration. The dis-
tribution of information within these dyads has been
homogeneous as members have shared common roles
and responsibilities. There is little inter-dependency
and greater overlap, leaving little opportunity to un-
derstand the true aspects of teaming. Therefore, em-
pirical research in human-AI teams must be con-
ducted for multi-agent complex teams to verify if
prior findings hold across different settings.
In this study, we expand the definition of mixed-
agent teams by studying a triad of a human decision-
maker equipped with two AI agents for a geospatial
decision-making task. The agents are given informa-
tion specific to their assignments, while the human
has high-level task information to monitor the agents.
Interdependent activities between the two agents in-
volve filtering out irrelevant information and provid-
ing the human decision-maker with relevant informa-
tion. This can only be achieved when they operate
with mutually compatible ‘MMs’. The two low-level
agents must also generate decision recommendations,
for which they will have a finer-grained task under-
standing than their human manager.
This study aims to empirically verify the utility
of Team Models in hierarchical human-agent (triad)
teams by answering the following research questions:
RQ1. How does providing a (human) decision-
maker/manager with an accurate Team Model af-
fect performance in mixed hierarchical teams?
RQ2. How do Team Models affect the decision-
maker’s workload and perception of the AI agents
within a hierarchical team?
2 RELATED WORKS
The design focus of socio-technical systems has begun shifting from merely making agents smarter, i.e., improving their task-related capabilities, toward a human-centered approach that seeks to understand humans' cognitive needs while coordinating and cooperating with AI agents. In this section, we briefly revisit the concept of mental models to
highlight the role of humans’ understanding of their
AI teammates and how explainable AI has been used
to bridge the gap between humans’ understanding and
the true nature of the AI systems they interact with.
2.1 Mental Models
Humans create mental models of any system they
interact with (Norman, 2013), including AI agents
(Kulesza et al., 2012; Tullio et al., 2007). These mod-
els are not static knowledge structures; they change with continued interaction with the system. However,
they are sparse and parsimonious, leading the user to
form useful approximations that they can rely on to
understand the target system (Norman, 2013). Hu-
mans’ mental models can be broadly classified into
Task or Team Models, depending on the major com-
ponents constituting them. While Task models are
relatively more stable, Team Models are more sus-
ceptible to change (Converse et al., 1993). This is because Team Models comprise team-specific characteristics, such as the roles, responsibilities, and interaction patterns between team members; they also depend on the situation and the particular team members involved. However, as the role of human operators transitions from handling the task at hand to supervising and coordinating with multiple agents, humans' mental models of the AI, i.e., their Team Models, will be a major determinant of team success. They
will help humans discern the conditions under which
AI systems may be relied upon, and help them iden-
tify the circumstances within which the operator has
to exercise increased caution and monitor the agents
more closely. (Kulesza et al., 2012) demonstrated
that participants who were able to create sound men-
tal models of a recommender system’s reasoning and
those who most improved their mental models made
the system operate to their satisfaction. A lack of
understanding of the AI agents has been shown to induce
algorithmic aversion (Dietvorst et al., 2014), lead-
ing the human to completely avoid using the agents’
suggestions. Alternatively, humans may excessively
rely on AI systems and base their final decisions on
the AI's flawed recommendations, which is termed
“automation bias” (Mosier and Skitka, 1999; Cum-
mings, 2004; Wagner et al., 2018; Robinette et al.,
2017).
Norman emphasized the role of the designer in
bridging the gap between the conceptual model of
the target system and the actual mental model that
users develop as a result of their interactions (Nor-
man, 2013). To improve the user’s mental model of
the AI, the designer may employ better instruction,
training, improvements in design workflows, or pro-
vide the user with different kinds of explanations. A human's mental model of the AI may include an understanding of the underlying algorithm (Kulesza
et al., 2012; Kulesza et al., 2013), its predictions
(Ribeiro et al., 2016), or error boundaries (Bansal
et al., 2019). Crucially, the goal of designers must
be to understand how to reinforce the development of
some or most of these aspects of MMs. In the follow-
ing section, we will address the role of explanations
in improving humans’ MMs of their AI teammates.
2.2 Improving Mental Models
ML algorithms, particularly Deep Neural Networks
(DNNs), work as black-box models that do not pro-
vide much traceability of the predictions or outcomes.
Unlike simplistic rule-based approaches that allow
operators to trace back the steps leading to a deci-
sion, DNNs are complex and opaque. This lack of
interpretability, a key requirement for human reason-
ing, precludes the development of any reliable mental
model of the AI system. Providing explanations to
the user has been a popular approach among AI researchers and developers. (Mueller et al., 2019) state that MMs help users reason about the system they interact with, enabling them to generate explanations. Explanations (provided during system interac-
tion), in turn, help facilitate the development of MMs
by proposing causal relationships or directing user at-
tention. However, an explanation is unlikely to be effective if it does not account for the user, that is, what they want and what they already know (Andrews et al., 2023).
For example, one of the factors that influence the
calibration of humans’ expectations of an AI is pro-
viding explanations for failures, either through natu-
ral language methods (Das et al., 2021), confidence
scores (Zhang et al., 2020), or both. (Yin et al., 2019)
found that a model’s stated accuracy has a signifi-
cant effect on people’s trust even after observing a
high accuracy in practice. However, if a model’s ob-
served accuracy is low, then after observing this ac-
curacy, the stated accuracy has at most a very small
effect on people's trust in the model. There is a caveat, however: the stated accuracy may not always correctly
represent an algorithm’s performance. Humans are
less likely to follow the AI recommendation when the
stated score is an incorrect estimate of risk. Addi-
tionally, humans have trouble processing probabilis-
tic information that signifies confidence calibrations
of some AI models (Handmer and Proudley, 2007).
While explanations have a functional role in improv-
ing transparency and understanding, it is not yet clear
under which conditions they improve collaboration
and human productivity (Doshi-Velez and Kim, 2017;
Poursabzi-Sangdeh et al., 2021).
As we transition to multi-agent settings, there is
a need to identify and understand what aspects of the
MM need to be strengthened to enable operators to
function effectively within the team. For example,
a domain expert may be well-equipped to form reli-
able Task Models, but a lack of familiarity with AI-based decision support systems (DSS) will lead to misalignment between their
expectations and what the AI is truly capable of. In
a decision hierarchy, the manager who is responsi-
ble for coordinating with the other agents does not
have access to all the relevant information to inform
their decisions. They may also not have access to the
raw information, or the exact values that the individ-
ual agents assign to key information. Explanations, in the form of salient features extracted from the raw information, are sometimes used to inform operators for better situational assessments (Andrews et al., 2023). Fur-
ther, in time-sensitive situations, they may be unable
to decompose the activities of the individual agents
to identify the root cause of any failures. Therefore,
managers must be able to learn and adapt through ex-
ternal observation, feedback from the environment,
and continuous interaction with the task and team-
mates.
In this study, we hypothesize that providing hu-
man managers with accurate prior MMs of their AI
teammates will equip them to better judge the agents’
capabilities. We believe that these MMs will enable the operator to reconcile differences between the two agents arising from dissonance in their alignment or calibration, and to achieve task-related objectives more effectively.
3 METHODOLOGY
This section describes the experimental task, interface
design, and the metrics used to understand the impact
of Team Models on decision-making activity.
3.1 Experimental Task
Participants in our experiment are tasked as decision-
makers (team managers) in a disaster-relief team re-
sponsible for delivering resources to afflicted regions
in a storm-struck city. The goal is to place two key
resources, i.e. Food and Generator, based on infor-
mation about the city’s Population distribution and
Power Outage conditions, respectively. Two addi-
tional sources of information, i.e. No-go and Flood-
ing conditions, must be used to inform the placement
of both resources. Two AI agents assist the partici-
pants, each providing suggestions for the placement
of Food and the Generator, respectively, based on detailed data from these information sources. To avoid
information overload, the agents provide a condensed
map of the common information sources i.e., No-
go and Flooding conditions to their manager. This
condensed map is a composite of the two raw infor-
mation sources and is subject to variation between
the two agents based on whether they have mutually
compatible Team Models. In the presence of Team
Models, the raw inputs are processed similarly by
the two agents and the composite map presented by
both agents to the participant is consistent. This is an
underlying feature of Team Models because mutual
awareness of how a partner may perceive certain in-
formation drives consistent and compatible behaviors
in teams. During dissonance, each agent provides its
own version of the composite map, resulting in con-
flicting versions of the same raw information, thereby
simulating an adverse effect of dissonance that may confound the end user. The participant may
accept or override either agent’s suggestion and place
the resource at their discretion for each task.
Information sources presented to the participant take the form of 8x8 grid maps, across which resources occupying 2x2 cells must be placed. Each cell on the map is assigned a specific color that indicates the utility of that location. The four colors (Green, Yellow, Orange, and Black) are ordered by the relative utility assigned to the cells, with Green and Black having the highest and lowest utilities, respectively. Participants must place each resource such that the 2x2 area of maximum utility across the relevant maps is serviced. They are scored on the combined utility of the placement of both resources.
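For concreteness, the following is a minimal sketch (in Python) of this scoring scheme; the color-to-utility values and the exhaustive 2x2 window search are illustrative assumptions rather than the study's actual implementation.

import numpy as np

# Hypothetical color-to-utility mapping; the exact values used in the study are not reported here.
COLOR_UTILITY = {"green": 3, "yellow": 2, "orange": 1, "black": 0}

def best_2x2_placement(utility_map: np.ndarray):
    """Return the top-left cell and summed utility of the best 2x2 placement on an 8x8 map."""
    best_loc, best_score = None, -np.inf
    for r in range(utility_map.shape[0] - 1):
        for c in range(utility_map.shape[1] - 1):
            score = utility_map[r:r + 2, c:c + 2].sum()
            if score > best_score:
                best_loc, best_score = (r, c), score
    return best_loc, best_score

# Example: convert a random 8x8 map of color labels to utilities and find the best placement.
rng = np.random.default_rng(0)
colors = rng.choice(list(COLOR_UTILITY), size=(8, 8))
utilities = np.vectorize(COLOR_UTILITY.get)(colors)
print(best_2x2_placement(utilities))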
The experiment was conducted to study the in-
fluence of the human manager’s Team Model under
two main conditions: (1) when there was no inter-
agent dissonance, and (2) during inter-agent disso-
nance. Both versions of the AI agents are designed
to have a 50% success rate, to prevent any systematic bias in the participants' interpretation of their compe-
tencies. The sequence in which the AI agents err is
randomized but kept consistent across all the exper-
imental groups. The second independent variable is
the presence (or absence) of the participants’ Team
Model of the agents. Team Models in participants
are instilled through instructions. Participants with a
Team Model are told about the mutual inconsistency
between the agents and its effect on the composite maps presented. Participants without a Team
Model are given no such instruction. Team Models
give them the required understanding to identify and
diagnose the source of the agents’ errors and eval-
uate their capabilities in providing assistance. The
utility of the participant’s Team Model is evaluated
during both conditions: with and without inter-agent
team dissonance. The experiment thus follows a 2x2 fully crossed between-subjects design.
This setup is intended to simulate situations under
which team dissonance leads to improper coordina-
tion among AI team members and their managers.
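To illustrate how a fixed-but-randomized 50% error schedule of the kind described above could be generated, a small sketch follows; the seed, the helper name, and the assumption that agent errors are pre-scheduled per task are ours, not details reported in the study.

import numpy as np

def make_error_schedule(n_tasks: int = 12, success_rate: float = 0.5, seed: int = 42) -> np.ndarray:
    """Randomize which tasks an agent errs on while keeping the sequence identical across groups."""
    rng = np.random.default_rng(seed)                  # fixed seed -> same schedule for every group
    n_errors = round(n_tasks * (1.0 - success_rate))   # e.g., 6 erroneous tasks out of 12
    schedule = np.array([True] * n_errors + [False] * (n_tasks - n_errors))
    rng.shuffle(schedule)                              # True marks a task on which the agent errs
    return schedule

print(make_error_schedule())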
3.2 Experimental Interface
The experimental interface consists of three main ar-
eas shown in Figure 1. The left-hand side consists of
the two information attributes along with the compos-
ite maps generated by each AI agent, contained within
the ‘Data Sources’ section. The middle section shows
the grid map of the affected regions in the city, and
the right-hand side includes controls to submit the re-
source placement. Panels in ‘Data Sources’ allow the
user to view the heat-map overlay and the associated
utility of each grid location on the map. Two (dot-
ted) icons in blue and red are overlaid on the heatmap
to indicate the AI’s suggestions for the placement of
Food and Generator. Two (solid line) markers are pro-
vided to the participant for their final placement of the
resources. A color scale indicating the relative utili-
ties of each grid is displayed adjacent to the map.
The composite map is a tool that helps the par-
ticipants visualize all or different combinations of in-
formation attributes at once. In this case, each AI
agent synthesizes information from No-go and Flood-
ing data sources to provide the user with a condensed
representation of all the information. The first instance of this was described by (Illingworth and Feigh, 2021). Information from the composite map, along with the Population or Power Outage data sources, must be combined (using linear superposition) to determine the appropriate location to place Food and Generator, respectively. Without Team Models, the two agents alternately provide the decision-maker with an incorrect composite map for 6 of the 12 tasks. With mutually compatible Team Models, the two composite maps are identical but are incorrect for 6 of the 12 tasks, depriving the user of the correct information from the No-Go and Flooding maps.
(The term GenAI in the toggle panel on the interface indicates that the composite map presented to the user is generated by the AI agent responsible for the placement of the Generator; it is not to be confused with the popular term 'GenAI' for 'Generative AI'. This distinction was made clear to the user through instructions.)
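Under our reading of this pipeline, the placement for each resource could be derived roughly as sketched below, where 'linear superposition' is taken to be an elementwise sum of the 8x8 utility grids; the array names and example data are placeholders.

import numpy as np

def superpose(*maps: np.ndarray) -> np.ndarray:
    """Linear superposition of utility maps: elementwise sum of the stacked 8x8 grids."""
    return np.sum(np.stack(maps), axis=0)

def best_2x2(utility_map: np.ndarray):
    """Top-left index and utility of the best 2x2 window (the same search as in Section 3.1)."""
    windows = [((r, c), int(utility_map[r:r + 2, c:c + 2].sum()))
               for r in range(7) for c in range(7)]
    return max(windows, key=lambda w: w[1])

rng = np.random.default_rng(1)
composite = rng.integers(0, 4, (8, 8))    # agent-provided synthesis of No-go and Flooding
population = rng.integers(0, 4, (8, 8))   # resource-specific attribute (here, for Food)
food_cell, food_utility = best_2x2(superpose(composite, population))
print(food_cell, food_utility)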
Once participants submit their final decision, feed-
back in the form of the individual utility (for each re-
source) and joint utility is provided to the user. They
then proceed to the next task. Information sources are updated for every task.
3.3 Procedure
Participants start by signing an online consent form
and proceed to read a set of instructions. Once the
instructions are completed, participants are tested for
their task understanding using a questionnaire. Only
on clearing this test are they allowed to proceed. They
are provided with five training tasks that mimic the
actual tasks they will be faced with. On comple-
tion of training, participants in the treatment group
with Team Models are provided instructions about the
agents. They then proceed to complete the main ex-
perimental tasks. On completion, participants rate the
AI along several dimensions described in section 3.4.
They are also asked to rate their workload using the
NASA-TLX (Hart and Staveland, 1988).
3.4 Measures
This section describes the subjective and objective
metrics used to answer our research questions.
Team performance evaluates the accuracy of the
users’ decisions based on the utility of the selected
location. The utility for placement of each resource is
the linear superposition of the utilities from the rele-
vant maps for that location. Task scoring is the arithmetic average of the utilities for the placement of both resources. The scores are min-max normalized to the range 0-100%. Along with decision utility, task load is
objectively measured using instances of users’ infor-
mation access. Information access for each attribute
is represented by the number of clicks on that infor-
mation source. The sum of all such clicks acts as an
objective proxy measure for the effort expended by
the user in evaluating the agents’ suggestions toward
making the final decision. In the same vein, the time
(to the nearest second) to make each decision is mea-
sured from the time the information is displayed on
Table 1: Experiment Design: Treatment Groups (Participant Team Model x Inter-agent Team Dissonance).
Team Model, No Dissonance: participants have MMs of the agents; agents have MM alignment.
Team Model, Dissonance: participants have MMs of the agents; agent MMs are not mutually aligned.
No Team Model, No Dissonance: participants do not have MMs of the agents; agents have MM alignment.
No Team Model, Dissonance: neither participants have MMs of the agents, nor do agents have MM alignment.
Figure 1: Gorilla Experimental Interface: (1) The toggle panel to the left indicates information sources. (2) The map at the
center shows the affected areas in the city. AI Suggestions in the form of red and blue dotted icons are placed at different
locations on the map. Blue and Red solid markers indicate resources for placement. Color scaling to the right indicates the
relative utility associated with each color. (3) Tools on the right consist of a ‘Submit’ button for submitting responses.
the screen until the user hits the ‘submit’ button to
transition to the next task.
Along with these performance metrics, users’
agreement levels with the AI agent are measured as
the proportion of instances (from the 12 tasks) for
which the user agreed with either AI’s suggestions.
This represents the participants’ tendency to utilize
the AI. The appropriateness of these agreement lev-
els was measured by the proportion of instances for
which the user agreed with an AI’s suggestion when
it was correct and disagreed with the AI’s suggestion
when it was incorrect. Together, the overall and appropriate agreement levels measure how well reliance on the AI was calibrated.
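As a minimal sketch of these reliance measures (and of the min-max normalization of team performance), assuming hypothetical per-task boolean records of agreement and AI correctness:

import numpy as np

# Hypothetical per-task records for one participant over the 12 tasks.
agreed_with_ai = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1], dtype=bool)
ai_was_correct = np.array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1], dtype=bool)

overall_agreement = agreed_with_ai.mean()                         # tendency to utilize the AI
appropriate_reliance = (agreed_with_ai == ai_was_correct).mean()  # agree when correct, disagree when incorrect

def normalize(scores, lo, hi):
    """Min-max normalize raw decision utilities to the 0-100% range used for team performance."""
    return 100.0 * (np.asarray(scores, dtype=float) - lo) / (hi - lo)

print(overall_agreement, appropriate_reliance, normalize([7, 10, 12], lo=0, hi=12))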
Users’ experience of workload during their in-
teraction with the AI team was measured using the
NASA TLX. We used a scale of 1-21 for recording
responses for each sub-scale. To identify users’ per-
ceptions of each teammate, participants were asked
to rate their experience of working with each agent
across the following dimensions: Intuitiveness (not
intuitive - intuitive), Confidence (not confident - con-
fident), Competency (incompetent - competent), Ben-
efit in teaming (useless - beneficial), (Frequency of)
Trouble in Decision Making (never - always), Task
Understanding (misunderstood - understood), Will-
ingness to Work (together) again (never - definitely).
Responses to these questions were recorded on a Lik-
ert Scale of 0-5 with the lowest and highest ratings for
each metric indicated within the parentheses.
3.5 Participants
Data were collected from 80 participants with 20 par-
ticipants in each experimental group. The participant pool comprised 50 male and 29 female participants; 1 participant did not report gender. Ages ranged from 20 to 69 years, with a median age of 36. All partic-
ipants resided in the United States and reported flu-
ency in English. There were no reports of color blind-
ness. We used an online experiment-building plat-
form called Gorilla and conducted recruitment us-
ing the online crowd-sourcing platform, Prolific. A
simpler version of this setup was first presented by
(Walsh and Feigh, 2021). Additional compensation
of 25% was provided to the top 10% performers to
encourage high-effort participation. The study was
IRB-approved at Georgia Institute of Technology.
4 RESULTS
We explored the impact of the human decision-maker's Team Model under both inter-agent dissonance conditions. The Kruskal-Wallis test was used to determine statistical differences between groups.
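The group comparisons and per-task trends reported below could be computed along the following lines; this is a sketch over placeholder data using scipy's kruskal and linregress, not the authors' analysis code.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Placeholder per-participant mean decision utilities (%) for two groups of 20.
team_model_group = rng.normal(70, 10, 20)
no_team_model_group = rng.normal(75, 10, 20)

# Kruskal-Wallis H-test for a difference between the two groups (H is chi-squared distributed, df = 1).
h_stat, p_value = stats.kruskal(team_model_group, no_team_model_group)

# Linear trend of mean performance over the 12 tasks (slope reported with its standard error).
task_means = 65 + 1.0 * np.arange(1, 13) + rng.normal(0, 3, 12)   # placeholder per-task means
fit = stats.linregress(np.arange(1, 13), task_means)

print(round(h_stat, 3), round(p_value, 4))        # group difference, cf. Table 2
print(round(fit.slope, 3), round(fit.stderr, 3))  # per-task trend, cf. the reported slopes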
A comparison of team performance between
groups (see Fig. 2a) without inter-agent dissonance
indicated little effect of providing participants with
a Team Model (χ²(1) = 0.042, p = 0.8378). Curi-
ously, the lack of Team Models between agents led
participants without Team Models to make better de-
cisions on average (χ²(1) = 9.28, p < 0.01). How-
ever, improvements over the 12 tasks (slope = 1.029±
0.4, p < 0.05) were significant among participants
with Team Models. These improvements were more
gradual and insignificant among participants without
Team Models (slope = 0.567 ± 0.4, p = 0.1390). Fig-
ure 3a indicates the trends for performance over the
12 tasks. Here, we see little difference in initial performance levels between participants with Team Models during dissonance and participants paired with AI agents without dissonance. This lack of difference in the initial stages can be attributed to an 'illusion of understanding': despite access to the correct version of the composite maps, this illusion may create an initial degradation in performance, which was quickly overcome with increased interaction. Sur-
prisingly, participants without Team Models had im-
proved scores. This may be due to increased vigi-
lance on their behalf, which led to active monitoring
(as we will see from information access and time to
make decisions) as they perceived inconsistency in
the information provided by the two agents. Test-
ing for an extended number of trials is necessary to
determine if these effects persist. Conversely, Team
Models only had a marginal effect on performance
for human-agent teams without dissonance because
when agents were incorrect, the decision-maker did
not have access to the correct version of the compos-
ite maps. They would have had to rely on underlying
task heuristics, which may be sparse leading to sub-
optimal outcomes.
Possessing a Team Model had a significant effect on participants' task load (information access) (during dissonance: χ²(1) = 8.28, p < 0.01; without dissonance: χ²(1) = 31.03, p < 0.001).
Table 2: Summary of results from the Kruskal-Wallis test for comparison of performance metrics in decision-makers with and without a Team Model.
Metric               Condition        χ²(1)    p
Team Performance     No Dissonance    0.042    0.8378
Team Performance     Dissonance       9.28     < 0.01
Information Access   No Dissonance    31.03    < 0.001
Information Access   Dissonance       8.28     < 0.01
Time Taken           No Dissonance    50.25    < 0.001
Time Taken           Dissonance       31.86    < 0.001
Agreement with AI    No Dissonance    15.43    < 0.001
Agreement with AI    Dissonance       6.40     < 0.05
The time required by the participants to arrive at
the final decision was also significantly reduced with
Team Models (during dissonance: χ²(1) = 31.86, p < 0.001; without dissonance: χ²(1) = 50.25, p < 0.001). However, the amount of time taken per de-
cision also declined over the 12 tasks among par-
ticipants who did not have a Team Model but were
equipped with non-dissonant agents that had mutually
compatible MMs (slope = 1.74 ± 0.5, p < 0.001). The ease with which participants could recognize the coordinated behavior of the two agents may explain this trend. Participants spent less time making decisions when they possessed Team Models, and they also benefited (with increased task and team familiarity) as long as Team Models existed between the two AI agents, rendering them non-dissonant.
A comparison of the reliance of the manager on
AI agents revealed a significant rise in the agree-
ment levels when participants had an accurate Team
Model (Figure 4). To measure the level of reliance
on the AI agents, we measured the agreement levels
between participants’ final decisions and either AI’s
suggestions. During dissonance, this difference is of the order χ²(1) = 6.40, p < 0.05, whereas without dissonance, the difference is of the order χ²(1) = 15.43, p < 0.001. We also measured how appropri-
ate the reliance was. Notably, there was no difference in the appropriateness of reliance when the participant was provided with a Team Model
of the agents. They were more accepting of the AI’s
suggestions when they had a Team Model while being
wary of its suggestions when they did not possess a
Team Model. Providing a Team Model did not lead to
blind compliance as participants were better equipped
to identify when to follow either AI’s suggestions.
Lastly, we measured participants’ subjective
workload using the NASA-TLX. Team Models in-
fluenced operators’ workload when faced with con-
flicting information as the two agents were uncoordi-
nated. Participants with Team Models reported signif-
icantly less Mental Demand (χ²(1) = 5.38, p < 0.05) and Effort (χ²(1) = 5.43, p < 0.05).
Along with workload, participants were asked to
rate the agents across several dimensions to assess their perception of their teammates. Team Models did
not affect the ratings given to either agent along any of
the specified dimensions. However, participants with-
out a Team Model perceived a greater frequency of errors for the AI that made suggestions for the placement of Food (χ²(1) = 4.36, p < 0.05) dur-
ing inter-agent team dissonance. The lack of Team
Models may lead to a tendency among users to as-
sess two equally competent agents differently, espe-
cially when there is dissonance among members of
the team. This is not surprising because, within our
experiment, dissonance was designed to simulate a
condition where there was disagreement between the
agents on performing an interdependent task of syn-
thesizing common pieces of information. Without a
Team Model, operators would need more task train-
ing and experience with the two agents to recognize
that the two agents are similar in competency.
5 DISCUSSION
With the rising development and integration of AI-
based agents in a variety of task settings, it is nec-
essary to understand how their interactions with hu-
mans will evolve. While traditional human-AI teaming continues to be studied under simplistic decision workflows, multi-operator settings are likely to evolve into a complex, hierarchical reorganization of activities. Individuals operating in such teams would have varied roles and responsibilities, and establishing SMMs across all aspects of the collaboration will prove increasingly challenging. In particular, as AI systems remain largely opaque, the presence or absence of a detailed understanding of the AI teammate may have severe consequences for collaborative output. This study
aimed to understand whether accurate Team Models
of the AI serve the purpose of aiding the decision-
maker within a hierarchical setting, especially during
dissonance between the members of the team.
We found that dissonance in teams led to surprising outcomes in terms of decision utility.
Inconsistency in the mutual perception and integra-
tion of information between the two agents led to in-
creased vigilance among decision-makers without ac-
curate Team Models. Providing a mental model of
their teammates led to an initial 'illusion of understanding', which caused some initial reductions in decision utility. However, the long-term benefits of Team
Models are that they help decision-makers quickly
identify and correct inter-agent inconsistencies to im-
prove decision outcomes and close any performance
gaps. We urge researchers to verify these results in
other operational environments to empirically iden-
tify the utility of Team Models in multi-agent settings.
In situations where the risks associated with deci-
sion outcomes are severe, increased training between members is needed for their MMs to form and converge. Convergence does not mean
an overlap of information within MMs. Especially
in multi-agent settings, MM convergence should aim
towards arriving at shared expectations between in-
dividuals of a team for any given situation (Cannon-
Bowers and Salas, 1990). Such training may also in-
clude ways for the decision-makers to verify the ve-
racity of information the agents operate on, to im-
prove transparency. As we transition to larger teams,
dissonance is more likely to occur, especially in scenarios where communication is restricted. Alterna-
tive ways for the decision-maker to tackle such incon-
sistencies may be useful in achieving better outcomes.
For tasks that prioritize speed over decision utili-
ties, mutual predictability in agentic behavior may be
a driver of high performance. This is because indi-
viduals without accurate Team Models also tend to
improve their decision-making speed with increased
familiarity with the task and teammates, without nec-
essarily compromising on the quality of the decisions.
Although providing Team Models benefits the opera-
tors by reducing their overall task loads, in situations
where achieving an accurate, thorough, and detailed
understanding of the teammates is challenging, im-
proving the predictability of AI teammates may yield
sufficiently desirable outcomes.
Overall, the benefits of providing Team Models
among decision-makers are significant in terms of the
quality of ensuing interactions. The use of AI-based
tools can be maximized only when users are more
likely to use them rather than indiscriminately avoid-
ing them. Providing a Team Model helps improve
the acceptance of the agents’ suggestions without cre-
ating over-reliance or complacency among individ-
uals. Decision-makers can perceive AI errors more readily, which would mitigate potential automation bias (Mosier and Skitka, 1999). In real-
ity, as ‘MMs’ between agents diverge, communica-
tion breaks down, and it becomes difficult for agents
to describe their state and their actions in a way that
the decision-maker will understand, thereby introduc-
ing a breakdown of transparency (Scali and Macredie,
2019). In addition to reducing the mental demand and effort perceived by decision-makers during dissonance, reliable Team Models support more accurate and equitable perceptions by humans of their AI teammates. Researchers in the field of Explainable AI should aim to identify which aspects of humans' mental models of the AI need reinforcement, such that explanations deliver the maximum positive impact on the interaction between the human and AI agents, without overwhelming the user with additional information.
Figure 2: Comparison of performance metrics measured across both conditions of dissonance: (a) team performance in terms of decision utility; (b) information access (mouse clicks); (c) time taken per decision.
Figure 3: Comparison of trends in performance metrics measured across both conditions of dissonance: (a) team performance in terms of decision utility; (b) information access (mouse clicks); (c) time taken per decision.
Figure 4: Overall agreement levels with either AI agent.
Figure 5: Comparison of ratings for workload dimensions.
6 LIMITATIONS
While our study provides insight into the interactions
between humans and their automated teammates in a
hierarchical decision setting, we wish to shed light on
some limitations in the design choices for our study.
This study used crowd workers to study decision-making in a hierarchical human-agent triad. The task was simplified, and
training was provided to equip participants with suf-
ficient understanding to complete the task. We also
acknowledge that our decision environment may not
have accurately simulated the varied levels of risks
associated with real-life decision environments. This
restricts our ability to fully capture the true attitudes
and preferences that experts may demonstrate during
the interaction. To understand the naturalistic deci-
sion processes among domain experts and the role
of accurate Team Models in facilitating the interac-
tion process, we recommend that researchers and de-
sign practitioners study the validity of our findings in
higher-fidelity testing environments, where they may
replicate the risks associated with the task. Our find-
ings nonetheless have instructive value, in that they point to some of the underlying trends that we may observe in real-life situations.
Participants in our study also had access to high-
level relevant information, which they could use to improve the decisions of their AI teammates. This
design choice was made to identify whether learn-
ing occurs within the duration of our study. In cases
where a non-uniform distribution of key information leaves the user unable to compensate for lost performance, the decision utility levels seen in the No Team Model and Dissonance condition of our study may not hold.
In such situations, the user has to have a better un-
derstanding of the AI teammate or must be provided
with other means of verifying the fidelity of AI rec-
ommendations, to prevent any degradation in team
performance. In disaster-relief situations, where the
decision utility may map to the number of lives saved,
even small increments in the quality of decisions may
have major consequences. Thus, the success of col-
laborative decision-making should be studied in the
context of the environment.
We also believe that extending the number of trials
will help us develop better insights into how some of
our performance metrics may evolve over longer pe-
riods. Our study was designed to be completed within
an hour (maximum time taken by users within a pilot
study, from instruction to post-study questionnaires),
to ensure participant engagement and retention. The
design of the study, through instructions and training,
ensured that participants could identify, learn, and
adapt through increased engagement with the task and
agents. However, we hesitate to say whether the differences between groups for any of the performance metrics will be sustained or will level out with greater
experience in performing the task. Future work must
study how the observed trends evolve over extended
periods to determine the existence of long-term gains
in the interaction of these MM levels between humans
and AI.
Finally, we acknowledge that the definition of
Team Models within our study was limited to the
user's understanding of the agents' propensity for making errors and how mutually incompatible agents pro-
cessed information differently. Team Models are
multi-dimensional and could be represented/modeled
in different ways. Studies incorporating the other di-
mensions, such as teammate skills, communication
patterns, preferences, etc. will help researchers un-
derstand what aspects of Team Models best improve
upon the performance in hybrid teams. It is also worth
investigating which aspects of the Team Model are
prone to degradation and how design workflows, ex-
planations, and other mechanisms may improve the
humans’ MMs of AI.
7 CONCLUSION
This work is the first in a series of studies that will
examine the impact of teammate MMs in hierarchical
human-AI teams for decision-making. We studied the
impact of providing the decision-maker with an ac-
curate Team Model, when there was inter-agent dis-
sonance and when there was not. With partial and
abstracted Task Models at the apex of the decision hi-
erarchy, the lack of teammate understanding leads to
greater vigilance, higher task load, and reduced speed on the part of the decision-maker during dissonance. Team
Models help decision-makers close performance gaps
without causing greater stress or reducing their speed
of decision-making. They help improve humans’ per-
ception of the agents and reduce the subjective work-
load, especially during dissonance. These findings
underscore the importance of designing AI-based decision-
support tools that not only support task-based needs
but also drive transparency in operations leading to
better MM formation among its users. As AI-powered
tools proliferate, studies should focus on the interac-
tion effects between decision-makers’ teammate un-
derstanding in complex settings and various task en-
vironments to drive better design choices in creating
intelligent decision-support tools.
ACKNOWLEDGEMENTS
This work was supported by the Office of Naval Re-
search Command Decision Making Program under
Contract N00014-24-1-2135. The results do not re-
flect the official position of this agency.
REFERENCES
Andrews, R. W., Lilly, J. M., Srivastava, D., and Feigh,
K. M. (2023). The role of shared mental models in
human-ai teams: a theoretical review. Theoretical Is-
sues in Ergonomics Science, 24(2):129–175.
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shar-
iff, A., Bonnefon, J.-F., and Rahwan, I. (2018). The
moral machine experiment. Nature, 563:59–64.
Bansal, G., Nushi, B., Kamar, E., Lasecki, W. S., Weld,
D. S., and Horvitz, E. (2019). Beyond accuracy:
The role of mental models in human-ai team perfor-
mance. In AAAI Conference on Human Computation
& Crowdsourcing.
Bartneck, C., Lütge, C., Wagner, A., and Welsh, S. (2021).
An Introduction to Ethics in Robotics and AI.
Cannon-Bowers, J. and Salas, E. (1990). Cognitive psychol-
ogy and team training: Training shared mental models
and complex systems. Human Factors Society Bul-
letin, pages 1–4.
Converse, S., Cannon-Bowers, J. A., and Salas, E. (1993).
Shared mental models in expert team decision mak-
ing. Individual and group decision making: Current
issues, 221:221–46.
Cummings, M. (2004). Automation bias in intelligent time
critical decision support systems. Collection of Tech-
nical Papers - AIAA 1st Intelligent Systems Technical
Conference, 2.
Das, D., Banerjee, S., and Chernova, S. (2021). Ex-
plainable ai for robot failures. In Proceedings of
the 2021 ACM/IEEE International Conference on
Human-Robot Interaction. ACM.
Dietvorst, B., Simmons, J., and Massey, C. (2014). Algo-
rithm aversion: People erroneously avoid algorithms
after seeing them err. Journal of experimental psychology: General, 144.
Doshi-Velez, F. and Kim, B. (2017). Towards a rigorous
science of interpretable machine learning.
Groom, V. and Nass, C. (2007). Can robots be teammates?
benchmarks in human-robot teams. Interaction Stud-
ies, 8:483–500.
Handmer, J. and Proudley, B. (2007). Communicating un-
certainty via probabilities: The case of weather fore-
casts. Environmental Hazards, 7:79–87.
Hart, S. G. and Staveland, L. E. (1988). Development
of nasa-tlx (task load index): Results of empirical
and theoretical research. Human mental workload,
1(3):139–183.
Illingworth, D. A. and Feigh, K. M. (2021). Impact map-
ping for geospatial reasoning and decision making.
Human Factors, page 0018720821999021.
Johnson-Laird, P. N. (1983). Mental models: Towards
a cognitive science of language, inference, and con-
sciousness. Number 6. Harvard University Press.
Kamar, E. (2016). Directions in hybrid intelligence: Com-
plementing ai systems with human intelligence. In
Proceedings of the Twenty-Fifth International Joint
Conference on Artificial Intelligence, IJCAI’16, page
4070–4073. AAAI Press.
Kamar, E., Hacker, S., and Horvitz, E. (2012). Com-
bining human and machine intelligence in large-
scale crowdsourcing. In Proceedings of the 11th In-
ternational Conference on Autonomous Agents and
Multiagent Systems - Volume 1, AAMAS ’12, page
467–474, Richland, SC. International Foundation for
Autonomous Agents and Multiagent Systems.
Kingston, J. (2018). Artificial intelligence and legal liabil-
ity.
Kleinman, D. L. and Serfaty, D. (1989). Team performance
assessment in distributed decision making. In Pro-
ceedings of the symposium on interactive networked
simulation for training, pages 22–27. University of
Central Florida Orlando, FL.
Kulesza, T., Stumpf, S., Burnett, M., and Kwan, I. (2012).
Tell me more? the effects of mental model soundness
on personalizing an intelligent agent. In Proceedings
of the SIGCHI Conference on Human Factors in Com-
puting Systems, CHI ’12, page 1–10, New York, NY,
USA. Association for Computing Machinery.
Kulesza, T., Stumpf, S., Burnett, M., Yang, S., Kwan, I.,
and Wong, W.-K. (2013). Too much, too little, or
just right? ways explanations impact end users’ men-
tal models. In 2013 IEEE Symposium on Visual Lan-
guages and Human Centric Computing, pages 3–10.
Mathieu, J. E., Heffner, T. S., Goodwin, G. F., Salas, E.,
and Cannon-Bowers, J. A. (2000). The influence of
shared mental models on team process and perfor-
mance. Journal of applied psychology, 85(2):273.
Mosier, K. L. and Skitka, L. J. (1999). Automation use and
automation bias. Proceedings of the Human Factors
and Ergonomics Society Annual Meeting, 43(3):344–
348.
Mueller, S. T., Hoffman, R. R., Clancey, W. J., Emrey, A.,
and Klein, G. (2019). Explanation in human-ai sys-
tems: A literature meta-review, synopsis of key ideas
and publications, and bibliography for explainable ai.
ArXiv, abs/1902.01876.
Norman, D. A. (2013). The Design of Everyday Things.
MIT Press.
Orasanu, J. M. and Salas, E. (1993). Team decision mak-
ing in complex environments. page 327–345. Ablex
Publishing.
O’Neill, T., McNeese, N., Barron, A., and Schelble, B.
(2022). Human–autonomy teaming: A review and
analysis of the empirical literature. Human Factors,
64(5):904–938. PMID: 33092417.
Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Wortman Vaughan, J. W., and Wallach, H.
(2021). Manipulating and measuring model inter-
pretability. In Proceedings of the 2021 CHI Confer-
ence on Human Factors in Computing Systems, CHI
’21, New York, NY, USA. Association for Computing
Machinery.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). ”why
should i trust you?”: Explaining the predictions of any
classifier. In Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery
and Data Mining, KDD ’16, page 1135–1144, New
York, NY, USA. Association for Computing Machin-
ery.
Robinette, P., Howard, A., and Wagner, A. (2017). Concep-
tualizing overtrust in robots: Why do people trust a
robot that previously failed?, pages 129–155. Springer
International Publishing.
Scali, G. and Macredie, R. (2019). Shared mental models as
a way of managing transparency in complex human-
autonomy teaming.
Scheutz, M., DeLoach, S. A., and Adams, J. A. (2017).
A framework for developing and using shared mental
models in human-agent teams. Journal of Cognitive
Engineering and Decision Making, 11(3):203–224.
Tullio, J., Dey, A. K., Chalecki, J., and Fogarty, J. (2007).
How it works: A field study of non-technical users in-
teracting with an intelligent system. In Proceedings of
the SIGCHI Conference on Human Factors in Com-
puting Systems, CHI ’07, page 31–40, New York, NY,
USA. Association for Computing Machinery.
Wagner, A. R., Borenstein, J., and Howard, A. (2018).
Overtrust in the robotic age. Communications of the
ACM, 61(9):22–24.
Walsh, S., Narayanan, R., and Feigh, K. M. (2024). The
role of shared mental models in ai-advised decision
support. (under review).
Walsh, S. E. and Feigh, K. M. (2021). Differentiating ‘hu-
man in the loop’ decision process. In 2021 IEEE In-
ternational Conference on Systems, Man, and Cyber-
netics (SMC), pages 3129–3133.
Yang, E. and Dorneich, M. C. (2018). Affect-aware adap-
tive tutoring based on human–automation etiquette
strategies. Human factors, 60(4):510–526.
Yin, M., Vaughan, J., and Wallach, H. (2019). Understand-
ing the effect of accuracy on trust in machine learning
models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–12.
Zhang, Y., Liao, Q. V., and Bellamy, R. K. E. (2020). Ef-
fect of confidence and explanation on accuracy and
trust calibration in AI-assisted decision making. In
Proceedings of the 2020 Conference on Fairness, Ac-
countability, and Transparency. ACM.