The Negotiator: Interactive Hostage-Taking Training Simulation
Pierre-Benjamin Monaco¹ (https://orcid.org/0000-0002-4487-8195), Per Backlund² (https://orcid.org/0000-0001-9287-9507) and Stéphane Gobron³ (https://orcid.org/0000-0003-3796-6928)
¹ Master of Software Engineering, University of Applied Sciences and Arts Western Switzerland (HES-SO), Avenue de Provence 7, Lausanne, Switzerland
² Department of Game Technologies, School of Computer Science, University of Skövde, Skövde, Sweden
³ HE-Arc School of Engineering, University of Applied Sciences and Arts Western Switzerland (HES-SO), Espace de l'Europe 11, Neuchâtel, Switzerland
Keywords:
Mobile Phone, Natural Language Processing, Hostage-Taking, 3D, Immersion, Negotiation,
Decision-Making, Human-Machine Interface, Virtual Reality, Multimodal Systems.
Abstract:
Professionals in high-stakes fields such as negotiation and aviation rely on simulations for risk-free training that enhances skills and decision-making. Traditional simulations, while effective, are resource-intensive. Computer simulations offer a scalable alternative but often lack realism and sensory engagement, which impacts immersion. This study explores the role of mobile phones in simulation-based learning, using a 3D hostage-taking scenario to improve immersion through realistic interactions. The simulation integrates a detailed environment and interactive elements, and demonstrates the potential of mobile technology to enhance training in critical fields by making simulations more lifelike and engaging. This paper is presented alongside a twin paper, both part of the same Master's final project. The second paper introduces a different version of the simulation that uses a Large Language Model (such as ChatGPT) to generate dialogue freely and interactively, and reports the results of a study comparing immersion levels between the two versions.
1 INTRODUCTION
[Figure 1 labels: physical world; negotiator (app trainee); computer to get information; intervention forces colleague; hostage; hostage-taker; mobile phone (trainee's phone); QR code to reach the hostage-taker; simulation scenes: negotiator main view, hostage-taker scene.]
Figure 1: Overview of the negotiation simulation: the user on the left is in the physical world (e.g., a classroom) with (1) a computer on which they use the application and (2) their cell phone to converse with the simulated hostage-taker.
High-risk professions, especially in the fields of
healthcare, aviation, and security, do not allow for
errors. The consequences of such errors can be ex-
tremely serious and sometimes even lead to the death
of one or more individuals. The question then arises:
“Is it possible to train these skills without risking the
lives of the people involved?”. The answer is sim-
ple: “Simulation”. Training simulations are effec-
tive tools for experiencing complex situations and are
widely used in the aforementioned fields (Bienstock
and Heuer, 2022), as well as many others (Valiente
et al., 2019). In a simulation, it is possible to make
mistakes without suffering the sometimes disastrous
consequences that would result. Based on the concept of experiential learning, educational and training simulations allow behaviors that require quick, automatic actions and/or reflection to be built up through experience and repetition (Salas et al.,
2009). Today, a surgeon can practice making inci-
sions on a realistic mannequin, a firefighter can train
to enter a burning house and find victims amid the
smoke, and a pilot can practice landing a commer-
cial airplane in disastrous weather conditions with
engines failing. These situations are practiced us-
ing more or less digitized simulations. The closer
a simulation is to reality, the more effective it is as
a training tool (Bunker, 1978). Fidelity can be dis-
tinguished into two types: sensory fidelity and cog-
nitive fidelity (Hochmitz and Yuviler-Gavish, 2011).
Depending on the educational/training objectives, it
is important to consider both aspects. The effort re-
quired to increase realism generally follows an expo-
nential trend, and depending on available resources,
sometimes choices must be made: “How much is too
much?”. The educational effectiveness of a simula-
tion is strongly linked to the level of immersion of
the participants, and the realism of a simulation con-
tributes greatly to this (MacLean et al., 2019). Non-
digitized or minimally digitized simulations often require the engagement of many resources: human, financial, and temporal. Preparing a simulation can take years, and running it is expensive, especially in the case of a full-scale simulation (Simpson and Oser, 2003). On the other hand, a
digital simulation can be just as expensive to develop
but proportionally requires fewer resources for imple-
mentation because it involves a program that can be
repeated indefinitely, like a video game.
This paper explores the novel possibility of us-
ing mobile phones as an interaction tool with virtual
characters in a training simulation. This simulation
is set in a hostage-taking scenario where the partici-
pant is a police negotiator who must interact with the
hostage-taker. In this type of scenario, cognitive fi-
delity is primarily sought; the dialogue must be co-
herent. However, the sensory component should not
be overlooked, as learning is always more effective
when multiple senses are stimulated (Baines, 2008).
The question is whether it is possible to simulate the
natural use of a mobile phone and also whether this
contributes to realism and ultimately, immersion. Tra-
ditionally, a negotiation simulation or more broadly,
dialogue with a virtual character relies on dialogue
trees (Lobo et al., 2022). These simulations typically
offer a simple “chat” interface where the participant
can read the reactions of the simulated character and
respond by clicking among various choices available
to them (Bel-Enguix et al., 2009). A simulation of this
type was conducted before this project, where the mo-
bile phone was already used as an interface (Monaco
et al., 2023). Its use seemed to increase immersion when the virtual character's phrases were prerecorded and played on the phone. However, the realism of the situation was lost right after the character's speech, because responding required clicking buttons on the phone screen instead of speaking. Building on the
insight of using phones more naturally, this research
project emerged. The findings are shared across two
twin papers. This paper focuses on the design and im-
plementation of a negotiation simulation that innova-
tively employs mobile phones for natural interaction
with a virtual hostage-taker (Figure 1). The compan-
ion paper, entitled “Interactive storytelling apps - increasing immersion and realism with artificial intelligence?” (Monaco et al., 2024), delves into a new dialogue generation system using LLMs (such as ChatGPT) and compares immersion levels between the two versions.
2 CONCEPTION
The development of this simulation begins with iden-
tifying the critical components necessary for its cre-
ation. Such simulations require a specific area, vary-
ing from an indoor space to an outdoor environment,
designed to replicate the real-world scenario being
simulated. Moreover, the involvement of actors and
specialists, who assume various roles during the exer-
cise, is essential for the simulation’s authenticity. Fur-
thermore, detailed briefings and debriefings, executed
by supervisors prior to and following the simulation,
are fundamental for preparing participants and offer-
ing them constructive feedback.
To economically replicate this environment, one
can employ video game development tools to create
a lifelike 3D landscape. The briefing phase can be
facilitated by a Non-Player Character (NPC) within
the virtual environment. The process of debriefing
can also be programmed and incorporated in a sim-
ilar fashion, although this aspect is deliberately not
included in the scope of this project. The primary ob-
jective of this study is to investigate the integration
of mobile phones as interactive devices, rather than
producing a market-ready product. It is essential for
the hostage-taker simulation to serve dual functions:
as an entity within the 3D world and as an interactive
dialogue agent. Consequently, this requires the devel-
opment of an immersive 3D application designed to
run on a computer.
In this simulation, participants take on the role of
negotiators, responsible for engaging in dialogue with
the hostage-taker through phone communication. For
a fully immersive experience, participants would use
their own mobile devices to call an actor represent-
ing the hostage-taker. Consequently, the development
of a smartphone application is required to replicate
the interfaces and capabilities of a standard telephone,
including the functionality to record phone numbers
and place calls. This application must be seamlessly
integrated with the desktop software that manages the
simulation’s flow and triggers specific scenarios based
on the interactions with the hostage-taker, such as
releasing hostages, disarming the hostage-taker, the
hostage-taker’s surrender, or calling in police forces.
To enable voice communication between the par-
ticipant and the hostage-taker, the use of Natural Lan-
guage Processing (NLP) technologies is needed. As
these technologies are largely founded on Machine
Learning (ML) frameworks, the incorporation of a
third software module to handle the simulation’s
“smart” features becomes necessary. Consequently,
constructing this simulation demands a layered appli-
cation architecture that orchestrates the three elements
mentioned earlier (Figure 2).
[Figure 2 components: scene manager, scenario manager, call manager, dialogue manager, contact manager, recorder, WebSocket server, HTTP client, REST server, transcription, and semantic comparison, grouped into the computer app, the phone app, and the NLP microservices.]
Figure 2: Negotiator app: fundamental concepts and technologies. Interactions between entities are described in Figure 4.
2.1 Computer App
The computer application should feature three scenes:
an introduction scene in 2D displaying the simula-
tion's name and logo; a main scene where a police
officer briefs the negotiator and provides textual in-
formation; and three dialogue parts where the nego-
tiator contacts the hostage-taker, calms the situation,
and resolves it. The negotiator receives updates on
their job at the start of each part from the briefing of-
ficer. After the final conversation, the simulation con-
cludes with a scene displaying newspaper headlines
providing feedback on the ending.
2.2 Phone App
Figure 3: Interfaces of the phone app: (a) home screen, (b)
contacts list, (c) menu to add a contact, and (d) phone call.
Final renderings are presented in Figure 12.
The phone app mimics a current smartphone’s inter-
face, featuring a professional-looking home screen
with essential features for the negotiator, such as a
contacts app and a phone call function. The app con-
sists of four key screens (Figure 3), designed to be
simple for the sake of realism and ease of navigation. Avoiding un-
necessary features decreases the risk of bugs and al-
lows the participant to focus only on the task they
need to accomplish.
2.3 Natural Language Processing
To enable users of the app to talk with the hostage-
taker, their spoken words must first be converted into
text using Speech-to-Text (STT) technology. Follow-
ing this, the text is analyzed and matched against pre-
determined options by employing Word Embeddings
with text similarity techniques.
The conversation follows a tree-like structure (Co-
hen et al., 2014), offering the negotiator various pre-
set responses. To help the participant, hints are pro-
vided on screen.
2.4 Call Procedure
When a call starts (Figure 4), the computer app picks
a dialogue tree. It then sends the hostage-taker’s pre-
[Figure 4 steps: starting the call; selecting a dialogue tree node; playing the hostage-taker's speech; recording the negotiator's answer; transcribing the answer (STT); comparing it with the potential answers (spaCy) and returning the most probable answer as an array index; processing the answer and managing events; ending the call when the dialogue tree offers no further options.]
Figure 4: Flowchart of the process for natural interaction over the phone.
recorded speech and response choices to the phone
app. The phone app plays this speech and records the
negotiator’s reply, sending it to the NLP server. The
server determines the reply that semantically matches
the negotiator's speech best and sends this choice
back to the phone app, which then passes it on to the
computer app. The computer app responds by mov-
ing to the next part of the dialogue tree based on this
choice. This process repeats, moving through the di-
alogue tree until it reaches a point with no further op-
tions, at which time the computer app ends the call.
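To make this loop concrete, the sketch below shows one call from the computer app's perspective. It is a minimal Python illustration only: the actual application is written in C# for Unity, and the node structure, the ask_phone callback, and all identifiers are assumptions standing in for the WebSocket exchange with the phone app and the NLP server.

```python
# Minimal sketch of one negotiation call as orchestrated by the computer app.
# The real application is a Unity/C# program; every name here is illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    speech_id: str                                  # prerecorded hostage-taker speech
    answers: list = field(default_factory=list)     # (preset answer text, next Node) pairs

def run_call(root: Node, ask_phone) -> None:
    """ask_phone(speech_id, answer_texts) stands for the round trip in Figure 4:
    play the speech on the phone, record the negotiator, query the NLP server,
    and return the index of the best-matching preset answer."""
    node = root
    while node.answers:                             # leaf node = no further options
        texts = [text for text, _ in node.answers]
        chosen = ask_phone(node.speech_id, texts)   # 0-based index chosen by the NLP server
        _, node = node.answers[chosen]              # advance in the dialogue tree
    # reaching a leaf: the computer app ends the call here

# Example with a two-node tree and a stand-in for the phone/NLP round trip:
leaf = Node("ht_gives_up")
root = Node("ht_opening_threat",
            answers=[("Stay calm, nobody needs to get hurt.", leaf)])
run_call(root, ask_phone=lambda speech_id, texts: 0)
```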
3 IMPLEMENTATION
3.1 Computer App
Figure 5: Main scene viewed by the negotiator.
3D World The setting for the hostage scenario was
crafted using the Unity game engine, establishing the
main scene as a conventional crime scene featuring
police cars and officers positioned behind barricades.
Participants, assuming the role of a negotiator, are
placed in a street section cordoned off by security bar-
riers, immersing them directly into the hostage situa-
tion’s environment. The negotiator has the ability to
move within a limited area, leading them towards a ta-
ble equipped with a laptop. Upon reaching the com-
puter, the simulation officially begins as a SWAT
officer arrives to brief the negotiator on the situation
and outline their responsibilities (Figure 5).
The environment is divided into three sections: a
building interior where the hostage situation unfolds,
an intervention area in front of the building, and a
navigation zone behind security barriers (Figure 6).
ProBuilder and ProGrid tools in Unity were used to
create this setup, providing urban development tools
Figure 6: A view of the main scene from the top with the
different zones and actors of the world.
and prefabricated components. The hostage area fea-
tures the hostage-taker and hostages, with the former
capable of animated movements, including approach-
ing the entrance to surrender or disarm. Hostages can
also flee the building. The intervention area includes
two animated police officers, one of whom periodi-
cally offers verbal guidance to the negotiator. The
navigation zone allows movement for the negotiator
character, controlled from a first-person perspective,
and contains a table with a laptop. A blue circle marks
the simulation trigger area in front of the table, ini-
tiating the simulation when the negotiator enters it.
Background sounds like police sirens and urban am-
bience deepen immersion.
Animations The actions of both the police offi-
cer and the hostage-taker were recorded using motion
capture technology, utilizing the Mocap Fusion soft-
ware alongside equipment such as the HP Reverb G2
headset, Index controllers, and Vive trackers. The di-
alogue for the police officer was recorded in advance,
and a lip-sync system was employed to synchronize
the facial animations with the speech. The animation
sequences are orchestrated through a state machine,
allowing the scenario manager to regulate the charac-
ter animations by initiating state transitions.
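In the project itself these transitions are driven by Unity's Animator controller; the following Python fragment only illustrates the underlying idea of a transition table queried by the scenario manager, and all state and trigger names are invented for the example.

```python
# Illustrative state machine for the hostage-taker's animations.
# The real project uses Unity's Animator; states and triggers below are invented.

TRANSITIONS = {
    ("idle", "negotiation_progress"):  "pacing",
    ("pacing", "agrees_to_disarm"):    "walk_to_entrance",
    ("walk_to_entrance", "disarm"):    "drop_weapon",
    ("walk_to_entrance", "surrender"): "hands_up",
}

class HostageTakerAnimator:
    def __init__(self) -> None:
        self.state = "idle"

    def fire(self, trigger: str) -> str:
        """Called by the scenario manager to request a state transition;
        unknown (state, trigger) pairs leave the current state unchanged."""
        self.state = TRANSITIONS.get((self.state, trigger), self.state)
        return self.state

animator = HostageTakerAnimator()
animator.fire("negotiation_progress")   # -> "pacing"
```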
Starting of the Simulation When the game begins,
the computer application launches a local WebSocket
(WS) service implemented with the help of the Web-
SocketSharp.Standard NuGet package. To kick off
the simulation, the negotiator needs to move to a po-
sition at the table in front of the intervention
zone. The intervention officer then approaches and
asks the negotiator to scan a QR code displayed on
the computer screen in front of them (Figure 7). This
action serves as an “authentication” process with their
service phone. It establishes a connection
between the phone and the computer application and
initiates the briefing phase. The officer provides de-
tails about the suspect and prepares the participant for
the call to the hostage-taker.
Figure 7: From left to right: QR code given by the police-
man; view from the cellphone at the start of the app.
Interactive Elements The Heads-Up Display
(HUD) is a see-through overlay on the screen that
presents information to the participant (Figure 8).
Instructions about controls and required actions are
shown during the simulation. The HUD also displays
suggested phrases for the participant to use during the
conversation and badges that represent the progres-
sion of the simulation.
Figure 8: Heads-Up Display providing useful information
for the user.
The negotiator can access the hostage-taker’s pro-
file (Figure 9) that includes initial information com-
piled by the intervention team, such as a summary
of the situation, the hostage-taker’s name, age, social
status, spouse’s name, and phone number. This pro-
vides the negotiator with a preliminary understand-
ing of the scenario and supplies a contact number for
reaching out to the hostage-taker.
Figure 9: File of the suspect, where the user can receive hints
and take notes.
After reaching the second phase of the simulation,
the laptop initially used to show the QR code (Fig-
ure 7) can also be used to gather more information
about the hostage-taker. Participants can access the
police database to search by the hostage-taker’s name
or other associated individuals. Additionally, they can
explore a fictional social network called “SelfBook”
to gain insights into the personality and background
of the person they are negotiating with.
Interacting with the computer brings up a full-
screen interface with a search bar designed for look-
ing up police records. A navigation bar with tabs la-
beled “police files” and “SelfBook” lets users toggle
between different pages. These pages are created in
advance, saved as jpeg images, and integrated into
the simulation, allowing for a realistic browsing ex-
perience without real-time internet connectivity (Fig-
ure 10).
Figure 10: Left: Police search and police file. Right: Social
network search and social network wall.
Scenario and Dialogues The ScenarioManager
class oversees the simulation, acting as a bridge to the DialogueManager class and managing the
simulation’s state variables. These variables track the
status of the hostages, the hostage-taker, the negotia-
tor, and the current phase of the dialogue.
A dialogue tree editor has been created, enabling dialogues to be crafted and adjusted. Nodes represent po-
tential statements by the negotiator and replies from
the hostage-taker, interconnected to provide response
options influencing the operation’s success.
Following each dialogue segment, the Scenario
Manager transitions to the next phase, with the in-
tervention officer updating the negotiator. After the
concluding dialogue, the Scenario Manager evaluates
the negotiation’s effectiveness, leading to four pos-
sible outcomes: hostage-taker surrender, police as-
sault without casualties, police assault resulting in the
hostage-taker’s death, or a police assault with multi-
ple casualties.
3.2 End Scene
The simulation concludes with a focus on a table dis-
playing the next day’s newspaper, which reports on
the hostage scenario. The final section of the article
outlines the resolution of the event (Figure 11). To
Figure 11: From left to right: The end scene explaining the
end of the situation and the end screen on the cellphone.
preserve the simulation’s realism from start to finish,
a decision was made against assigning a numerical
score, reflecting the real-world context where nego-
tiators do not receive quantitative evaluations of their
performance. At this point, a message indicating the
end of the simulation is transmitted to the phone app,
and the mobile phone screen presents options for the
user to either restart the simulation or exit it.
3.3 Phone App
Similar to the computer application, the phone app
was also developed using Unity. This decision was
taken early in the prototype development phase to in-
vestigate Unity’s capabilities for mobile app creation.
The design of the app mimics a compact, secure op-
erating system that could be utilized by law enforce-
ment agencies.
QR Code To establish a connection between the
phone app and the computer app, the phone app re-
quires the computer’s address. This is achieved by
scanning the QR code (Figure 7), through which the
phone app saves the address, connects to the computer
app, and sends a message specifying the version to be used, which initiates the simulation. The ZXing li-
brary was used to decode the QR code, and the NuGet
package WebSocketSharp.Standard was used to im-
plement the WS client.
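For illustration, the handshake can be reduced to the following Python sketch; the actual apps are Unity/C# programs using ZXing and WebSocketSharp, and both the QR payload format and the message fields shown here are assumptions.

```python
# Illustrative handshake only; the real apps use ZXing + WebSocketSharp in C#.
# QR payload format and message fields are assumptions.
import json
import websocket   # pip install websocket-client

def connect_from_qr(qr_payload: str):
    """qr_payload is the text decoded from the QR code,
    e.g. the computer app's address such as "ws://192.168.1.20:8080"."""
    ws = websocket.create_connection(qr_payload)
    # Announce the simulation version to use and ask the computer app to start.
    ws.send(json.dumps({"type": "start", "version": "dialogue-tree"}))
    return ws
```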
Main Menu, Contacts and Call After establish-
ing the connection, the phone app transitions to the
main phone scene, designed to resemble a simplified
mobile operating system tailored for law enforcement
use, featuring a Contacts application. The interface
is styled with a distinctive “police” motif as the back-
drop, enhancing the device’s professional appearance.
A header displays a live clock, a padlock symbol to
signify system security, and a signal strength indica-
tor to inform about the connection’s quality. The pri-
mary interface showcases the Contacts app, facilitat-
ing the saving of phone numbers and initiating calls –
Figure 12(a).
The design of the Contacts menu is intentionally
straightforward, equipped with a green button for new
contact entries and a red cross for navigating back
to the home screen (Figure 12(b)). Activating the
add contact feature unveils input fields for the con-
tact’s name and number, alongside a save option
(Figure 12(c)).
While on a call, the screen displays the con-
tact’s name with a prominent button at the bottom
to hang up (Figure 12(d)). To enrich the call's au-
thenticity, background sounds such as footsteps, door
movements, and muffled conversations are integrated,
adding depth to the simulation experience.
Figure 12: From left to right: (a) Main screen, (b) Contacts,
(c) Filling new contact, and (d) Call.
3.4 Microservices
To manage the Machine Learning (ML) aspects of
the simulation, a Python REST server has been es-
tablished. This server launches concurrently with the
computer application, ensuring seamless integration
between the components. For verbal interactions with
the hostage-taker, two key models are employed: a
Speech-to-Text (STT) transcriber for converting spo-
ken words into written text, and a semantic compara-
tor to analyze and match the dialogue’s content.
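A minimal sketch of such a server is given below; the web framework (Flask) and the endpoint and parameter names are assumptions, and transcribe() and best_match() stand for the Azure Speech-to-Text call and the spaCy comparison described in the next two paragraphs.

```python
# Sketch of the NLP microservice; framework, endpoint, and field names are assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

def transcribe(audio_bytes: bytes) -> str:                 # Azure STT, see below
    ...

def best_match(text: str, candidates: list[str]) -> int:   # spaCy similarity, see below
    ...

@app.route("/select_answer", methods=["POST"])
def select_answer():
    audio = request.files["audio"].read()          # negotiator's recorded reply
    candidates = request.form.getlist("answers")   # preset answers for the current node
    text = transcribe(audio)
    return jsonify({"chosen_index": best_match(text, candidates), "transcript": text})

if __name__ == "__main__":
    app.run(port=5005)   # launched alongside the computer application
```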
Speech Transcription (STT) Azure Speech Ser-
vices was selected as the transcription service for its good response times. While local models could potentially offer superior performance on computers equipped with higher processing capabilities, the cloud-based solution was chosen for this project. It ensures uni-
form performance across different computers running
the simulation. However, note that a minimum inter-
net connection speed of 10Mb/s is necessary to pre-
vent any performance issues related to uploading au-
dio files to the cloud.
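A minimal sketch of this one-shot transcription with the Azure Speech SDK for Python is shown below; the subscription key, region, and recognition language are placeholders.

```python
# One-shot transcription of a recorded reply (key, region, and language are placeholders).
import azure.cognitiveservices.speech as speechsdk

def transcribe(path: str) -> str:
    config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="westeurope")
    config.speech_recognition_language = "fr-FR"
    audio = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=config, audio_config=audio)
    result = recognizer.recognize_once()             # one-shot mode used in this project
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return ""                                        # no speech detected: empty transcription
```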
Word Embeddings with Text Similarity The ne-
gotiator’s spoken words are matched against pos-
sible replies within the dialogue tree. SpaCy is
an open-source Natural Language Processing (NLP)
tool that employs pretrained transformer models like
BERT (Devlin et al., 2018). It is designed to work
effectively in both CPU and GPU setups, supporting
platforms such as TensorFlow or PyTorch. In this
project, spaCy is chosen as a local library for its ease
of use and low demand on computational resources.
Even the longest sentence comparisons average around 0.02 seconds, which aligns with the simulation's re-
quirement for real-time processing.
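The comparison step can be sketched as follows; the model name is an assumption, and any spaCy pipeline that provides word vectors for the simulation's language would work.

```python
# Selecting the preset answer closest in meaning to the transcribed reply.
# The spaCy model is an assumption; any pipeline with word vectors works.
import spacy

nlp = spacy.load("en_core_web_md")

def best_match(transcript: str, candidates: list[str]) -> int:
    doc = nlp(transcript)
    scores = [doc.similarity(nlp(c)) for c in candidates]   # vector cosine similarity
    return scores.index(max(scores))

# Example:
# best_match("Please let one of the hostages go",
#            ["Release a hostage as a sign of good faith.", "Tell me your demands."])
```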
4 PERFORMANCE
The application’s performance was analyzed in terms
of timings. The figure below (Figure 13) illustrates
the speech processing sequence of the application.
The “End of Speech Detection” phase is also included
in the total processing time but does not involve AI
services. The sentence comparison step is not repre-
sented as the time for this step is negligible (0.03 sec).
Figure 13: Timing sequence comparison between the self-hosted and online-service solutions.
Depending on the ambient noise level of the environment in which the simulation is played, or on the playback volume, the end of speech is sometimes not detected.
Another issue identified after implementing the final
product is an empty transcription. Azure Speech Ser-
vices has two modes of operation: stream and one-
shot. The one-shot mode was used; if no speech is detected in the first two seconds of the audio file, the file is considered empty. This proved problematic because participants who did not feel comfortable, or who needed time to think about what to say to the hostage-taker, had all their responses treated as empty.
For further work, the stream mode is recommended.
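As an indication, the recommended stream mode could look like the following sketch with the same SDK; the key, region, and the simplistic stop condition are placeholders.

```python
# Sketch of the recommended stream (continuous) mode with the Azure Speech SDK;
# key, region, and the fixed timeout used to stop recognition are placeholders.
import time
import azure.cognitiveservices.speech as speechsdk

def transcribe_stream(path: str, timeout_s: float = 15.0) -> str:
    config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="westeurope")
    audio = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=config, audio_config=audio)

    parts: list[str] = []
    recognizer.recognized.connect(lambda evt: parts.append(evt.result.text))

    recognizer.start_continuous_recognition()
    time.sleep(timeout_s)                    # in practice, stop on a session-stopped event
    recognizer.stop_continuous_recognition()
    return " ".join(p for p in parts if p)   # silent openings no longer void the whole reply
```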
5 DISCUSSION
The development and deployment of the simulation,
with its integration of a 3D environment and nat-
ural language processing dialogues, require a com-
prehensive approach that goes beyond the scope of
conventional computer science. The simulation’s in-
tricate framework, which includes a mobile app, a
Python server for NLP functionalities, and a detailed
3D world, introduces various technical and design
hurdles. While this complexity enhances the user ex-
perience, it also complicates distribution due to the
elaborate setup and deployment processes involved.
Incorporating a 3D element not only boosts the
simulation’s realism and visual appeal but also im-
poses certain limitations, such as the demand for sub-
stantial computing power and restricted content cre-
ation options for facilitators. The Unity game en-
gine plays a pivotal role in creating the 3D setting
and managing the simulation’s logic, striking a bal-
ance between technological innovation and usability.
Nonetheless, the requirement for a powerful graph-
ics processor highlights the importance of considering
the simulation’s compatibility and accessibility across
different platforms. The use of a VR headset was not
considered for this simulation as it would interfere
with the normal usage of the cellphone. Moreover,
the aim of this project is also to demonstrate the use
of the cellphone itself as an immersive interaction tool.
Opting for a local network solution over real
phone number interactions addresses some practical
challenges related to incorporating mobile phone fea-
tures. This approach simplifies the technological re-
quirements by leveraging WebSockets but also points
to the need for improvements in remote connectivity
and user experience.
The deployment of an NLP pipeline, includ-
ing speech-to-text transcription and text comparison,
showcases the application of advanced machine learn-
ing technologies. However, the substantial process-
ing power these transformer-based models demand
presents a bottleneck in performance, particularly on
less equipped devices. Resorting to cloud-based ser-
vices to bypass these hardware limitations raises is-
sues about model control, the reliability of external
services, and dependence on network quality, despite
offering faster processing times overall.
6 CONCLUSION
This project resulted in the creation of a hostage-
taking simulation prototype for negotiator training
including a 3D simulation, a phone application that
allows for natural dialogue with the hostage taker, and
a Python server providing Natural Language Process-
ing (NLP) services. The implementation of this simu-
lation showcases the complexities and challenges in-
herent in creating an immersive and technologically
advanced training tool. This research has shown that
it is possible to use a smartphone as an interactive
tool for naturally (verbally) conversing with an au-
tonomous agent in a simulation. This innovation
paves the way for using the phone not only in the
specific context of police negotiation but for all nego-
tiation situations such as: Business and Commerce,
Diplomacy and International Relations, Law and Me-
diation, Human Resources, Healthcare, Entertain-
ment and Media, Politics, Education, Environmental
Conservation. Beyond negotiation, all computer sim-
ulations or video games involving autonomous agents
can benefit from this research by including the phone
as an interaction tool in their toolkit. To take this a
step further, we have developed an LLM-based (AI) version of the autonomous agent and conducted user tests; see the twin paper (Monaco et al., 2024).
Limitations in terms of timing, particularly dur-
ing speech capture and the transcription of the partic-
ipant's speech, are still to be overcome. In the current state, the performance of standard computers does not allow local models to be used while keeping the discussion smooth and uninterrupted, which is why online
services had to be used. This choice can pose a prob-
lem in the case of simulation exchanges containing
sensitive or confidential personal data. The use of ex-
ternal services also poses issues in terms of availabil-
ity and bandwidth. Further research into the use of
NLP models on smartphones could yield results that
overcome these limitations; the limiting factor would
undoubtedly be the quality of the phone itself.
ACKNOWLEDGEMENTS
We are deeply thankful to the University of Applied
Sciences and Arts of Western Switzerland (HES-SO)
and the University of Skövde (HIS) for their gener-
ous provision of essential equipment and invaluable
knowledge, which greatly contributed to the success
of this project. We extend our heartfelt appreciation
to Diego Villagrasa and Dylan Canton for their in-
valuable assistance during the implementation pro-
cess; their expertise and dedication significantly en-
hanced the project’s development. Additionally, we
express our gratitude to Colin Lavanchy for lending
his voice to portray the hostage-taker, enriching the
simulation with authenticity and depth.
REFERENCES
Baines, L. (2008). A Teacher’s guide to multisensory
learning: Improving literacy by engaging the senses.
ASCD.
Bel-Enguix, G., Dediu, A. H., and Jiménez-López, M. D.
(2009). A Dialogue-Based Interaction System for
Human-Computer Interfaces, pages 55–65. Springer
Berlin Heidelberg, Berlin, Heidelberg.
Bienstock, J. and Heuer, A. (2022). A review on the evolu-
tion of simulation-based training to help build a safer
future. Medicine, 101:e29503.
Bunker, W. M. (1978). Training effectiveness versus simu-
lation realism. In Beiser, L., editor, Visual Simulation
and Image Realism I, pages 76–82.
Cohen, A., Gottifredi, S., García, A. J., and Simari, G. R.
(2014). A survey of different approaches to support in
argumentation systems. The Knowledge Engineering
Review, 29:513–550.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2018). BERT: Pre-training of deep bidirectional trans-
formers for language understanding. North American
Chapter of the Association for Computational Lin-
guistics.
Hochmitz, I. and Yuviler-Gavish, N. (2011). Physical fi-
delity versus cognitive fidelity training in procedural
skills acquisition. Human Factors: The Journal of the
Human Factors and Ergonomics Society, 53:489–501.
Lobo, I., Rato, D., Prada, R., and Dignum, F. (2022).
Socially Aware Interactions: From Dialogue Trees
to Natural Language Dialogue Systems, pages 124–
140. Springer International Publishing.
MacLean, S., Geddes, F., Kelly, M., and Della, P. (2019).
Realism and presence in simulation: Nursing student
perceptions and learning outcomes. Journal of Nurs-
ing Education, 58:330–338.
Monaco, P.-B., Backlund, P., and Gobron, S. (2024). In-
teractive storytelling apps - increasing immersion and
realism with artificial intelligence? In 14th In-
ternational Conference on Simulation and Modeling
Methodologies, Technologies and Applications (SI-
MULTECH 2024). SCITEPRESS.
Monaco, P.-B., Villagrasa, D., and Canton, D. (2023).
The negotiator. In Proceedings of the 8th Interna-
tional Conference on Gamification & Serious GameS
(GSGS’23), pages 94–97. HE-Arc, HES-SO.
Salas, E., Rosen, M. A., Held, J. D., and Weissmuller,
J. J. (2009). Performance measurement in simulation-
based training. Simulation & Gaming, 40:328–376.
Simpson, H. and Oser, R. L. (2003). Evaluating large-scale
training simulations. Military Psychology, 15:25–40.
Valiente, D., Payá, L., de Ávila, S. F., Ferrer, J.-C., Cebollada, S., and Reinoso, Ó. (2019). Active learning pro-
gram supported by online simulation applet in engi-
neering education. In SIMULTECH, pages 121–128.