Utilizing ChatGPT as a Virtual Team Member in a Digital
Transformation Consultancy Team
Tim de Wolff and Sietse Overbeek
Department of Information and Computing Sciences, Faculty of Science,
Princetonplein 5, 3584 CC, Utrecht University, The Netherlands
Keywords:
Business Architecture, ChatGPT, Consultancy, Digital Transformation, GenAI.
Abstract:
This study aims to design and evaluate a method that leverages ChatGPT for efficiency improvement in digital
transformation projects, specifically while designing target business architecture products. The main research
question is stated as follows: ‘How can a large language model tool be utilized to support the development
of target business architecture products?’ The resulting method, GenArch, enables utilization of ChatGPT
throughout business architecture design processes. This method is validated by means of expert interviews
and an experiment. The perceived ease of use, perceived usefulness, and intention to use of the method are
analyzed to assess the perceived efficacy, which serves as an indicator for efficiency. The results show that
GenArch possesses at least a moderately high level of perceived efficacy.
1 INTRODUCTION
Recent developments in the field of Generative Artifi-
cial Intelligence (GenAI) and ChatGPT have affected
both academia and practice in a wide variety of do-
mains. GenAI can be defined as the automated con-
struction of intelligence Zant et al. (2013). ChatGPT
is a Large Language Model (LLM)-based chatbot that
allows users to conduct, refine, and steer meaning-
ful discussions about topics of their preference. The
models behind ChatGPT are trained on hundreds of
terabytes of textual data and are therefore highly ef-
fective Shanahan (2024). LLMs are generative mathematical models of the statistical distribution of words, which generate statistically likely continuations of an input prompt. Because of context dependency, the
inclusion of many activities, and the large number of
digital possibilities Westerman et al. (2014), Digital
Transformation (DT) is a suitable domain for differ-
ent types of support and guidance, like text generation
and advice that ChatGPT can deliver Liu et al. (2023).
DT is defined as the use of new digital technologies,
such as social media, big data or AI, to enable major business improvements, such as enhancing customer experience, improving operations or creating new business models Fitzgerald et al. (2013).
A specific part of DT is Business Architecture (BA).
This is defined as an enterprise map that provides a
common understanding and is used to align strategic
objectives and tactical demands Simon and Schmidt
(2015). The target BA is the to-be state that is desired
by an organization and the goal of the DT. This map
can be divided into a variety of BA products, which can
most effectively be designed and deployed by the use
of feedback cycles Hohpe (2017). Examples are a ca-
pability map, which is a visualization of an organiza-
tion based on distinct abilities to perform unique busi-
ness activities Wißotzki and Sandkuhl (2015), and a
value stream map, which is a map of sequences of ac-
tivities that are required to deliver products or services
LeanIX (2024).
This study aims to contribute to the domain of
GenAI by the design and validation of a method
that describes how ChatGPT can be leveraged in DT
projects to improve efficiency when designing target
BA products. The professional application, the asso-
ciated risks, and the influence of prompts are investi-
gated. ChatGPT has the potential to transform busi-
nesses by automating and executing language-based
tasks with unprecedented speed and efficiency KPMG
(2023). Making BA decisions in advance is subopti-
mal, as additional information becomes available over
time, allowing for more informed decisions Hohpe
(2017). We therefore focus on the design of target BA
products rather than entire BAs to ensure that not all
decisions need to be made in advance. Moreover, tar-
get BA product design processes have a specific out-
put, namely the product itself, and could likely be ex-
ecuted more efficiently while utilizing ChatGPT.1,2,3 This is expected because BA involves many stakeholders, activities, and uncertainties, and requires proper alignment across all aspects.1,2,3 Inefficiencies in practice result from insufficient input, the need to start from scratch, time investment in modeling tasks, and deliberations with stakeholders.2,3 This study aims to
provide a clear overview of how target BA product
design processes could incorporate the use of Chat-
GPT in a structured and efficient manner. Within this
research, the focus specifically lies on ChatGPT be-
cause of the available (grey) literature, the isolated
interface, and more advanced storage and sharing op-
tions compared to alternatives. The design problem
Wieringa (2014) is stated as follows: Improve the de-
sign process of target Business Architecture products
by employing a method that guides the proper use of
an LLM tool (taking the example of ChatGPT) that
satisfies requirements of the target Business Architec-
ture products in order to increase efficiency of the de-
sign processes for the designer.
This research is performed in collaboration with
accounting firm KPMG to gain access to experts and
company documentation. A case study is performed
at KPMG Netherlands within the Digital Transforma-
tion team, which is part of the overarching Digital
team in their Advisory business unit. This team fo-
cuses on transformation projects in which they guide
transformation processes. The aim of this study is
to design and validate a method. A method is de-
fined as an approach to perform a systems develop-
ment project, based on a specific way of thinking,
consisting of directions and rules, structured in a sys-
tematic way in development activities and deliver-
ables Brinkkemper (1996). The designed method is
aimed at DT projects, though, and does not solely in-
volve development activities. The proposed method
will serve as guidance for its stakeholders, prescrib-
ing the appropriate timing and utilization of Chat-
GPT when designing a target BA product. Further-
more, an overview of risks associated with ChatGPT
in a professional context is provided as part of our
study and we contribute to the fields of prompt engi-
neering and BA. BA is often viewed as merely one
of the elements making up an Enterprise Architec-
ture (EA), while it helps establish the pivotal connec-
tion between the business and IT sides of companies
Bouwman et al. (2011). This research aims at improving the efficiency of the target BA product design process. Both scientific and grey literature focus on the quality of the BA, tools or frameworks, and the content of the BA. However, to the best of our knowledge, no existing literature aims at improving the efficiency of such a process.
1 KPMG Senior Manager, personal communication, May 2024
2 KPMG Manager 1, personal communication, March 2024
3 KPMG Manager 2, personal communication, May 2024
The remainder of this paper is structured as follows. First,
the research approach is presented in section 2. Sec-
ond, the results of a Multivocal Literature Review
(MLR) are discussed in section 3. Third, the research
process, including two iterations of the design cycle,
is described in section 4. Fourth, the results of the
experiment and analysis of these results are discussed
in section 5. Finally, the paper ends with a discus-
sion in section 6 and conclusions and future research
in section 7.
2 RESEARCH APPROACH
This section presents the research approach that is
adopted throughout this study. This includes the re-
search questions, the conducted method and the liter-
ature review and case study protocols. The main aim
of this research is to design and evaluate a method
that describes how ChatGPT can be leveraged in DT
projects to improve efficiency when designing target
BA products. The proposed method will serve as
guidance for stakeholders, prescribing the appropri-
ate timing and utilization of ChatGPT within a BA
context.
2.1 Research Questions
To be able to fulfill this aim, the following Main
Research Question (MRQ) is formulated:
Using the example of ChatGPT, how can a Large
Language Model tool be effectively integrated into
the design process of target Business Architecture
products to mitigate risks, optimize prompt
impact, and improve efficiency?
To be able to answer this MRQ, five Sub Research
Questions (SRQs) are constructed. SRQ1. In what ways does
ChatGPT pose risks within a professional context of
target Business Architecture design? This SRQ is fo-
cused on risks that occur when using ChatGPT in a
professional context. An MLR is performed, aiming
to identify risks that could occur during professional
use. These risks are to be mitigated during the de-
sign of the method in SRQ4. Section 3.1 presents
the findings of this review. SRQ2. What is the im-
pact of prompts on the output generated by ChatGPT
within a professional context of target Business Archi-
tecture design? This SRQ also requires an extensive
literature review. Section 3.2 presents the findings of
this MLR. SRQ3. How are design processes of tar-
get Business Architecture products shaped? The exact
activities and responsibilities are retrieved during the
last part of the MLR in section 3.3. This review aims
to identify the BA products and the steps that are part
of the design process. Together with additional input
of KPMG experts, this SRQ is answered in iteration
Alpha. SRQ4. How can a method be introduced to
effectively leverage ChatGPT in the design process of
target Business Architecture products? With the input
of the previous SRQs, a method can be constructed.
The technical version of this method is designed us-
ing a Process Deliverable Diagram (PDD). This is
a meta-modeling technique, based on UML activity
and class diagrams Weerd van de and Brinkkemper
(2008). The used method engineering protocol can
be found in Wolff de (2024). SRQ5. To what ex-
tent does the method incorporating ChatGPT improve
the efficiency of the target Business Architecture de-
sign process? To be able to find out how the con-
structed method performs, a case study at KPMG is
performed. The focus lies on efficiency, for which the
perceived ease of use, perceived usefulness and inten-
tion to use are considered. Section 2.2.2 elaborates
on these variables. The case study consists of two
separate parts: expert interviews and an experiment.
These are executed at separate stages in the research
process. Section 2.2 outlines the research method and
covers this case study.
2.2 Design Science Method
Design science can be seen as the design and investi-
gation of artifacts in context Wieringa (2014). We de-
sign a method to interact with the problem context of
DT consultancy processes to improve the efficiency
when designing target BA products. This study ap-
plies the design cycle that is shown in Figure 1. In this
cycle, design is decomposed into three tasks: problem
investigation, treatment design and treatment valida-
tion Wieringa (2014). It is presented as a cycle as re-
searchers iterate over these activities multiple times.
During this research, we iterate over the design cy-
cle twice. These iterations involve dissimilar vali-
dation methods that lead to distinct insights. During
both cycles, a large amount of the data will be gener-
ated at KPMG by performing a case study Heale and
Twycross (2018). The case study consists of analyz-
ing company documentation, conducting two expert
interviews and executing an experiment. Explana-
tions of both iterations, as well as their accompanying
steps and deliverables, are presented in the remaining
part of this section.
2.2.1 Iteration Alpha
We refer to the first iteration as iteration Alpha. This
iteration is focused on designing an initial version
of the method and validating this through expert
interviews with scholars and practitioners. This
iteration consists of the steps mentioned hereafter.
Problem investigation. First, the problem and its
context are analyzed. This is done by reviewing
related works and conducting an MLR. The MLR
focuses on the risks of ChatGPT in a professional
context, prompt engineering and the BA process.
This type of literature review is chosen to be able to
explore recent insights in the relatively new research
topic of GenAI. At the completion of this stage, SRQs
1, 2 and 3 should be answered. Treatment design.
Second, the first version of the treatment is designed
based on insights of the problem investigation. The
treatment is the interaction between the artifact and
the problem context Wieringa (2014). The artifact
in this research is a method, as this fits the goal of
specifying when and how ChatGPT should be used
throughout a process to improve efficiency. The name
of this method is GenArch, a merger of the terms
GenAI and architecture. The GenArch method aims
to serve as guidance for stakeholders, prescribing
the appropriate timing and utilization of ChatGPT
within the context of BA. The inclusion of ChatGPT
in the method is based on insights of the experts
from KPMG and literature about GenAI. A method
engineering approach is adopted for the creation
of the GenArch method. The Process Deliverable
Diagram is selected as the meta-modeling technique.
This technique is especially developed for method
engineering purposes and can be used for analyzing
and assembling method fragments Weerd van de
and Brinkkemper (2008). A separate version, which
does not follow the PDD notation and shows a
ballpark view of the method, is designed as well
to increase understandability of GenArch. Treat-
ment validation. Third, the initial version of the
method is validated using expert interviews. In total,
six semi-structured interviews are conducted with
KPMG experts, external experts and scholars to gen-
erate qualitative data about potential improvements.
Semi-structured interviews start with an interview
protocol comprised of open-ended questions and
allow for follow-up questions of the interviewer
Magaldi and Berler (2020). The interviewees are
selected based on their knowledge of and experience
with DT consultancy, EA and GenAI. The focus of
the validation lies on the design process
of BA products and the categorization of the activities within this process.
Figure 1: Design cycle.
2.2.2 Iteration Beta
We refer to the second iteration as iteration Beta. This
iteration is focused on analyzing the gathered qualita-
tive data from the expert interviews to identify im-
provement possibilities. Subsequently, the method is
modified based on these insights and validated by an
experiment, performed at KPMG. This iteration con-
sists of the following steps: Case evaluation. The
qualitative interview data is analyzed to identify im-
provement possibilities. Treatment redesign. The
method is improved based on the insights gathered
during the case evaluation. Treatment validation.
The improved, final version of the method is validated
using an experiment, the last part of the case study.
The focus of this validation lies on whether par-
ticipants perceive (a segment of) the method as more
efficient. The experiment is performed in an off-line
setting with professionals from the Digital Transfor-
mation team of KPMG Netherlands. Figure 2 illus-
trates the design of the experiment. The goal of this
experiment is to measure the efficiency via the vari-
ables introduced hereafter. Both treatment validation
steps are aimed at finding potential improvements for
the method and validating it against crite-
ria. We retrieve these criteria from the method eval-
uation model Moody (2003). This model provides
mechanisms for evaluating both the likelihood of ac-
ceptance and the actual impact of a method in practice
Abrahão et al. (2009). The advantage over other comparable models is the incorporation of actual efficacy and actual usage Abrahão et al. (2009). To be able
to validate if the GenArch method is an improvement
over the current approaches, the perceived efficacy is
used as the measure and taken as indicator for the effi-
ciency. This measure consists of three sub-measures:
The perceived ease of use, which refers to the ex-
pected required effort to learn and use the method;
the perceived usefulness, which refers to the expected
degree to which the method will achieve intended ob-
jectives; the intention to use, which refers to the extent
to which a person intends to use a particular method
Abrahão et al. (2009). The ethical considerations of
this research are found in Wolff de (2024).
3 LITERATURE STUDY RESULTS
The results of the MLR are divided into four parts: the
risks associated with professional use of ChatGPT, the
state-of-the-art developments in the field of prompt
engineering, the steps and products needed to create
a target BA, and a discussion of four works related to
this study.
3.1 Risks of ChatGPT
As mentioned in section 1, ChatGPT is trained on large amounts of textual data Shanahan (2024). This data par-
tially consists of copyrighted texts, leading to con-
cerns about legal complications Piñeiro-Martín et al. (2023). Moreover, it is not known exactly how much data is
used for the training of the LLM.
Figure 2: Experiment design.
LLMs often operate
as black boxes, which makes it challenging to under-
stand the decision-making processes behind the pro-
vided results and expert knowledge is then required to
verify produced content Piñeiro-Martín et al. (2023).
User instructions are part of the GenArch method
and include reminders to fact-check important results.
Furthermore, there are risks associated with the cur-
rent inability of GenAI to distinguish between con-
cepts acquired through data compression and singular
memorized concepts. This inability leads to halluci-
nation, i.e., the fabrication of credible but factually incorrect content that is presented with confidence Piñeiro-Martín et al. (2023). Therefore, LLMs
are still considered far from reliable Qiu et al. (2023).
ChatGPT also does not memorize mathematical con-
cepts and the corresponding rules but relies on pattern recognition, sometimes resulting in incorrectly performed mathematics Ferrini (2023). It could occur
that a GenAI algorithm leaks an example of text or
visual art that it has memorized, which in turn leads
to plagiarism Ferrini (2023). This is likely one of the
reasons that organizations are reluctant to use sensi-
tive data in input prompts. Therefore, expert veri-
fication is an important recurring step in GenArch.
Another risk is that LLMs learn from vast amounts
of training data, which may inadvertently contain bi-
ases Piñeiro-Martín et al. (2023). Moreover, they could
partially embody the biases of their creators Tacheva
and Ramasubramanian (2023). A consequence then is
the potential to intensify discrimination and inequal-
ity Tacheva and Ramasubramanian (2023). LLMs
present language bias as well, as they perform better in English than in other languages because most training data was in English Qiu et al. (2023). Therefore,
in GenArch the results are used as input or sugges-
tions, but decisions are made by humans to increase
the chance that biases are avoided. The last risk relates to privacy. As LLMs memorize input data, it is possible
to extract sensitive information from this data using
prompts Qiu et al. (2023). Examining the privacy is-
sues that are associated with sensitive information be-
fore putting it in a prompt is therefore crucial and it
is advised to use GenArch in private GenAI tools to
mitigate these risks.
3.2 Prompt Engineering
In essence, prompt engineering entails optimizing
textual input to effectively communicate with Large
Language Models Bains (2023). Prompt engineer-
ing is the process of formulating a prompt in a way
that a GenAI system produces an output that closely
matches the expectations Bains (2023). It involves
considering the inner workings of an AI model to be
able to construct inputs that work well with the model
Bains (2023). Prompt engineering skills are vital for
fully leveraging LLMs, but they do not come nat-
urally and need to be learned Wang et al. (2024).
The designed method includes a prompt template that
increases the likelihood of useful results. Prompt
patterns can be defined as summaries of effective
prompt-tuning techniques that provide an approach to
crafting the input and interaction to achieve desired
output Wang et al. (2024). Proven patterns will be in-
cluded in the prompt templates of GenArch. The ex-
act patterns included in the method are chosen during
the treatment design phase. In the persona adoption
pattern, a user asks the LLM to play a particular role
without providing further details Wang et al. (2024);
Bains (2023). Closely related to this is the appliance
in reverse by instructing to complete a task with a spe-
cific audience in mind Bains (2023). Both appliances
of this pattern will be included in the prompt tem-
plates and suggestions. Chain-of-thought prompting
is appropriate for problem-solving Bains (2023) and
generates a sequence of steps with explanations be-
fore inferring the output. This manner of prompting
will be advised when ChatGPT does not answer as
expected during use of GenArch.
Another prompting technique is few-shot prompt-
ing, where examples are included within a prompt
to help ensure the output meets expectations Willey et al.
(2023). Target-your-response prompting is focused
on the output of the system. If the GenAI tool is not
explicitly told about the appearance of an answer, it
can give results in many forms. This prompt con-
tains two elements, the question or problem and an
explanation of what the response should be like Eliot
(2023). The inclusion of the desired response in the
prompt should cause the use of GenAI to be suffi-
ciently efficient, and if users pay per transaction, to
possibly be less costly Eliot (2023). It will therefore
be included in the prompt templates. Multi-turn or
continuous prompting involves a dialogue or conver-
sation between the model and the user to iteratively
get to the desired output rather than optimizing a sin-
gle prompt Bains (2023). Multi-turn prompting will
not be a standard component of the method but rather
a recommended way to further explore the results that Chat-
GPT provides. Meta prompting denotes using higher-
order prompts to let the LLM generate its own natural
language prompts for certain tasks Korzynski et al.
(2023). This approach aims to leverage the inherent
capabilities and understanding of natural language of
such models to create more effective prompts Korzyn-
ski et al. (2023). This prompting strategy will be used
as input during prompt template design.
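To make these patterns more tangible, the sketch below combines several of them into a single prompt. The template wording, function name, and example values are illustrative assumptions of ours and do not reproduce the actual GenArch prompt template, which is documented in Wolff de (2024).

```python
# Minimal sketch of a prompt builder that combines the persona, persona-in-reverse,
# target-your-response, and few-shot patterns described above. The template wording
# and example values are illustrative, not the actual GenArch prompt template.

def build_prompt(role, audience, task, response_format, examples):
    """Assemble a single prompt string from several prompt patterns."""
    lines = [
        f"You are a {role}.",                                # persona adoption
        f"Write the answer for this audience: {audience}.",  # persona applied in reverse
        f"Task: {task}",
        f"Format the response as: {response_format}.",       # target-your-response
    ]
    for example_input, example_output in examples:           # few-shot examples
        lines.append(f"Example input: {example_input}")
        lines.append(f"Example output: {example_output}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        role="business architect",
        audience="the management board of a hospital",
        task="Draft a capability map for the current state of the organization.",
        response_format="a bulleted list of capabilities with one-sentence descriptions",
        examples=[("Retail company", "- Customer Relationship Management: ...")],
    ))
```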
3.3 Business Architecture Steps
This section is focused on identifying required prod-
ucts and steps to create a target BA. These are capa-
bilities, value streams, principles, business processes,
roles, business functions, a gap analysis, policies and
change management. All of these concepts are in-
cluded in the initial version of GenArch as steps or
products. A capability is defined as the capacity of an
organization to successfully perform a unique busi-
ness activity to create a specific outcome Wißotzki
and Sandkuhl (2015). Examples are human resource
management (HRM) and customer relationship man-
agement (CRM). Capabilities can be mapped in a
business capability map Group (2022), which is a vi-
sualization of an organization based on distinct busi-
ness capabilities Smith (2024). Value streams are
sequences of activities that are required to deliver
a product or service to a customer LeanIX (2024);
Smith (2024). It is a collection of all activities, value
adding as well as non-value adding, that are required
to go from raw material to the end customers Singh
et al. (2011). Value streams show how the capabili-
ties enable value, as the value flows through the ca-
pabilities and gives context to why capabilities are
needed Group (2022). An example is to first receive
requirements, then verify these and subsequently de-
velop software based on these requirements to be able
to ultimately deploy a fitting software solution. Value
stream mapping is defined as the outlining of activi-
ties that an organization performs to create value that
is being delivered to stakeholders Group (2022).
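As a minimal illustration of how capabilities and value streams relate, the sketch below represents both as simple data structures, following the examples given above; the class and field names are illustrative assumptions of ours rather than prescribed BA notation.

```python
# Minimal sketch of a capability map and a value stream as simple data structures,
# following the examples in the text. The structure and field names are illustrative
# assumptions, not prescribed BA notation.
from dataclasses import dataclass, field

@dataclass
class Capability:
    name: str                                        # e.g. "Human Resource Management"
    sub_capabilities: list = field(default_factory=list)

@dataclass
class ValueStreamStage:
    activity: str                                    # one step in the sequence of activities
    enabled_by: list = field(default_factory=list)   # capabilities that enable this stage

# Capability map: distinct abilities to perform unique business activities.
capability_map = [
    Capability("Human Resource Management", ["Recruitment", "Payroll"]),
    Capability("Software Development", ["Requirements Analysis", "Deployment"]),
]

# Value stream: the sequence of activities required to deliver a product or service;
# the value flows through the capabilities and gives context to why they are needed.
value_stream = [
    ValueStreamStage("Receive requirements", ["Software Development"]),
    ValueStreamStage("Verify requirements", ["Software Development"]),
    ValueStreamStage("Develop software", ["Software Development"]),
    ValueStreamStage("Deploy fitting software solution", ["Software Development"]),
]

for stage in value_stream:
    print(f"{stage.activity} <- enabled by: {', '.join(stage.enabled_by)}")
```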
Principles are general rules and guidelines that in-
form and support the way in which an organization
fulfills its mission Group (2022). In this research,
we specifically focus on architecture principles that
bridge the gap between high-level intentions and con-
crete design decisions, and document fundamental
choices in an accessible form Greefhorst and Proper
(2011). BA is principle-driven and the principles are
preferably understandable, robust, complete, consis-
tent and stable Group (2022). Models of business
processes are one of the most established elements of
BA and describe components in business processes
with additional information like ownership or type of
activity Smith (2024). An example is a sales pro-
cess of generating leads, qualifying them as potential
customers, and closing deals. In the context of the
broader approach that is BA, value streams and busi-
ness processes are integrated to help provide a com-
prehensive understanding of the vital processes Smith
(2024). The different roles and responsibilities that
should be allocated for capabilities and activities are
closely related to the organizational structure. The
structure displays the formal hierarchy of the orga-
nization including departments, teams, roles and re-
sponsibilities LeanIX (2024). The decisions related
to the allocation of the responsibility of capabilities
and business processes are key in BA. Other elements
closely related to the organizational structure are busi-
ness functions and units, and BA tries to achieve inte-
gration between these. The technique known as gap
analysis is widely used in TOGAF to validate a de-
veloped target BA Group (2022). The basic idea is
to highlight the differences between the baseline and
target Group (2022). Identified gaps are also input for
the roadmap towards achieving the target BA. Policies
and standards define the rules and guidelines that have
to be adhered to in various aspects of the organization
LeanIX (2024). Having such rules promotes consistency
and compliance with regulations. Standards are of-
ten derived from best practices. Policies come from
regulations, which could be sector-specific or enforced
by law LeanIX (2024). The last element discussed
relates to how to manage the execution of the actual
DT. This is referred to as change management LeanIX
(2024), which is an approach for dealing with change
in terms of strategy, control and helping stakeholders
adapt.
3.4 Related Work
Four related works were identified. First, we dis-
cuss an evaluation of ChatGPT as a tool in common
business decision-making cases Chuma and Oliveira
(2023). In this study, ChatGPT is provided with
three simple questions; ChatGPT produced only simple text in all cases but appeared useful for presenting topical overviews. Another paper is concerned with the
idea of having ChatGPT as a virtual member of a soft-
ware development team to inform, coach and execute
a share of the development work Bera et al. (2023).
An experiment is included to assess the performance
of ChatGPT with tasks that would be performed by
scrum masters. Both of these studies are executed in
fields related to the DT field. Although both studies
are exploratory, they both reveal the potential of the
professional use of ChatGPT. The subsequent paper
can be described as an exploration of the use of GenAI
in the software industry Ebert and Louridas (2023). It
is concluded that GenAI has the potential to signif-
icantly improve software production by automation,
enhancing creativity, improving accuracy, and stream-
lining development processes. Lastly, a paper related
to HRM and prompt engineering is identified. It is
shown that GenAI can be a helpful assistant for strate-
gic and operational tasks that HRM specialists per-
form Aguinis et al. (2024). Guidelines are provided
regarding effective prompt design and a verification
process is implemented to check outputs. All of these
papers relate to this research as they contain case stud-
ies, risk assessments and prompt engineering. How-
ever, no papers have been found that concentrate
on the field of DT. Therefore, this research addresses
the gap of the application of GenAI in BA design and
also aims to extend knowledge on prompt engineer-
ing in this field by the performed MLR. Furthermore,
no literature has been found that aims at improving
efficiency of the BA design process.
The case study in this research is executed over
multiple iterations in the context of method valida-
tion. The aim is to generate results on efficiency when
compared to the exploratory case studies in the men-
tioned papers. The findings of the MLR have impli-
cations for method design. First, the identified risks
(Section 3.1) are taken into account throughout the
entire design. Elements that are included are: lim-
iting the recommended use to activities that involve
common concepts such that a decent amount of train-
ing data can be expected, providing guidelines re-
garding proper use of the method, and urging users
to fact-check results. Moreover, decisions should
be made by humans and, preferably, users apply the
GenArch method in a paid version of ChatGPT where
the GenAI model cannot use input prompts to learn
OpenAI (2023). Second, the findings on prompt en-
gineering (Section 3.2) are leveraged by including a
prompt template in the method itself. The full prompt
template is presented in Wolff de (2024). Third, the
identified BA products and steps (Section 3.3) are in-
cluded in the method.
4 THE GenArch METHOD
This section describes the GenArch method that is de-
signed during this research. First, the method and
all of its components are explained. Subsequently,
the application of the design cycle as explained in
section 2 is outlined. The design cycle of this re-
search includes two iterations: Iteration Alpha is fo-
cused on designing and validating an initial version
of the GenArch method. Iteration Beta is focused on
analyzing the insights gathered during the validation
in Alpha, using these to redesign the initial version
of the GenArch method and validating this through
an experiment with KPMG experts. Extensive ex-
planations of the process of both these iterations can
be found in Wolff de (2024). The experiment is ex-
plained in more detail in section 4.2. The treatment in
this research is a method that serves as guidance for
stakeholders, prescribing the appropriate timing and
utilization of ChatGPT in the design of BA products.
The method uses colors from the visual identity of KPMG
KPMG (2024b), recognizable illustrations, and short
descriptions. Figure 3 presents the ballpark view of
the GenArch method.
The BA products and main activities are based on
the insights of the problem investigation of iteration
Alpha. Each main activity, from now on referred to
as phase, is divided into smaller activities. The key
stakeholders, policies, baseline assessment, design
product / decision, gap analysis and change manage-
ment sub-activities are identified in the MLR as pre-
sented in section 3.3. The remaining sub-activities are
identified in collaboration with KPMG experts.1,2,3,4 BA products are often redesigned and updated more than once over time1 KPMG (2024a). Therefore,
the method is iteratively designed so users can loop
through the process multiple times. Activities that are
marked with OpenAI logos are classified as ‘suited for
ChatGPT’. This does not mean that ChatGPT com-
pletely takes care of this activity. It means that Chat-
GPT can be utilized as a reference, as an input source
or a conversation partner. Human input is still always
needed to refine, update, judge or provide context to
the results.
If the capability map is selected as the example
BA product for applying the GenArch method, it is
plugged in the process in the middle of figure 3. The
method starts with creating a vision for this capabil-
ity map by answering the following questions: Who
are the stakeholders and what are their needs?; What
are the objectives?; What is the scope?; finally: How
do we measure progress towards our goals? After
this phase is complete, architecture principles need
to be designed based on best practices and relevant
policies. Architecture principles can be reused from
earlier projects.2,3,4 Subsequently, a baseline is ana-
lyzed by identifying strengths, weaknesses and root
causes for these weaknesses. After this phase, the tar-
get capability map needs to be designed. This could
be an update of a current version or a completely new
map. After finishing this capability map, the method
continues with conducting a gap analysis on the base-
line and target capability map while also taking risk
mitigation into account. This process is finished by
designing a roadmap that includes an implementa-
tion strategy, a change management strategy and a
timeline with important milestones. For each activity
that is marked with an OpenAI logo, the tool Chat-
GPT is used. The recommended use of ChatGPT
can be found at the bottom of figure 3. The ‘recom-
mended use of ChatGPT’ section consists of six sub
sections. The ‘intended role of ChatGPT’ and ‘use
cases’ sections are derived from the categorization in
iteration Alpha where the ways of using ChatGPT are
explained. The ‘potential risks’ section is directly de-
rived from the sub review of the MLR about the risks
of ChatGPT (see: Section 3.1). The ‘risk mitigation
strategy’ section is also partly based on this sub re-
view but additionally includes parts of the guidelines
of KPMG regarding the use of GenAI. The ‘prompt-
ing strategy’ and ‘refine output’ sub sections describe
the patterns and elements used in the prompt tem-
plate of the GenArch method (see: Wolff de (2024) for more details).
4 KPMG Manager 3, personal communication, April 2024
The iterative nature and suitability
for smaller decisions within BA products are assumed
to make GenArch well-suited for an agile architec-
ture. The method takes the following principles of
the agile manifesto into account: prioritizing contin-
uous delivery and welcoming changing requirements
Alliance (2001). The other principles are out of scope
of the method due to their specific nature or focus on
software. Recall that besides the ballpark view of the
GenArch method (see: figure 3), a technical version
is designed to give more detail about the functioning
of the GenArch method. Appendix A presents the
technical version of the GenArch method. Wolff de
(2024) exhibits corresponding activity and concept ta-
bles in which the activities and concepts as part of the
technical version are explained in detail.
4.1 Iteration Alpha and Beta
The first step of iteration Alpha is the problem inves-
tigation of which the literature review of section 3
is the primary part. In addition to the literature re-
view, a baseline BA product design process is cre-
ated. The second step is the design of the first ver-
sion of a method that serves as guidance for stake-
holders, prescribing the appropriate timing and uti-
lization of ChatGPT in the design of BA products. To
be able to decide how ChatGPT can be best leveraged
in the BA product design process, each of the activ-
ities are classified as either ‘suited for ChatGPT’ or ‘not suited’. The following indicators are used
for this classification: Needed data to perform the ac-
tivity is (expected to be) publicly available; Execution
of the activity does not depend on highly confiden-
tial company data; Successful execution is not highly
dependent on understanding the context; The activ-
ity does not mainly consist of a decision; and: The
activity does not require very recent data or informa-
tion. A prompt template is created to help users of the
GenArch method utilize the potential of GenAI. As a
last step of iteration Alpha, this initial version of the
GenArch method is validated by means of interview-
ing experts.
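To make the classification step above concrete, the sketch below encodes the five listed indicators as a simple checklist; the field names and the rule that all indicators must hold are illustrative assumptions of ours about how the classification could be operationalized.

```python
# Minimal sketch of the 'suited for ChatGPT' classification from iteration Alpha.
# The field names and the all-indicators-must-hold rule are illustrative assumptions
# about how the five indicators listed above could be operationalized.
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    data_publicly_available: bool    # needed data is (expected to be) publicly available
    needs_confidential_data: bool    # execution depends on highly confidential company data
    highly_context_dependent: bool   # success highly depends on understanding the context
    mainly_a_decision: bool          # the activity mainly consists of a decision
    needs_recent_data: bool          # requires very recent data or information

def suited_for_chatgpt(a: Activity) -> bool:
    """Mark an activity as 'suited for ChatGPT' only when all five indicators hold."""
    return (a.data_publicly_available
            and not a.needs_confidential_data
            and not a.highly_context_dependent
            and not a.mainly_a_decision
            and not a.needs_recent_data)

# Example: analyzing baseline capabilities for a common, well-documented domain.
example = Activity("Identify baseline capabilities", True, False, False, False, False)
print(suited_for_chatgpt(example))  # True -> the activity gets the ChatGPT marker
```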
The first step of iteration Beta is to analyze the
qualitative interview data that is produced as part of
iteration Alpha to identify improvement possibilities.
The outcomes of this analysis are meant to provide
reasoning for the implemented changes. Based on the
results of this analysis, the GenArch method was re-
designed. The inclusion of a section that outlines the
recommended use of ChatGPT is the most prominent
modification. A validation of the redesigned GenArch
method concludes the Beta iteration.
Figure 3: Ballpark view of the GenArch method version 2.
4.2 Validation Experiment
As the last step of iteration Beta, the final version
of the GenArch method is validated using an exper-
iment. Recall that the experiment design is illustrated
in figure 2. The focus of this validation lies on whether participants perceive a segment of the method
to be more efficient. Wolff de (2024) includes a de-
tailed explanation of the experiment, the workshop it
is incorporated into and reasoning for its construction.
The case of a hospital is taken as the basis through-
out the entire workshop. In round one, subjects de-
sign the capabilities of the current state of the hospi-
tal. In round two, subjects design a set of architecture principles for the target state of the hospital, and in round three, subjects design new capabilities for this target state. The choices for this capability
and activity are based on the input of the interviewees
from the Alpha validation stage.
Seven subjects participated in the workshop that
was held on 7 June 2024 at the KPMG office in Am-
stelveen, the Netherlands. They used the version of ChatGPT that was free at the time, ChatGPT 3.5, in
an empty conversation. One participant had to leave
early and only participated in rounds one and two.
Wolff de (2024) includes a template of the informed
consent form that all subjects were required to sign
before the experiment. The answers to the surveys are
taken as the results of this validation to be able to draw
conclusions about the perceived efficacy. Wolff de
(2024) includes the full set of answers to the surveys.
Section 5 discusses and analyzes these results.
5 ANALYSIS OF THE
EXPERIMENT RESULTS
This section concentrates on analyzing the results of
the performed experiment. Wolff de (2024) includes
full details of the results for each of the three rounds
of the workshop. This section focuses on analyzing
these results to be able to answer SRQ 5: To what ex-
tent does the method incorporating ChatGPT improve
the efficiency of the target Business Architecture de-
sign process? The experiment evaluated a segment of
the GenArch method to be able to draw conclusions
about the efficiency of the overall method. The ca-
pabilities segment was evaluated as the interviewees
from the validation stage in iteration Alpha expected
this segment to be very suitable for the use of the
GenArch method. For the measurement of efficiency,
we adopt the perceived efficacy of the method evalua-
tion model as our measure (Section 2.2.2). This mea-
sure consists of three sub measures which are taken
as the dependent variables in the experiment. These
are the perceived ease of use, the perceived usefulness
and the intention to use. These are adopted
from the method evaluation model as well Moody
(2003). In the remainder of this section, each of these measures is discussed individually to be able to draw
conclusions about the perceived efficacy.
5.1 Perceived Ease of Use
The first variable that is discussed is the perceived
ease of use. This refers to the expected required ef-
fort to learn and use the method Abrahão et al. (2009).
In round one of the experiment, subjects use the cur-
rent approach, referred to as common sense, to ful-
fill the task. They are already familiar with this ap-
proach, meaning that a comparison for the ease of use
seems unnecessary. A majority of the comments of
subjects were positive towards the ease of use of the
GenArch method. The most relevant comments that
were made are categorized and listed below. These
comments were made by multiple subjects. The fol-
lowing are positive comments on ease of use: “It was very helpful with a clear explanation”; “The method had some good questions and checks”; and: “I had guardrails on how to do it, and how to use it”. There was also a negative comment on the ease of use: “Helpful but not fully convenient”. All other com-
ments sound positive and indicate a high level of per-
ceived ease of use. The last remark was made be-
cause of the longer startup time that was experienced
by some subjects. To quantify the results, subjects of
the experiment were asked to specify how much they
agreed with certain statements in round three. We as-
sign range 1 - 5 to the range ‘strongly disagree’ to
‘strongly agree’. We take the average values for the
statements as an indication of how true that statement
is. This means that the higher the value, the more true
the statement. Two statements relate to the perceived
ease of use. The average values for these statements
are as follows: 4.00 - It was clear how to utilize Chat-
GPT with the GenArch method for this task; and 4.00 -
It was easy to use ChatGPT with the GenArch method
for this task. Both of the values are high, meaning
that based on these values and the comments, it can
be assumed that the segment of the GenArch method
has a good perceived ease of use. Due to multiple re-
marks about the longer startup time and large amount
of information needed for the prompts, it is expected
that the GenArch method is better suited for larger and more complex cases, as such cases need to be analyzed in detail anyway. Moreover, appropriate training in the use of the
GenArch method is assumed to further increase the
perceived ease of use.
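As a small illustration of the quantification used here, the sketch below maps Likert answers to the 1–5 range and averages them per statement; the individual responses shown are hypothetical and do not reproduce the actual survey data.

```python
# Minimal sketch of the quantification used in this section: Likert answers are
# mapped to 1..5 ('strongly disagree' .. 'strongly agree') and averaged per
# statement. The responses below are hypothetical, not the actual survey data.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

responses = {
    "It was clear how to utilize ChatGPT with the GenArch method for this task":
        ["agree", "agree", "strongly agree", "neutral", "agree", "agree"],
    "It was easy to use ChatGPT with the GenArch method for this task":
        ["strongly agree", "agree", "agree", "neutral", "agree", "agree"],
}

for statement, answers in responses.items():
    average = sum(LIKERT[a] for a in answers) / len(answers)
    print(f"{average:.2f} - {statement}")
```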
5.2 Perceived Usefulness
Second, the perceived usefulness is discussed. This
corresponds to the expected degree to which the
method will achieve intended objectives Abrahão
et al. (2009). Subjects mostly perceived the first
round as achievable in the given time. However,
most did not seem completely satisfied with the re-
sults they produced as they described them as quite
high level with somewhat short descriptions. As
this was the round in which they applied the cur-
rent approaches, other tools like search engines and
reusing old KPMG material were allowed. One sub-
ject, for example, mentioned having used old deliv-
erables of KPMG for a similar case. According to
him, having such a reference made the assignment
achievable. Another subject described the assignment as doable but with moderate results. He ap-
pears to be of the opinion that it is possible to fin-
ish the exercise in the given time if moderate results
are accepted. To summarize, the collective opinion appears to be that, with their current approaches, producing a high-level answer to the assignment was achievable in the given time. However, most subjects appeared
not satisfied with their results in terms of level of de-
tail. This seems logical, as participants were given a
short time frame and restricted to current approaches.
Creating capabilities tailored to the specific context in
a detailed manner was anticipated to be challenging.
The results include several positive comments about the perceived usefulness of the GenArch method. The subjects perceive the method as useful and highlight the high-quality output that GenArch manages to produce. At first glance, these results
seem more positive compared to the results of round
one. The most relevant comments that were made
about the perceived usefulness are categorized and
listed below. These comments were made by multi-
ple subjects. Quality output: “Our experience was, in short, very good. The answers we needed to give did not give room to work through it quickly which resulted in more quality input”; “It was very helpful. This resulted in clearer results”; “GenArch enhanced quality”; “The answers were clear and quite detailed”; “I prefer the GenArch method over the current approach. The answers were more detailed because you are more specific in what you want”. Complete output: “GenArch resulted in more complete answers”; “It gives more creative answers and makes them more complete”. Useful GenArch elements: “The method had some good questions and checks. Asking ChatGPT to ask the user questions before giving an answer was helpful”.
All comments seem positive and indicate a high
level of perceived usefulness. Subjects seem to agree
that GenArch elicits clear, detailed and
complete answers which are useful for the given task.
To again quantify the results, a similar range as for
the perceived ease of use (Section 5.1) is given to
the statements relating to the perceived usefulness.
Three statements relate to the perceived usefulness
and their average values are as follows: 4.17 - Chat-
GPT was helpful for this task; 4.17 - The results that
ChatGPT provided were useful; 3.83 - ChatGPT with
the GenArch method made my answer for the exer-
cise more complete. The first two values are very
high. The third is well above neutrality but a bit lower.
This means that we can assume, based on these values
and the comments, that the segment of the GenArch
method has a very good perceived usefulness. How-
ever, designers cannot fully depend on the complete-
ness of the results of ChatGPT. It is expected that this
slightly lower completeness of ChatGPT is caused by
the specific domain or situational knowledge that is
required. Some subjects explained that designing a
complete prompt that included all information was a
challenge in the given time. It is anticipated that this
completeness increases when users have more time
to design their prompt. However, adding specific do-
main and situational knowledge to the result of Chat-
GPT by a human will always be necessary to deliver
a complete answer. This is also expected to be the reason that the other values did not score even higher.
ChatGPT is not able to deliver results that are 100%
correct due to its inner workings. Consequently, re-
sults always need to be refined, updated, judged or be
put in context by a human user.
5.3 Intention to Use
As the last of the three sub variables, the intention to
use is discussed. The intention to use refers to the
extent to which a person intends to use a particular
method Abrahão et al. (2009). Discussing the current approach, which subjects applied in round one, is unnecessary as this approach is already in use. Hence, this section does not
make a comparison between round one and round
three. Parts of the subjects' intention to use already surfaced in the results of round three. Although some of those remarks have already been discussed, the most relevant comments about the intention to use are categorized and mentioned hereafter. These comments were made
by multiple subjects. There is one comment on more
training needed to prefer the GenArch method: “For now I prefer my own way, however I do see that the GenArch method results in higher quality output, with more experience on it I would use GenArch”. There is also one comment on the preference of free ChatGPT use: “GenArch is a nice reference but I do not see it as something I would go through every time”. Two comments are made on the preference of the GenArch method: “I prefer the third round, because it helped in a specific scenario, with a given prompt. So there was not much thinking required from me”; and: “I prefer the GenArch method. The answers were more detailed because you are more specific about your wishes”.
The comments seem a bit mixed. This is the first
time the subjects use the GenArch method, which re-
quires them to think more extensively rather than al-
lowing them to start immediately. Hence, these re-
sults seem rational. As mentioned, the first remark
could be solved by providing users with a more ex-
tensive explanation with examples or a small prac-
tice round, and the second remark could be addressed by creating different scenarios for cases that dif-
fer in size. This calls for further investigation in fu-
ture research (Section 7). Subjects do seem to agree
on the potential of GenArch. As was done in pre-
vious sections (Sections 5.1 and 5.2), a range is as-
signed to the statement relating to the intention to use.
This statement has an average value of 3.83 - I intend
to use ChatGPT with the GenArch method for simi-
lar tasks in my work. This value gravitates towards
having the intention to use GenArch. However, this
value does not convincingly give an indication that
the subjects will immediately start using GenArch. To
achieve this, additional training regarding the use of
the GenArch method and more information about its
corresponding benefits is expected to be needed. Ac-
cordingly, the adoption of GenArch is a subject for
future research (see: Section 7).
5.4 Perceived Efficacy
The aforementioned sub variables all come together
in the perceived efficacy (Section 2.2.2), which is
taken as the main indicator for efficiency. The previ-
ous sections show that the segment of the GenArch
method has a moderately high to high level for all of
the three sub variables, based on the comments in the surveys as well as the values assigned to the rated statements. In the survey of round three, subjects rated an additional statement regarding the efficiency of the GenArch method. Applying the same range as in the previous sections, the average value is 3.67. Similarly to the sub variables, this value can be described as moderately high to high. Accordingly, we conclude that the GenArch method has at least a moderately high level of perceived efficacy. This is
explained in greater detail in section 7.
5.5 GenArch versus Free Use of
ChatGPT
The main goal of this analysis was to compare the
current approach with the segment of the GenArch
method (Section 2.2). As an additional effort, the seg-
ment of the GenArch method is also compared with
the free use of ChatGPT. The aim of this compari-
son is to assess if practitioners could already use this
method as-is. We reuse the range that was applied in
the previous sections to be able to compare the quan-
tified results. Figure 4 uses these assigned values to
visualize the results. Figure 4 shows that currently,
based on the statements in the survey, subjects view
their own approach as slightly better in terms of useful-
ness, ease of use and intention to use. Certain subjects
do indicate that they prefer the GenArch method (Sec-
tions 5.2 and 5.3), but on average this seems not to be
the case. Two subjects did mention that they preferred
their own way of working because of the extensive-
ness of GenArch and their lack of experience with it.
It is believed that these results are partly caused
by the following three external elements. First, the
workshop consisted of short tasks in which using a
known approach is more efficient. This could have boosted the results of round two, as subjects could use their own way of using ChatGPT.
Second, the case was kept simple and common to
make sure that subjects could quickly understand the
task. This also implies that ChatGPT has a lot of rele-
vant training data, which potentially meant that simple GenAI interaction was enough to produce useful answers. Mul-
tiple subjects mentioned the extensiveness of the in-
put prompt in the third round, implying that they used
simpler prompts in round two. This again could have
boosted the variables in round two. Third, subjects
used GenArch for the first time while the majority in-
dicated that they had used ChatGPT before. It is plausi-
ble that this familiarity caused a small bias towards
the free use of ChatGPT. Besides these partial possi-
ble causes, the results show that the GenArch method
is valuable but could still improve to further increase
the perceived efficacy of its users. Extended expla-
nations including examples or small practice rounds
and different scenarios depending on the case at hand
are identified as areas in which the GenArch method
could possibly improve.
6 DISCUSSION
The design and validation of GenArch in an iterative
manner to ensure that the method is rigorously tested
and refined is considered to be one of the strengths
of this research. Moreover, the execution of the val-
idation experiment in collaboration with KPMG al-
lows for practical insights and applicability in the real
world. This increases relevance and impact of the re-
search findings. During both validation phases, the
GenArch method was evaluated based on criteria as
they were experienced by participants, with the aim to
enhance acceptance and adoption of the method. It is
noteworthy that the interviews and experimental procedures were rehearsed beforehand so that we were as prepared as possible. Besides strengths,
this research also contains limitations. First, the sole
focus is on ChatGPT, which may have limited the
generalizability. Other LLM tools could offer differ-
ent performance characteristics that are not explored.
For example, the indicators used to classify activities
(see: Section 4.1) could differ for alternative LLM
tools. The indicator relating to the fact that ChatGPT
is unable to take recent events into account would dif-
fer as alternative LLM tools are able to do this Lau
(2023). Furthermore, recommended use of such an
LLM tool needs to be investigated and testing the dif-
ference in performance is essential. Performing a sim-
ilar study with a different LLM tool could be a sub-
ject for future research (Section 7). Second, the ex-
pert interviews relied on subjective opinions of the
participants. Their perspectives might not have en-
compassed the full variety of opinions, which might
have affected the validation. To minimize the effects
of this limitation, the interviews were conducted with different types of experts.
Figure 4: Comparison of experiment rounds two and three.
Also, the execution of the
validation experiment within KPMG might not have
captured challenges in other organizations or indus-
tries. To mitigate this, a toy problem was used and
research goals were not communicated during the ex-
periment. Two iterations were performed, using six
interviewees, seven experiment-subjects and three ex-
periment rounds. Larger sample sizes would have po-
tentially made the results more representative. For ex-
ample, conducting additional experiments that focus
on other activities and BA products could have pro-
duced additional results. The validation experiment,
which corresponds to the second validation, produced
positive results concerning perceived efficacy. Nev-
ertheless, it also highlighted areas for potential im-
provements. These could consist of the in-
clusion of additional validation cycles or experiments,
tests with alternative LLM tools to assess the gener-
alizability of the results, the development of differ-
ent scenarios tailored to differing cases, and an ex-
ploration of strategies surrounding the adoption of the
GenArch method.
7 CONCLUSIONS AND FUTURE
RESEARCH
This paper explores the potential of utilizing an LLM tool to support the development of target BA products within the broader
context of DT projects. The Main Research Ques-
tion is as follows: ‘Using the example of Chat-
GPT, how can a Large Language Model tool be ef-
fectively integrated into the design process of target
Business Architecture products to mitigate risks, op-
timize prompt impact, and improve efficiency?’ The
resulting GenArch method aims at guiding stakehold-
ers, prescribing the appropriate timing and utilization
of ChatGPT throughout the target BA product de-
sign process. The GenArch method was developed
through an iterative process comprising two design
cycles including expert interviews and an experiment
as validation (see: Section 2.2). In the experiment, a segment of the GenArch method was evaluated to assess the perceived ease of use, perceived usefulness, and intention to use of the method as a whole. These factors were utilized to determine the perceived efficacy, which was taken as the indicator for the efficiency of the GenArch method (see: Section 2.2.2). The results of the experiment indicate that
the GenArch method scores moderately high to high
on the perceived ease of use, perceived usefulness,
and intention to use (see: Section 5). More specifi-
cally, perceived usefulness was rated the highest (see:
Section 5.2), while the perceived ease of use and in-
tention to use, though slightly lower, still produced
positive results (see: Sections 5.1 and 5.3). From
these findings, we derive that the GenArch method
has at least a moderately high level of perceived ef-
ficacy (see: Section 5.4). This suggests that the
GenArch method effectively guides the integration of
ChatGPT into the target BA product design process,
while enhancing the efficiency of this process. To an-
swer the Main Research Question, we conclude that
the GenArch method provides a structured approach
for the timely and effective utilization of ChatGPT
in the design process of target BA products. The
GenArch method positively influences the efficiency
of this design process. In summary, this research
underscores the potential of LLM tools to enhance
BA design. The findings suggest that the GenArch
method can contribute to the success of DT projects
by providing an approach that integrates ChatGPT.
Future research could include additional experi-
ments focusing on other activities or BA products that
are part of the GenArch method. This would provide a
more comprehensive understanding of how GenArch
can be utilized and optimized. Some of these addi-
tional experiments could also be executed with the
use of a pre-trained language model to observe if
results improve. Exploring strategies for adoption
of the GenArch method within organizations could
be another subject for future research. This could
positively influence the practical applicability of the
method and support a more effective integration of
GenArch in different organizational contexts. Com-
parative studies could be conducted to determine if
and how the GenArch method needs to be adapted
when utilized with an alternative LLM tool. This
could also help identify the most effective LLM tools
for specific BA design tasks or contexts. Lastly, other
potential application areas for LLM tools within DT
could be explored.
ACKNOWLEDGEMENTS
We would like to sincerely thank the Digital Transformation team of KPMG Netherlands for their support during the conduct of this research, including providing access to company documentation and domain experts, and for their assistance during the experiment.
REFERENCES
Abrahão, S., Insfran, E., Carsí, J. A., Genero, M., and Piattini, M. (2009). Evaluating the ability of novice analysts to understand requirements models. In International Conference on Quality Software, pages 290–295. IEEE.
Aguinis, H., Beltran, J. R., and Cope, A. (2024). How to
use generative AI as a HRM assistant. Organizational
Dynamics, 53(1):101029.
Alliance, T. A. (2001). Principles behind the Agile Mani-
festo. https://agilemanifesto.org/principles.html. Ac-
cessed: 13-05-2024.
Bains, C. (2023). AI prompt engineering: learn how not
to ask a chatbot a silly question. https://tinyurl.com/
4jtax73a. Accessed: 14-02-2024.
Bera, P., Wautelet, Y., and Poels, G. (2023). On the use
of ChatGPT to support agile software development.
In International Conference on Advanced Information
Systems Engineering, pages 1–9. CEUR.
Bouwman, H., Houtum van, H., Janssen, M., and Versteeg,
G. (2011). Business architectures in the public sector:
experiences from practice. Communications of the As-
sociation for Information Systems, 29:411–426.
Brinkkemper, S. (1996). Method engineering: engineer-
ing of information systems development methods and
tools. Information and Software Technology, 38:275–
280.
Chuma, E. L. and Oliveira, G. G. D. (2023). Generative
AI for business decision-making: a case of ChatGPT.
Management Science and Business Decisions, 3(1):5–
11.
Ebert, C. and Louridas, P. (2023). Generative AI for soft-
ware practitioners. IEEE Software, 40(4):30–38.
Eliot, L. (2023). Upping your prompt engineering super-
powers via target-your-response techniques when us-
ing generative AI. https://tinyurl.com/466p4h37. Ac-
cessed: 14-02-2024.
Ferrini, M. (2023). Generative AI: just a zip file of the
web? https://assets.kpmg.com/content/dam/kpmg/ch/
pdf/generative-ai.pdf. Accessed: 26-02-2024.
Fitzgerald, M., Kruschwitz, N., Bonnet, D., and Welch,
M. (2013). Embracing digital technology: A new
strategic imperative. MIT Sloan Management Review,
55(2):1–12.
Greefhorst, D. and Proper, H. A. (2011). Architecture Prin-
ciples, volume 4. Springer.
Group, T. O. (2022). Phase B: Business Architecture.
https://pubs.opengroup.org/architecture/togaf92-doc/
arch/index.html. Accessed: 01-02-2024.
Heale, R. and Twycross, A. (2018). What is a case study?
Evidence-Based Nursing, 21(1):7–8.
Hohpe, G. (2017). The Architect Elevator: Visiting
the upper floors. https://martinfowler.com/articles/
architect-elevator.html. Accessed: 16-04-2024.
Korzynski, P., Mazurek, G., Krzypkowska, P., and Kurasin-
ski, A. (2023). Artificial intelligence prompt en-
gineering as a new digital competence: Analy-
sis of generative AI technologies such as Chat-
GPT. Entrepreneurial Business and Economics Re-
view, 11(3):25–37.
KPMG (2023). Generative AI models: the risks and
potential rewards in business. https://tinyurl.com/
3vv4ap68. Accessed: 01-05-2024.
KPMG (2024a). Digital Architecture Talkbook.
KPMG (2024b). Visual identity overview. https:
//assets.kpmg.com/content/dam/kpmg/dk/pdf/
dk-2020/12/alexander.pdf. Accessed: 08-05-2024.
Lau, J. (2023). Bing Chat vs. ChatGPT: Which AI
chatbot should you use? https://zapier.com/blog/
chatgpt-vs-bing-chat. Accessed: 15-01-2024.
LeanIX (2024). Business Architecture. https://www.leanix.
net/en/wiki/ea/business-architecture. Accessed: 26-
03-2024.
Liu, Y., Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., He,
H., Li, A., He, M., Liu, Z., Wu, Z., Zhao, L., Zhu,
D., Li, X., Qiang, N., Shen, D., Liu, T., and Ge, B.
(2023). Summary of ChatGPT-related research and
perspective towards the future of large language mod-
els. Meta-Radiology, 1(2):100017.
Magaldi, D. and Berler, M. (2020). Semi-structured inter-
views, pages 293–298. Springer.
Moody, D. L. (2003). The Method Evaluation Model:
A Theoretical Model for Validating Information Sys-
tems Design Methods. In European Conference on
Information Systems (ECIS).
OpenAI (2023). Introducing ChatGPT Enterprise. https:
//openai.com/index/introducing-chatgpt-enterprise.
Accessed: 11-07-2024.
Piñeiro-Martín, A., García-Mateo, C., Docío-Fernández, L., and del Carmen López-Pérez, M. (2023). Ethical challenges in the development of virtual assistants powered by large language models. Electronics, 12(14):3170.
Qiu, J., Li, L., Sun, J., Peng, J., Shi, P., Zhang, R., Dong, Y.,
Lam, K., Lo, F. P. W., Xiao, B., Yuan, W., Wang, N.,
Xu, D., and Lo, B. (2023). Large AI models in health
informatics: Applications, challenges, and the future.
IEEE Journal of Biomedical and Health Informatics,
27(12):6074–6087.
Shanahan, M. (2024). Talking about large language models.
Communications of the ACM, 67(2):68–79.
Simon, D. and Schmidt, C. (2015). Introduction: Demysti-
fying Business Architecture, pages 1–17. Springer.
Singh, B., Garg, S. K., and Sharma, S. K. (2011). Value
stream mapping: Literature review and implications
for Indian industry. International Journal of Advanced
Manufacturing Technology, 53(5):799–809.
Smith, V. (2024). Business architecture: How to build a
successful model. https://tinyurl.com/3xmn4yha. Ac-
cessed: 26-03-2024.
Tacheva, J. and Ramasubramanian, S. (2023). AI Empire:
Unraveling the interlocking systems of oppression in
generative AI’s global order. Big Data and Society,
10(2).
Wang, X., Attal, M. I., Rafiq, U., and Hubner-Benz, S.
(2024). Turning Large Language Models into AI As-
sistants for Startups Using Prompt Patterns, pages
192–200. Springer.
Weerd van de, I. and Brinkkemper, S. (2008). Meta-
modeling for situational analysis and design methods,
pages 35–54. IGI Global.
Westerman, G., Bonnet, D., and McAfee, A. (2014). Lead-
ing digital: turning technology into business trans-
formation. Harvard Business Press.
Wieringa, R. J. (2014). Design Science Methodology
for Information Systems and Software Engineering.
Springer, 1st edition.
Willey, L., White, B. J., and Deale, C. S. (2023). Teaching
AI in the college course: introducing the AI prompt
development life cycle (PDLC). Issues in Information
Systems, 24(2):123–138.
Wißotzki, M. and Sandkuhl, K. (2015). Elements and char-
acteristics of enterprise architecture capabilities. In
International Conferences on Perspectives in Business
Informatics Research, volume 229 of LNBIP, pages
82–96. Springer Verlag.
Wolff de, T. (2024). The use of ChatGPT as a virtual
team-member in a digital transformation consultancy
team. Master’s thesis, Utrecht University, Utrecht, the
Netherlands.
Zant, T. V. D., Kouw, M., and Schomaker, L. (2013). Gen-
erative AI, volume 5, pages 107–120. Springer, 1st
edition.
APPENDIX A
Click on this link to open Appendix A.