Towards Trustworthy AI in Demand Planning: Defining Explainability
for Supply Chain Management
Ruiqi Zhu¹, Cecilie Christensen¹, Bahram Zarrin², Per Bækgaard¹ and Tommy Sonne Alstrøm¹
¹Technical University of Denmark, Kongens Lyngby, Denmark
²Microsoft Research Hub, Kongens Lyngby, Denmark
Keywords:
Explainable AI, Supply Chain Management, Demand Planning, User-Centric Explainability.
Abstract:
Artificial intelligence is increasingly essential in supply chain management, where machine learning models
improve demand forecasting accuracy. However, as AI usage expands, so does the complexity and opacity
of predictive models. Given the significant impact on operations, it is crucial for demand planners to trust
these forecasts and the decisions derived from them, highlighting the need for explainability. This paper
reviews prominent definitions of explainability in AI and proposes a tailored definition of explainability for
supply chain management. Using a user-centric approach, we address the practical explainability needs of
non-technical users. This domain-specific definition aims to support the future development
of interpretable AI models that enhance user trust and usability in demand planning tools.
1 INTRODUCTION
Supply chain management (SCM) is a central process
in businesses around the world. It contributes to ful-
filling customer goals, gaining competitive advantage,
and minimizing the loss of resources in the production
cycle. As a result of the prominent benefits, there is a
large market for SCM solutions designed for compa-
nies to manage their supply chain, some of the lead-
ing solutions being SAP Supply Chain Management,
Blue Yonder, and Dynamics 365 Supply Chain Man-
agement (D365 SCM).
A key process within SCM is demand planning
(DP), which includes forecasting the future demand
for products. DP enables companies to foresee an in-
crease or decrease in the sales of their products, mak-
ing it possible for them to plan downstream processes
accordingly (IBM, 2017). An important element of
demand planning is demand forecasting, which is an
estimation of future demand, e.g. using time series
data and mathematical computation to gain insights
from the data. Traditionally, demand forecasting has
been carried out using statistical methods such as ETS
(Hyndman and Khandakar, 2008) and ARIMA (Box
and Jenkins, 1970). However, advances in machine
learning (ML) and artificial intelligence (AI) are grad-
ually replacing these statistical methods (altexsoft,
2022). ML forecasting models offer the potential of
noticeably better predictions compared to statistical
models, which is a big advantage in demand planning
(GeeksforGeeks, 2024). However, with the increase
in ML and AI models, we also see an increase in complexity, and the so-called "black box" problem is gaining
prominence in the field of demand planning. The perception of ML models as black boxes is a widely discussed
topic, as their predictions become more difficult for humans to interpret given their increased complexity.
This is critical in fields where predictions are used to draw important conclusions (e.g., the medical field
(Adadi and Berrada, 2018)) or to make significant decisions (e.g., DP).
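As a concrete illustration of the forecasting approaches mentioned above, the following minimal sketch (not taken from the paper; it assumes a synthetic monthly demand series and the statsmodels and scikit-learn libraries) contrasts an ETS-style statistical forecast with a lag-feature ML model.

# Minimal sketch, assuming synthetic monthly demand data (not the paper's setup).
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.arange(120)                                   # ten years of monthly observations
demand = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, t.size)

# Classical baseline: exponential smoothing (ETS) with additive trend and seasonality.
ets = ExponentialSmoothing(demand, trend="add", seasonal="add", seasonal_periods=12).fit()
print("ETS forecast, next 3 months:", ets.forecast(3))

# ML alternative: predict next month's demand from the previous 12 months.
X = np.column_stack([demand[i:i + len(demand) - 12] for i in range(12)])
y = demand[12:]
ml_model = GradientBoostingRegressor().fit(X, y)
print("ML forecast, next month:", ml_model.predict(demand[-12:].reshape(1, -1)))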
While not easy to solve, the black-box issue has
set the foundation for a new field within ML and AI,
namely explainability. Explainability in AI is a topic
that has gained a lot of interest in recent years because
of its ability to open up the black box of ML models
(Adadi and Berrada, 2018).
The concept of explainability dates all the way
back to the 1980s, when it was first mentioned (Moore
and Swartout, 1988). Later, in 2004, the term Ex-
plainable AI (XAI) was introduced (Van Lent et al.,
2004). However, it is not until recent years that
the concepts gained traction, naturally following the
increase in AI complexity and reliance (Adadi and
Berrada, 2018). Despite its rising popularity, a common definition of explainability has not reached
consensus. Instead, researchers in different fields give
different meanings to the term, often taking either a
computer-centered approach focusing on the correct-
ness and completeness of the explanation or a human-
centered approach focusing on how the explanation
resonates with end users. Both approaches require
the use of explainability methods, which are essen-
tial for providing the actual explanation of the ML
model. Over time, a wide range of models for XAI
have been developed to describe the decision-making
process of ML and AI models. Explainable ML mod-
els generally exist in two forms: those that are in-
terpretable by nature, e.g., decision trees, and those
that become interpretable after adding an explainabil-
ity method post-training (Naqvi et al., 2024; Lopes
et al., 2022; Retzlaff et al., 2024). As the amount
of research on explainability has increased, it has be-
come more user- and context-specific. Research on
explainability for demand planning is still sparse, and
the need for analyzing the users and their needs re-
mains.
As research on user-specific needs for explainability is not traditionally covered by the tools available
in the field of XAI, it is beneficial to draw on methods from UX design. Introducing UX design methods
makes it possible to analyze the needs of the users of DP applications and, consequently, to design
explanations that resonate with them specifically.
Based on the above, we pose two concrete re-
search questions:
1. How can explainability be defined in the context
of the demand planning domain?
2. Who are the users of the demand planning appli-
cations, and what are their explainability needs?
This paper is structured into the following sec-
tions: Explainability in AI, Identifying Explainability
in DP applications and Conclusion.
In the Explainability in AI Section, we introduce
some of the existing work that has been done in the
field of XAI. In the section Identifying Explainabil-
ity in DP applications, we converge towards the end
of the problem space and use the background research
and theory to define the problem of what explainabil-
ity is for the users of DP applications. Lastly, we sum
up the findings and refer back to the initial problem
statement in the Conclusion Section.
2 EXPLAINABILITY IN AI
This section outlines the foundation of our research
on defining explainability in the DP domain. We look
into how explainability is currently defined across lit-
erature, how to evaluate an explanation on its explain-
ability, and lastly, explore currently existing methods
for explainability.
2.1 Explainability in ML
So far, no formal definition of explainability has been
broadly accepted among researchers in the XAI field.
One reason for this is that the need and benefits of ex-
plainability vary greatly between different fields and
users, meaning that a good explanation for one group
of people might not be relevant to another (Suresh
et al., 2021; Mohseni et al., 2020). Despite this, there
have been different attempts at designing frameworks
for how to provide useful explanations. One exam-
ple of this comes from Vilone and Longo, who, in their paper No-
tions of explainability and evaluation approaches for
explainable artificial intelligence (Vilone and Longo,
2021), identify four main factors that constitute a
good explanation. These include a consideration of
who the end-user is, what their goals are, what infor-
mation they should receive, and the language used to
deliver it.
Following this idea of user-dependent explana-
tions, we find that explainability is not binary and
should be defined by the degree to which it satis-
fies a set of relevant metrics for specific targeted
users (Pawlicka et al., 2023; Nauta et al., 2023; Liao
and Varshney, 2022). A wide range of metrics for
explainability have been described in the literature,
making it difficult to navigate and select the relevant
ones. In Table 1, we have collected some of the most
frequently encountered terms during the research on
XAI, with the purpose of showing how often they are
used and whether they are mentioned as being human-
centered or computer-centered.
Explainability is very much dependent on the re-
ceivers, and a lot of research is currently being done
on how users have different needs for explainability,
e.g. as described in the survey A Multidisciplinary
Survey and Framework for Design and Evaluation of
Explainable AI Systems (Mohseni et al., 2020). Here,
the authors explain the distinct goals and needs of
users by grouping them into AI Novices, Data Ex-
perts, and AI Experts. AI Novices are the end-users
who will interact with the ML product. They are
generally assumed to have very little or no knowl-
edge of machine learning and have no need for it
either. The level of explainability for this group is
Table 1: Summary of the literature study on explainability in ML, including the explainability criteria mentioned in each paper. Criteria, grouped into human-centered and computer-centered: Fairness, Accountability, Understandability, Trustworthiness, Usefulness, Performance, Satisfaction, Fidelity, Interpretability. Papers covered: Lopes et al. (Lopes et al., 2022), Mohseni et al. (Mohseni et al., 2020), Hoffman et al. (Hoffman et al., 2019), Lim et al. (Lim et al., 2009), Pawlicka et al. (Pawlicka et al., 2023), Markus et al. (Markus et al., 2021), Nauta et al. (Nauta et al., 2023), Zhou et al. (Zhou et al., 2021), Adadi and Berrada (Adadi and Berrada, 2018), Binns et al. (Binns et al., 2018), Bussone et al. (Bussone et al., 2015), and Seong and Bisantz (Seong and Bisantz, 2008).
determined by how useful and satisfying the expla-
nation is to them, as well as how much they trust
it. Data Experts are data scientists or similar who
use the ML product to conduct analyses or research.
They are assumed to be working directly with the ML
model while not necessarily having a deep techni-
cal understanding of how the specific model works.
Explainability to them is determined by how much
it helps them to perform their tasks, in addition to
how well the model itself is performing. AI Ex-
perts are the developers or engineers working on
the ML model. Their explainability needs are de-
scribed as different from the other two groups, as
their focus is more on debugging and understanding
the model itself. In particular, there is a difference
in the needs and goals of explainability for the dif-
ferent user groups. The AI Novices generally have
human-centered needs, as opposed to the AI Experts,
whose needs are more computer-centered. This dif-
ferentiation between human-centered and computer-
centered metrics is common among researchers and is
described, among others, by Lopes et al. (Lopes et al.,
2022) and Mohseni et al. (Mohseni et al., 2020). They
define human-centered explainability as the extent to
which an ML system is understandable to humans, as
well as how it affects them when interacting with it.
On the other hand, computer-centered explainability
is about how well the ML system is explaining the ML
model itself, including how accurate the explanation
Table 2: Overview of key explainability terms in ML literature and their definitions, with all relevant references listed.
Fairness (Deck et al., 2024; Mohseni et al., 2020; Pawlicka et al., 2023; Bussone et al., 2015): Assessing the fairness of an ML model, particularly in sensitive domains like loan applications.
Accountability (Lepri et al., 2018; Binns et al., 2018): The ability to attribute responsibility for decisions made by the model.
Understandability (Lopes et al., 2022; Mohseni et al., 2020; Butz et al., 2022): The extent to which the XAI system is understandable to users, facilitating the prediction of its outputs.
Trustworthiness (Lim et al., 2009; Pawlicka et al., 2023; Vilone and Longo, 2021; Adadi and Berrada, 2018): Reflects the user's confidence in the system's reliability and alignment with their expectations.
Usefulness (Seong and Bisantz, 2008; Lopes et al., 2022; Mohseni et al., 2020): Evaluates the practical value of the explanations in assisting user decision-making.
Performance (Lopes et al., 2022; Mohseni et al., 2020; Lount and Lauzon, 2012; Markus et al., 2021): Concerns user task performance when interacting with the XAI system.
Satisfaction (Gedikli et al., 2014; Vilone and Longo, 2021): Represents user satisfaction with the provided explanations.
Fidelity (Lopes et al., 2022; Markus et al., 2021): Reflects the accuracy of the explanation in representing the model's actual behavior.
Interpretability (Bussone et al., 2015; Lopes et al., 2022): Describes the ease with which explanations can be understood by human users.
is to the truth. As seen in Table 1, human-centered and
computer-centered explainability are both umbrella
terms, covering a number of other principles within
explainability. In order to gain a deeper understand-
ing of what meaning these terms carry across litera-
ture, Table 2 describes each of the terms.
A human-centered perspective on explainability involves the assumption that an explanation
is an answer to a question the user might have
when interacting with the system (Liao et al., 2020;
Preece, 2018; Vilone and Longo, 2021). The most
typically asked questions relating to explainability are
why and how (Vilone and Longo, 2021), answering
questions such as why a certain prediction was made,
or how a certain feature impacts the prediction. Ac-
cording to Liao et al., answering the right questions
can constitute a good explanation, but what is con-
sidered right again depends on the person asking it
(Liao et al., 2020). They did a study on the explain-
ability needs of different user groups by conducting
semi-structured interviews with 20 people. Notably,
the user group labeled ’Business Decision Support’
showed strong interest in explanations that enhance
their decision confidence by showing the importance
of attributes as well as explanations that are made in
natural language.
2.2 Explainability Methods
In this section, we will go through different types
of explainability methods and how they relate to the
needs of end-users.
One of the main distinctions in explainability
methods is between ante-hoc and post-hoc approaches. When
an explanation is retrieved directly from the model it-
self, e.g. from decision trees, it is said to be ante-hoc.
Conversely, if an explanation is generated after model
training, it is called post-hoc. Post-hoc explanations
require the addition of explainability methods, which
are applied separately from the model itself (Naqvi
et al., 2024; Retzlaff et al., 2024), and can be either
model-specific or model-agnostic. The distinction be-
tween the two lies in whether the method is specifi-
cally applicable to a given model or is generally ap-
plicable to a range of different ML models. Explain-
ability methods can be further broken down into lo-
cal and global explanations, where local explanations
are used to describe the reasons for a single predic-
tion, while global explanations are used to describe
the overall model (Liao and Varshney, 2022). Ex-
amples of global, post-hoc explainability methods in-
clude Accumulated Local Effects (ALE) plots (Apley
and Zhu, 2020) and Partial Dependence Plots (PDP)
(Friedman, 2001), while examples of local post-hoc
methods include Local Interpretable Model-agnostic
Explanations (LIME) (Ribeiro et al., 2016) and
SHapley Additive exPlanations (SHAP) (Lundberg
and Lee, 2017).
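To make the local/global and post-hoc distinctions concrete, the following sketch (an illustration under assumed conditions: a synthetic tabular demand dataset, a gradient-boosted model, and the shap and scikit-learn libraries; it is not the setup of any cited work) shows a local post-hoc explanation with SHAP next to a global one with a partial dependence plot.

# Illustrative sketch on synthetic data; features stand in for, e.g., price,
# promotion, and a seasonality index.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 50 - 5 * X[:, 0] + 8 * X[:, 1] + rng.normal(0, 1, 500)
model = GradientBoostingRegressor().fit(X, y)

# Local, post-hoc: SHAP values attribute one specific forecast to the features.
explainer = shap.TreeExplainer(model)
print("Feature attributions for one forecast:", explainer.shap_values(X[:1]))

# Global, post-hoc: partial dependence of the forecast on feature 0 (creates a plot).
PartialDependenceDisplay.from_estimator(model, X, features=[0])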
Each of these explainability methods provides
users with different types of explanations and is suit-
able for different purposes. Liao et al. closed the
gap between algorithmic explainability methods and
the needs of end-users by developing a question bank,
which is a collection of questions that users might ask
in relation to explainability, along with explainability
methods that can be applied to answer these questions
(Liao et al., 2020).
Liao et al. argued that there is no one-fits-all solu-
tion to a good explanation and suggested a collabora-
tive approach, where UX designers and data scientists
work together to identify relevant explainability meth-
ods. For this purpose, they used the question bank
to develop a mapping guidance between user ques-
tions and explainability methods (Liao and Varshney,
2022).
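As an illustration of such mapping guidance, a hypothetical and heavily simplified sketch follows; the questions and method lists below are invented for this example and are not Liao et al.'s actual question bank.

# Hypothetical mapping from planner questions to candidate explainability methods.
QUESTION_TO_METHOD = {
    "Why was this demand forecast made?": ["SHAP", "LIME"],            # local, post-hoc
    "How does a feature affect demand overall?": ["PDP", "ALE"],       # global, post-hoc
    "What happens to demand if a feature changes?": ["what-if analysis", "PDP"],
    "How confident is the model?": ["prediction intervals"],
}

def suggest_methods(question: str) -> list[str]:
    """Return candidate explainability methods for a planner's question."""
    return QUESTION_TO_METHOD.get(question, ["(no mapping defined)"])

print(suggest_methods("Why was this demand forecast made?"))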
2.3 Evaluating Explainability
There are several methods for evaluating explainabil-
ity. Nauta et al. (Nauta et al., 2023) described how
evaluating explainability is about measuring the de-
gree to which an explanation satisfies a set of defined
metrics and that each aspect of the explanation should
be evaluated separately. Pawlicka et al. (Pawlicka
et al., 2023) presented a similar approach by arguing
that an explanation should be evaluated by 1) check-
ing whether explainability is achieved by how well it
fulfills the defined objectives, and 2) comparing ex-
planation methods to identify the most preferred one.
As with the definition of explainability, methods for
evaluating explainability are also often divided into
human-centered and computer-centered approaches.
The human-centered evaluation methods include humans in the evaluation process and apply user testing
with domain experts or lay people as a way to assess an explanation (Molnar et al., 2020). Meanwhile,
computer-centered approaches use quantitative met-
rics to evaluate the explanation, e.g. in terms of fi-
delity (see Table 1).
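A minimal sketch of one such computer-centered metric is given below, assuming a surrogate-based notion of fidelity (an interpretable model fitted to the black-box predictions); this is an illustrative choice, not a metric prescribed by the cited works.

# Fidelity sketch: how closely does an interpretable surrogate reproduce the
# black-box model's predictions?
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
y = X[:, 0] ** 2 + 2 * X[:, 1] + rng.normal(0, 0.1, 1000)

black_box = GradientBoostingRegressor().fit(X, y)
surrogate = DecisionTreeRegressor(max_depth=3).fit(X, black_box.predict(X))

# Fidelity: agreement between the surrogate explanation and the model itself.
fidelity = r2_score(black_box.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity (R^2 vs. black-box predictions): {fidelity:.2f}")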
Doshi-Velez et al. distinguished between func-
tionally grounded, application grounded, and hu-
man grounded evaluations (Doshi-Velez and Kim,
2017). The functionally grounded evaluation corre-
sponds to the computer-centered evaluation and does
not include humans. The human-centered evalua-
tion is divided into the application-grounded evalu-
ation and the human-grounded evaluation, where the
application-grounded evaluation is based specifically
on target users, while the human-grounded evalu-
ation is based on lay users, meaning humans that
are not necessarily domain experts or targeted users.
Human-centered and computer-centered evaluations
each have their merits. Human-centered evaluations
are perceived to be more accurate in determining the
level of explainability (Zhou et al., 2021). How-
ever, they are also time-consuming and can have is-
sues such as bias and inefficiency. At the same time, computer-centered evaluations are "objective" and
require fewer resources in terms of time but do not include the user perspective (Pawlicka et al., 2023).
Hoffman et al. (Hoffman et al., 2019) developed a
conceptual model to map out the process of evaluating
explainability in an ML context. A slightly modified
version of the conceptual model is shown in Figure 1.
A user of an XAI system initially feels trust or mis-
trust in the ML model, and immediately forms a men-
tal model about how the system works. An explana-
tion is then provided to give a greater understanding
of the system, which affects the user’s mental model
and builds trust. Hoffman further argued that a user’s
perceived trust/mistrust in the system greatly affects
how they interact with the system, which in turn af-
fects the performance. Each of these stages of the
conceptual model can be assessed to evaluate the ex-
planation, as a good explanation will provide the user
with trust and understanding and, hence, better perfor-
mance. Different methods for evaluating explainabil-
ity have been suggested. Generally, we distinguish
between quantitative and qualitative evaluation meth-
ods, in addition to subjective and objective evaluation
Figure 1: Conceptual model of explainability in ML. Adapted from (Hoffman et al., 2019; Lopes et al., 2022).
Table 3: Summary of most frequently encountered human-centered evaluation methods for explainability in ML.
Likert Scale (subjective/quantitative): Bussone et al., 2015; Berkovsky et al., 2017; Nourani et al., 2019; Lim et al., 2009; Binns et al., 2018; Gedikli et al., 2014.
Interview (subjective/qualitative): Gedikli et al., 2014; Binns et al., 2018; Lim et al., 2009; Lount and Lauzon, 2012.
Think Aloud (subjective/qualitative): Bussone et al., 2015; Binns et al., 2018.
Task Performance (objective/quantitative): Lim et al., 2009; Huysmans et al., 2011; Kulesza et al., 2010.
Self-explanation (subjective/qualitative): Bussone et al., 2015; Cahour and Forzy, 2009.
methods. Subjective evaluations capture the participants' own opinions or perceptions, while objective
evaluations measure some defined objectives inde-
pendent of the users’ opinions. Human-centered eval-
uations generally cover all types of evaluations, while
computer-centered evaluations usually apply quanti-
tative and objective methods. An overview of the
evaluation methods is presented in Table 3.
A popular method for evaluating explainability is the use of the Likert scale. This method captures
the subjective opinions of participants and gives insights into how they perceive different metrics.
Bussone et al. investigated how explainability affects users' trust in and reliance on Clinical Decision
Support Systems (CDSS), and used a 7-point Likert scale to evaluate the users' trust in the system before
and after receiving an explanation (Bussone et al., 2015).
Gedikli et al. studied how to improve satisfaction in recommender systems by helping the user understand
why certain predictions are given (Gedikli et al., 2014). They followed a similar evaluation approach,
asking participants to rate transparency and satisfaction on a 7-point Likert scale after receiving an
explanation, and then comparing transparency to satisfaction. Both of these papers additionally applied
a qualitative evaluation method to support the quantitative methods. Bussone et al. applied the "think
aloud" method, asking users to share their thoughts while performing a given task, and then used post-task
interviews to gather additional information. Gedikli et al. also performed post-task interviews with the
purpose of validating the results of the quantitative approach. The papers by Lim et al. (Lim et al., 2009)
and Huysmans et al. (Huysmans et al., 2011) both
objectively evaluate their explanations using task per-
formance. Lim et al. researched the effectiveness of
different types of explanations (why and why not) in
context-aware intelligent systems. They used task performance as one of the measures for evaluating the
explanations, measuring it in terms of completion time, fill-in-the-blanks test answers, and answer
correctness. Each answer was rated into one of four groups, depending on its actual correctness and the
level of detail the participant was able to convey.
Huysmans et al. evaluated the explainability
(which they refer to as ’comprehensibility’) of deci-
sion tables, decision trees, and rule-based predictive
models. Participants were asked to answer a list of
yes/no questions, and rate their confidence on a Lik-
ert scale. The authors then evaluated the explanations
based on the perceived confidence of the participants
and their task performance, which is determined by
the accuracy (percentage of correct answers) and the
task completion time.
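For illustration, the sketch below aggregates a hypothetical user study in the spirit of the evaluations described above, combining subjective Likert ratings with objective task performance; all numbers are invented.

# Hypothetical study data: subjective (Likert) plus objective (task performance) measures.
import statistics

# 7-point Likert ratings of trust, collected before and after an explanation.
trust_before = [3, 4, 2, 5, 3]
trust_after = [5, 6, 4, 6, 5]

# Objective task performance: correctness flags and completion times in seconds.
correct = [True, True, False, True, True]
completion_time = [42.0, 35.5, 61.2, 40.3, 38.8]

print("Mean trust before/after:", statistics.mean(trust_before), statistics.mean(trust_after))
print("Task accuracy:", sum(correct) / len(correct))
print("Mean completion time (s):", statistics.mean(completion_time))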
Figure 2: Final affinity diagram used to structure the SME Interview into concrete user needs.
3 IDENTIFYING EXPLAINABILITY IN DP APPLICATIONS
Explainability is highly dependent on the user and their specific needs, and for this reason, a good
implementation of explainability requires researching the specific needs of users of DP applications.
Our research combines different methods from UX design with the purpose of gathering information about
the users and structuring it into concrete needs and requirements for explainability. By the end of this
section, we will have a clear definition of the explainability needs of users, of what explainability is
in the context of DP applications, and of which explainability methods can be applied to provide it.
3.1 Identifying User Needs
3.1.1 SME Interview
To gather information about the users of DP applications, we decided to conduct interviews with demand
planners. The purpose of the interviews was to learn how the users work with DP applications, what they
use them for, and which requirements they have for current and future use. We conducted unstructured
interviews, since this interview type allowed the conversation to be more dynamic and emerging questions
to be asked.
3.1.2 Affinity Diagramming
The interviews provided a large amount of unstruc-
tured data, which needed to be organized to extract
relevant information. For this purpose, we chose to
apply affinity diagramming, which is used to orga-
nize the data by translating the raw qualitative data
into a concrete mapping of the users and their needs.
First, the information from the raw interview data was
mapped out on post-its, disregarding their perceived
relevance. Next, the post-its were grouped into sub-
jects, each given a unique color. Lastly, labels were
assigned to the groups in the diagram in order to de-
fine each of them more specifically. The result is seen
in Figure 2, and provides a clear overview of the sub-
jects and needs that were discussed during the inter-
view with demand planners. We see the seven sub-
groups of user information identified through the in-
terviews, and that the overall goal of users is to op-
timize the SC process and make smarter decisions
based on the demand forecast. The users want to
have confidence in the decisions they are making, and
feel in control of the forecasting. They also want to trust the model predictions and the overall
forecasting system, and to get a better understanding of the predictions; some users even want the
option to control the forecasting model itself.
There is a need for users to have an understand-
ing of why certain predictions are made, as well as
what happens if some of their features change. This
includes e.g. the option to see correlation between
different features and the demand, and being able to
experiment with the feature values to learn how they
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence
1250
drive demand. Lastly, there is also a wish to learn how
demand can be increased in the context of the avail-
able features.
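A minimal what-if sketch of the kind of interaction described above is given below; the features (price, promotion, month) and the model are hypothetical placeholders rather than the actual DP application.

# What-if sketch: re-run the forecast with one feature changed to show a planner
# how predicted demand responds. Features and model are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
# Columns: price, promotion flag, month index (invented demand drivers).
X = np.column_stack([rng.uniform(5, 15, 500), rng.integers(0, 2, 500), rng.integers(1, 13, 500)])
y = 200 - 8 * X[:, 0] + 30 * X[:, 1] + rng.normal(0, 5, 500)
model = GradientBoostingRegressor().fit(X, y)

baseline = X[:1].copy()
scenario = baseline.copy()
scenario[0, 1] = 1          # what if we run a promotion?

print("Baseline demand forecast:", model.predict(baseline)[0])
print("Scenario demand forecast:", model.predict(scenario)[0])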
It is clear that users have some ideas in mind about the drivers of demand for their products. They might
have a feeling that something has an effect without being able to check the relevance of that feeling.
So, they use these ideas and feelings to form hypotheses, which they then seek to confirm or reject.
Based on the findings from the affinity diagram,
we have chosen to categorize the explainability needs
of the users of DP applications into the following ob-
jectives:
O1 Make better decisions
O2 Trust the predictions
O3 Understand why certain predictions were made
O4 Understand what happens to demand if the fea-
tures change values
O5 Increase demand using available features
O6 Test hypotheses
O7 Be in control of forecasting model
The topics found in the affinity diagram are usually prioritized to select the ones to move forward with.
This means that not all the findings above will necessarily be fulfilled within the scope of this paper.
3.2 User Story Mapping
After grouping the findings from the interviews in the
affinity diagram, we have a list of objectives. As men-
tioned, we do not include all of these objectives going forward, but instead choose a subset. For this purpose,
we apply User Story Mapping (USM) to prioritize
selected objectives in terms of goals, activities, and
tasks. The tasks are outlined as the necessary steps
for the user to complete an activity, and are subject
to change later in the design process. The USM is
shown in the Appendix, where each of the objectives
from section 3.1.2 is presented as either a goal or an
activity.
From the USM, we found that the overall goals
users are trying to achieve through explainability are
O1 and O2. Additionally, we found that O6 can be
obtained through O3, O4 and O5.
Based on a prioritization of the USM and in col-
laboration with demand planners, we decided to move
forward with O3, O4 and O5. The overall goals of building trust and making better, more confident
decisions follow naturally from good explanations, as also described in section 2.3, and the users will
be able to test their hypotheses on which features drive demand and why certain predictions were made.
3.3 Defining Explainability in DP
Applications
After getting an understanding of the needs and goals
of the end-users, we moved on to defining explain-
ability in DP applications. At this point, we know who the end-users are, their goals, which questions
they want answered, and their level of technical expertise.
This means that we now have all the essential compo-
nents to structure the explanations. However, we still
need to establish a clear definition of what explain-
ability is in the context of these components.
As we found in section 2.3, explainability in an
ML system can be measured by how well it satisfies a
set of user-dependent and measurable objectives. We
have chosen to rely on this definition, and use the find-
ings from section 3.1 to define a set of requirements
that should be satisfied in order for the user to feel
accomplished in their goals of gaining more trust in the forecasting model and making more confident
decisions, as well as optimizing the SC process. Following this approach ensures that explainability
becomes a measurable term, allowing us to evaluate and compare different explanations (a minimal scoring
sketch follows the list below). Based on the objectives that were identified through the affinity diagram
and USM, along with the research on explainability, more specifically Table 1, we choose a set of relevant
objectives to define explainability in DP applications. The objectives are chosen from existing literature
based on the extent to which they can fulfill the goals and needs of users, and are listed below:
Usefulness: The explanation should be useful and
satisfying to the user.
Trustworthiness: The explanation should meet
the user’s expectations and provide them with
confidence in their decisions.
Understandability: The explanation should be
understandable and meet the user’s expectations
in terms of what information it provides, i.e., the
questions it answers.
Performance: The explanation should help the
user to perform their intended tasks more effi-
ciently.
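As a sketch of how these criteria can make explainability measurable, the snippet below scores a candidate explanation from hypothetical planner ratings on a 1-7 scale, one score per criterion plus an overall average; the ratings are invented and the scoring scheme is an assumption, not a method defined in this paper.

# Hypothetical scoring of an explanation against the four criteria above.
from statistics import mean

CRITERIA = ["usefulness", "trustworthiness", "understandability", "performance"]

def explainability_score(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Average planner ratings per criterion plus an overall score."""
    per_criterion = {c: mean(ratings[c]) for c in CRITERIA}
    per_criterion["overall"] = mean(per_criterion.values())
    return per_criterion

shap_explanation = {
    "usefulness": [6, 5, 6], "trustworthiness": [5, 5, 6],
    "understandability": [4, 5, 4], "performance": [6, 6, 5],
}
print(explainability_score(shap_explanation))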
4 CONCLUSIONS
The aim of this paper was to explore and define ex-
plainability within the supply chain management do-
main, specifically focusing on demand planning.
In order to perform this investigation, we adopted an
approach inspired by the double-diamond framework,
Towards Trustworthy AI in Demand Planning: Defining Explainability for Supply Chain Management
1251
involving stages of discovery to deeply understand the
problem space.
During the discover phase, we found that explainability is not a binary term, and that something can be
explainable to one group of users while not necessar-
ily being explainable to another. As a result, adding
good explanations requires a study of the target users
in terms of their needs and goals when interacting
with the entire XAI system. In the define phase of
the problem space, we found that the main goals of users of DP applications are to 1) make better
decisions and 2) trust the predictions they get from the
system. More specifically, they want to know why
certain predictions are made, and what happens to a
prediction if certain features change.
In conclusion, this paper contributes to the evolv-
ing field of explainable AI in supply chain manage-
ment by providing a user-focused description of ex-
plainability and identifying the specific needs of de-
mand planning application users. This discovery built
a foundation for implementing explainable AI solu-
tions that can enhance user trust, satisfaction, and
decision-making in demand planning processes.
REFERENCES
Adadi, A. and Berrada, M. (2018). Peeking inside the
black-box: a survey on explainable artificial intelli-
gence (xai). IEEE access, 6:52138–52160.
altexsoft (2022). Demand forecasting methods: Using ma-
chine learning to see the future of sales.
Apley, D. W. and Zhu, J. (2020). Visualizing the effects
of predictor variables in black box supervised learn-
ing models. Journal of the Royal Statistical Society:
Series B (Statistical Methodology).
Berkovsky, S., Taib, R., and Conway, D. (2017). How
to recommend?: User trust factors in movie recom-
mender systems. In Proceedings of the 22nd Inter-
national Conference on Intelligent User Interfaces,
pages 287–300, United States. Association for Com-
puting Machinery (ACM).
Binns, R., Van Kleek, M., Veale, M., Lyngs, U., Zhao, J.,
and Shadbolt, N. (2018). “it’s reducing a human being
to a percentage”: Perceptions of justice in algorithmic
decisions. In Proceedings of the 2018 CHI Conference
on Human Factors in Computing Systems, CHI ’18,
page 1–14. ACM.
Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analy-
sis: Forecasting and Control. Holden-Day, San Fran-
cisco.
Bussone, A., Stumpf, S., and O’Sullivan, D. (2015). The
role of explanations on trust and reliance in clinical
decision support systems. Proceedings - 2015 IEEE
International Conference on Healthcare Informatics.
Butz, R., Schulz, R., Hommersom, A., and van Eeke-
len, M. (2022). Investigating the understandability
of xai methods for enhanced user experience: When
bayesian network users became detectives. Artificial
Intelligence in Medicine, 134:102438.
Cahour, B. and Forzy, J.-F. (2009). Does projection into
use improve trust and exploration? an example with
a cruise control system. Safety Science, Volume 47,
Issue 9.
Deck, L., Schomäcker, A., Speith, T., Schöffer, J., Kästner, L., and Kühl, N. (2024). Mapping the potential of
explainable ai for fairness along the ai lifecycle.
Doshi-Velez, F. and Kim, B. (2017). Towards a rigorous
science of interpretable machine learning.
Friedman, J. H. (2001). Greedy function approximation: A
gradient boosting machine. Annals of Statistics.
Gedikli, F., Jannach, D., and Ge, M. (2014). How should i
explain? a comparison of different explanation types
for recommender systems. International Journal of
Human-Computer Studies, 72(4):367–382.
GeeksforGeeks (2024). Difference between statistical
model and machine learning.
Hoffman, R. R., Mueller, S. T., Klein, G., and Litman, J.
(2019). Metrics for explainable ai: Challenges and
prospects.
Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J.,
and Baesens, B. (2011). An empirical evaluation of
the comprehensibility of decision table, tree and rule
based predictive models. Decision Support Systems,
51(1):141–154.
Hyndman, R. J. and Khandakar, Y. (2008). Automatic time
series forecasting: The forecast package for r. Journal
of Statistical Software.
IBM (2017). What is demand planning?
Kulesza, T., Stumpf, S., Burnett, M., Wong, W.-K., Riche,
Y., Moore, T., Oberst, I., Shinsel, A., and McIntosh,
K. (2010). Explanatory debugging: Supporting end-
user debugging of machine-learned programs. In 2010
IEEE Symposium on Visual Languages and Human-
Centric Computing, pages 41–48.
Lepri, B., Oliver, N., Letouzé, E., Pentland, A., and Vinck,
P. (2018). Fair, transparent, and accountable algorith-
mic decision-making processes: The premise, the pro-
posed solutions, and the open challenges. Philosophy
& Technology, 31(4):611–627.
Liao, Q. V., Gruen, D., and Miller, S. (2020). Question-
ing the ai: informing design practices for explainable
ai user experiences. In Proceedings of the 2020 CHI
conference on human factors in computing systems,
pages 1–15.
Liao, Q. V. and Varshney, K. R. (2022). Human-centered
explainable ai (xai): From algorithms to user experi-
ences.
Lim, B. Y., Dey, A. K., and Avrahami, D. (2009). Why
and why not explanations improve the intelligibility of
context-aware intelligent systems. Proceedings of the
SIGCHI Conference on Human Factors in Computing
Systems.
Lopes, P., Silva, E., Braga, C., Oliveira, T., and Rosado,
L. (2022). Xai systems evaluation: A review of hu-
man and computer-centred methods. Applied Sci-
ences, 12(19).
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence
1252
Lount, A. B. M. and Lauzon, C. (2012). Are explanations
always important? a study of deployed, low-cost intel-
ligent interactive systems. International Conference
on Intelligent User Interfaces, Proceedings Iui.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach
to interpreting model predictions. In Advances in Neu-
ral Information Processing Systems, volume 30.
Markus, A. F., Kors, J. A., and Rijnbeek, P. R. (2021).
The role of explainability in creating trustworthy ar-
tificial intelligence for health care: a comprehensive
survey of the terminology, design choices, and eval-
uation strategies. Journal of biomedical informatics,
113:103655.
Mohseni, S., Zarei, N., and Ragan, E. D. (2020). A mul-
tidisciplinary survey and framework for design and
evaluation of explainable ai systems.
Molnar, C., Casalicchio, G., and Bischl, B. (2020). In-
terpretable machine learning–a brief history, state-of-
the-art and challenges. In Joint European confer-
ence on machine learning and knowledge discovery
in databases, pages 417–431. Springer.
Moore, J. D. and Swartout, W. R. (1988). Explanation in
expert systems: A survey. University of Southern Cal-
ifornia.
Naqvi, M. R., Elmhadhbi, L., Sarkar, A., Archimede, B.,
and Karray, M. H. (2024). Survey on ontology-based
explainable AI in manufacturing. Journal of Intelli-
gent Manufacturing, 35(8):3605–3627.
Nauta, M., Trienes, J., Pathak, S., Nguyen, E., Peters,
M., Schmitt, Y., Schlötterer, J., van Keulen, M., and
Seifert, C. (2023). From anecdotal evidence to quan-
titative evaluation methods: A systematic review on
evaluating explainable ai. ACM Computing Surveys,
55(13s):1–42.
Nourani, M., Kabir, S., Mohseni, S., and Ragan, E. D.
(2019). The effects of meaningful and meaningless
explanations on trust and perceived system accuracy
in intelligent systems. Proceedings of the Aaai Con-
ference on Human Computation and Crowdsourcing,
7:97–105.
Pawlicka, A., Pawlicki, M., Kozik, R., Kurek, W., and
Choraś, M. (2023). How explainable is explainability?
towards better metrics for explainable ai. In The Inter-
national Research & Innovation Forum, pages 685–
695. Springer.
Preece, A. (2018). Asking ‘why’ in ai: Explainability of
intelligent systems – perspectives and challenges. In-
telligent Systems in Accounting, Finance and Manage-
ment.
Retzlaff, C. O., Angerschmid, A., Saranti, A., Schnee-
berger, D., Röttger, R., Müller, H., and Holzinger, A.
(2024). Post-hoc vs ante-hoc explanations: xai design
guidelines for data scientists. Cognitive Systems Re-
search, 86:101243.
Seong, Y. and Bisantz, A. M. (2008). The impact of cogni-
tive feedback on judgment performance and trust with
decision aids. International Journal of Industrial Er-
gonomics.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I
trust you?": Explaining the predictions of any classifier.
Conference of the North American Chapter of the As-
sociation for Computational Linguistics: Human Lan-
guage Technologies, Proceedings of the Demonstra-
tions Session.
Suresh, H., Gomez, S. R., Nam, K. K., and Satyanarayan,
A. (2021). Beyond expertise and roles: A framework
to characterize the stakeholders of interpretable ma-
chine learning and their needs. In Proceedings of the
2021 CHI Conference on Human Factors in Comput-
ing Systems, pages 1–16.
Van Lent, M., Fisher, W., and Mancuso, M. (2004). An ex-
plainable artificial intelligence system for small-unit
tactical behavior. In Proceedings of the national con-
ference on artificial intelligence, pages 900–907.
Vilone, G. and Longo, L. (2021). Notions of explainability
and evaluation approaches for explainable artificial in-
telligence. Information Fusion.
Zhou, J., Gandomi, A. H., Chen, F., and Holzinger, A.
(2021). Evaluating the quality of machine learning
explanations: A survey on methods and metrics. elec-
tronics.
APPENDIX
4.1 SME Interview
The full interviews can be made available upon re-
quest, if needed.
4.2 User Story Mapping