Model Transparency: Why Do We Care?
Ioannis Papantonis and Vaishak Belle
University of Edinburgh, U.K.
Explainable AI.
Artificial intelligence (AI), and especially machine learning (ML), has been increasingly incorporated into a
wide range of critical applications, such as healthcare, justice, credit risk assessment, and loan approval.
In this paper, we survey the motivations for caring about model transparency, especially as AI systems are
becoming increasingly complex leviathans with many moving parts. We then briefly outline the challenges in
providing computational solutions to transparency.
Artificial intelligence (AI), and especially machine
learning (ML), has been increasingly incorporated into
a wide range of critical applications, such as health-
care Kononenko (2001); Loftus et al. (2019), jus-
tice Chouldechova (2017); Christin (2017); Kleinberg
et al. (2017), credit risk assessment Chen et al. (2016);
Finlay (2011), and loan approval Finlay (2011); Wu
et al. (2019a). At the same time, automated sys-
tems have an effect on casual everyday decisions,
by recommending news articles Alvarado and Waern
(2018), movies Bennett et al. (2007), and music
Mehrotra et al. (2018). At the core of this ML predominance
lies the expectation that models can be
more accurate than humans Poursabzi-Sangdeh et al.
(2021a), something that has already been demon-
strated in various cases Culverhouse et al. (2003);
Goh et al. (2020); Hilder et al. (2009); Grove et al.
(2000). Having said that, employing algorithms
even as recommendation systems for cultural prod-
ucts (like movies) makes them part of the human cul-
ture, since they not only handle cultural products, but
they also influence people’s decisions and perceptions
Gillespie (2016). This also means that they should not
be viewed as mere tools Bozdag (2013), but rather
as entities that hold their own values Alvarado and
Waern (2018).
As such, it is paramount to make sure that their
values align with those of humans, thus enabling
ML’s responsible integration into society Russell et al.
(2015); Gabriel (2020); Christian (2020). This need
is further magnified by several recent instances of au-
tomated systems perpetuating undesired historical hu-
man biases, such as Amazon’s recruitment algorithm
exhibiting misogynistic behaviour Meyer (2018), or
commercial systems utilized by the US criminal jus-
tice system being extremely biased against black de-
fendants Angwin et al. (2016); Dressel and Farid
(2018). Apart from that, ML failures can arise, for
example, due to misuse, as in the case of an individ-
ual who spent an extra year in prison due to a typo-
graphical error in one of the inputs that was given to
the ML system Wexler (2017). Of course, poor model
design is another major source of catastrophic failures
with far-reaching implications, such as putting people
in danger due to inaccurate air quality assessment Mc-
Gough (2018), or by providing life-threatening cancer
treatment recommendations Strickland (2019); Ross
and Swetlitz (2018). These and other similar pit-
falls, along with the consequences and confusion that
come with them Galanos (2019); Aleksander (2017),
have led to some extreme arguments about ML po-
tentially eroding the social fabric and even posing a
threat to society’s democratic foundation Bozdag and
Van Den Hoven (2015).
In light of such concerns, it is becoming increas-
ingly clear that proactive actions need to be taken on
a large scale in order to avoid bleak future situations.
The urgency of this matter is reflected, for example,
in the Declaration of Cooperation on Artificial Intelligence
signed by the members of the European Union. This development
was followed by the formation of an expert group on AI, with the
goal of anchoring the development of AI that is both successful
and ethically sound. Transparency was a central
notion highlighted in this call, and especially its re-
lationship with trustworthiness, which is one of the
end goals of this initiative. As the Commission Vice-President
for the Digital Single Market, Andrus Ansip, noted: “As always
with the use of technologies, trust is a must”. Subsequent research
outputs of the resulting group have further emphasized the importance
of transparency, listing it as one of the seven
key requirements in order to achieve trustworthy AI.
At the same time, several additional large scale
initiatives regarding the responsible integration of AI
have been taken, such as:
The Asilomar AI Principles, with the support of
the Future of Life Institute.
The Montreal Declaration for Responsible AI,
with the support of the University of Montreal.
The General Principles, with contributions from
250 thought leaders throughout the world.
The Tenets of the Partnership on AI, with con-
tributions from stakeholders coming from diverse
fields that make use of AI (academia, industry,
It is worth noting that a comparative meta-analysis
found the above sets of principles to greatly overlap,
suggesting that the scientific/regulatory/investing communities
have reached a satisfactory level of consensus
Floridi et al. (2018). Again, transparency was recognized
as a core component that should drive the integration
of automated systems, present in all initiatives,
although the terminology was somewhat inconsistent.
Finally, an additional survey that analyzed
84 different ethical guidelines for AI found that trans-
parency was the most common principle among them,
called for in 73 of them Jobin et al. (2019).
Footnote: The authors in Floridi et al. (2018) introduce the term
Explicability to overcome this inconsistency; however, they
define it in terms of transparency.
Having established the need for transparency, as the
ability to understand AI/ML, a natural next step is to
define all the notions it should encompass. At this
point, it should be noted that transparency is by no
means a new concept by itself; rather, it has a long history
in an array of disciplines Margetts (2011); Hood
and Heald (2006). Despite that, AI/ML adds a unique
dimension to it, since other disciplines rarely face
the issue of employing tools with black-box design
where the decision process itself is elusive Floridi
et al. (2018). This challenge has been a contributing
factor to the surge in related publications, which in-
crease by about 100% every other year Larsson et al.
While this is an impressive rate, stakeholders out-
side the AI community have indicated that future de-
velopments should be predicated on further mutual
engagement between the two parties (non-AI and AI)
Bhatt et al. (2020b,a). This is a reasonable con-
cern, since although the AI community has produced
a vast literature during the last decade, the major-
ity of the scientific output addresses only the techni-
cal side of transparency, through the lens of model
transparency and model explainability Arrieta et al.
(2020). The former paradigm advocates in favour of
utilizing “transparent” (or white-box) models, mean-
ing that their design allows for readily inspecting their
inner workings Linardatos et al. (2020), such as rule-
based classifiers or regression analysis. On the other
hand, the latter approach, which is also known as ex-
plainability in AI (XAI), develops post-hoc techniques
that can provide explanations and information about
the decision process of black-box models, i.e. models
with an overly complex design that does not allow for
gaining any meaningful insights, such as neural net-
works or random forests Guidotti et al. (2018). These
are both essential research directions; however, compared
with the notion of transparency discussed so
far, they seem rather narrowly scoped, focusing only
on the technical side of achieving a transparent AI integration.
This observation has motivated a series of works
that advocate in favour of expanding the scope of
transparency, as used in the AI/ML community, to
encompass a wider range of goals Mittelstadt et al.
(2019), in line with the calls mentioned earlier in this
section. More specifically, some important directions
that need to be incorporated into the AI community’s
agenda are related to:
Providing guidelines regarding the appropriate
way to utilize and explain AI systems, referred to
as competence.
Building an environment of trust, which “can
only be achieved by ensuring an appropriate involvement
by human beings in relation to high-risk
AI applications”.
This is not an exhaustive list, as, for example, ad-
ditional dimensions that incorporate legal aspects can
potentially be fostered under this expanded notion of
transparency Larsson (2019). However, both of these
aspects can have an immediate positive impact, con-
sidering that AI/ML systems are already deployed, so
their correct and responsible use should be a top priority.
Given the dimensions identified above, we briefly
discuss some research directions in both the techni-
cal camp (transparent models) as well as the socio-
technical camp (stakeholder engagement), and outline
challenges in both.
3.1 Transparent Models
Clearly, the use of transparent models (i.e., models
that are transparent by design, have interpretable features,
are human-readable, etc.) is motivated by their
ability to allow users to understand their inner workings.
Let us examine a candidate, for concreteness,
and also mention the challenges that arise with it.
Probabilistic Models. For the sake of concrete-
ness, consider Bayesian networks, and other types
of graphical models Pearl (1988). Bayesian net-
works (BNs) are a class of probabilistic mod-
els that represent relationships between variables
by using directed acyclic graphs Darwiche
(2009). This has the very appealing advantage of
clearly expressing dependencies in the data, by
only drawing arrows between variables. Further-
more, once the BN is specified, graphical tests can
recover all conditional independencies implied by the graph,
without the need to perform any algebraic manip-
ulations Geiger et al. (1990). Due to these prop-
erties BNs are arguably one of the most transpar-
ent model classes, since their internal representa-
tion (and its implications) can be easily inspected,
by construction. It is this strength that has turned
BNs into the backbone of causal inference, too;
causal relationships are represented through a BN,
while graphical criteria identify which causal ef-
fects can be estimated using observational data
Pearl (2009). Naturally, BNs have been used in
numerous important applications
Kalet et al. (2015); Castelletti and Soncini-Sessa
(2007); Shenton et al. (2014); Uusitalo (2007);
Stewart-Koster et al. (2010); Friis-Hansen (2000).
Expert Knowledge. In addition to the above,
BNs allow for incorporating various forms of a
priori constraints, such as temporal ones Dechter
et al. (1991). Of course, probabilistic indepen-
dence constraints can be encoded as well dur-
ing model design, by directly adjusting the topol-
ogy of the directed graph. The combination of
all these properties as well as the ability to infer
causal relationships, instead of correlations, offers
a powerful alternative to black-box models, espe-
cially when considering high-stakes applications
Rudin (2019); Rudin et al. (2022).
Computational Hurdles. While BNs come with
significant advantages, a downside is that inference
using them is intractable in general, in the sense
that computing marginal probabilities is NP-hard
Cooper (1990). On top of that, specialized
routines are required to perform the inferential
step. This is the main motivation behind the re-
cent emergence of so-called tractable probabilis-
tic models (TPMs) Poon and Domingos (2011),
as an alternative approach that generalizes tradi-
tional BNs. TPMs directly encode the joint distri-
bution of a set of variables, in a way that allows
for a simple mechanism for performing inference.
Furthermore, they can potentially lead to exponential
savings in both inference time and storage
space Darwiche (2003). Consequently, TPMs
have gathered significant attention in many appli-
cations Bekker et al. (2014).
A downside, however, is that TPMs are repre-
sented as computational graphs that do not allow
for directly inspecting the relationships between
the variables. Furthermore, incorporating proba-
bilistic constraints is not immediate, as in the BN
case. In fact, it is unclear whether it is at all feasi-
ble. These challenges effectively turn TPMs into
black-box models, despite them being closely re-
lated to one of the most transparent model classes.
The same kind of computation vs transparency di-
chotomy can be observed in so-called variational
approaches Srivastava and Sutton (2017). These
models avoid the explicit computation of proba-
bilities at run-time by training, say, neural net-
works on the distribution encoded in the model.
But the downside is that neural networks are not
interpretable, and although there is considerable
work on unpacking the inner workings of neural
networks Sharma et al. (2019); Belle and Papanto-
nis (2020), either by inspecting the internal nodes
or doing post-hoc analysis, transparency as a first-
class object is ultimately lost.
Balancing Transparency and Computation.
Perhaps a midpoint between these extremes is
the emerging work on statistical relational learn-
ing and neuro-symbolic AI Gutmann et al. (2011);
Hu et al. (2016). The idea is to empower prob-
abilistic and deep learning models with logical
templates, either as a specification language, a
training function, or a classification target so that
(respectively) experts can encode their knowledge
using logic, perform data-efficient learning using
domain-specific logical rules, or extract logical
rules for post-hoc inspection. It remains to be seen
whether this line of work will bear more fruit in
the long run.
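The graphical independence tests mentioned above for BNs can be sketched in a few lines. The following is a minimal illustration, not taken from the cited works: it checks d-separation with the standard ancestral moral graph construction (restrict the graph to the relevant ancestors, connect co-parents, drop edge directions, remove the conditioning set, and test reachability). The four-node graph and the names `ancestors`, `d_separated` and `PARENTS` are hypothetical and chosen only for illustration.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(parents, x, y, given):
    """True if x and y are d-separated by `given` (x and y must not be in `given`)."""
    keep = ancestors(parents, {x, y} | set(given))

    # Moralise the ancestral subgraph: undirected parent-child edges,
    # plus an edge between every pair of parents sharing a child.
    neighbours = {node: set() for node in keep}
    for child in keep:
        child_parents = [p for p in parents.get(child, ()) if p in keep]
        for p in child_parents:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for i, a in enumerate(child_parents):
            for b in child_parents[i + 1:]:
                neighbours[a].add(b)
                neighbours[b].add(a)

    # Breadth-first search from x that never enters a conditioning node.
    blocked = set(given)
    queue, visited = deque([x]), {x}
    while queue:
        node = queue.popleft()
        if node == y:
            return False
        for nb in neighbours[node]:
            if nb not in visited and nb not in blocked:
                visited.add(nb)
                queue.append(nb)
    return True


# Hypothetical DAG: A -> C <- B, C -> D (child mapped to its parents).
PARENTS = {"C": ["A", "B"], "D": ["C"]}

if __name__ == "__main__":
    for given in ([], ["C"], ["D"]):
        print("A independent of B given", given, ":", d_separated(PARENTS, "A", "B", given))
```

In this toy graph, A and B are independent a priori but become dependent once the common effect C (or its descendant D) is observed; the test reads this off the graph alone, without touching any probability table.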
We refer readers interested in learning about other
types of transparent models to Arrieta et al. (2019);
Mehrabi et al. (2021); Belle and Papantonis (2020).
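The computation versus transparency tension discussed in this subsection can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not an implementation from the cited works: it encodes one small distribution twice, first as a BN with readable conditional probability tables queried by brute-force enumeration, and then as a hand-built sum-product circuit in the spirit of TPMs, where a marginal is obtained in a single bottom-up pass by letting unobserved leaves output 1. All numbers and names (`P_C`, `Leaf`, `Product`, `Sum`, `CIRCUIT`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from math import prod

# Hypothetical BN: a binary class C with two binary children X1 and X2.
P_C = {0: 0.6, 1: 0.4}
P_X1_GIVEN_C = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # [c][x1]
P_X2_GIVEN_C = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}  # [c][x2]


def bn_joint(c, x1, x2):
    """Joint probability via the BN factorisation P(c) P(x1|c) P(x2|c)."""
    return P_C[c] * P_X1_GIVEN_C[c][x1] * P_X2_GIVEN_C[c][x2]


def bn_marginal_x1(x1):
    """Brute-force marginal: the number of terms doubles with every
    additional binary variable that has to be summed out."""
    return sum(bn_joint(c, x1, x2) for c, x2 in product((0, 1), repeat=2))


@dataclass
class Leaf:
    var: str
    table: dict  # value -> probability

    def value(self, evidence):
        # An unobserved variable is marginalised by outputting 1.
        return self.table[evidence[self.var]] if self.var in evidence else 1.0


@dataclass
class Product:
    children: list

    def value(self, evidence):
        return prod(child.value(evidence) for child in self.children)


@dataclass
class Sum:
    weighted_children: list  # (weight, node) pairs

    def value(self, evidence):
        return sum(w * child.value(evidence) for w, child in self.weighted_children)


# The same distribution as a circuit: one weighted product per value of C.
CIRCUIT = Sum([
    (P_C[c], Product([Leaf("X1", P_X1_GIVEN_C[c]), Leaf("X2", P_X2_GIVEN_C[c])]))
    for c in (0, 1)
])

if __name__ == "__main__":
    print("P(X1=1) by enumeration:", bn_marginal_x1(1))
    print("P(X1=1) by one circuit pass:", CIRCUIT.value({"X1": 1}))
```

Both routes agree on every query, but they expose different things: the BN tables state directly which variable depends on which, whereas the circuit is only a nest of sums and products whose evaluation cost is linear in its size.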
3.2 User Competence and
Model transparency is an essential component for im-
portant applications, but, as argued earlier, it is rather
narrowly scoped, since it does not address the perplexing
complexity of incorporating AI into society. Engaging
with parties that either use or are affected by
AI can lead to significant advances in terms of ensur-
ing its proper use as well as establishing a trusting
human-AI relationship. In fact, there seems to be a
strong link between these two desiderata, supported
by evidence suggesting that users’ understanding and
competence have a great influence on the amount of
trust placed upon an automated system Balfe et al.
(2018); Sheridan and Telerobotics (1992); Merritt and
Ilgen (2008). Fostering trust, then, may have direct
implications on the adoption of AI in practical appli-
cations Linegang et al. (2006); Wright et al. (2019).
Having said that, trust is not a binary distinction
where one can only trust or distrust a model. Rather
than that, a user might trust a model’s outcomes for a
certain sub-population, while being suspicious of de-
cisions concerning another sub-population (perhaps
one that is under-represented in the dataset). This
shows that trust is something to be adjusted or calibrated,
so that models are employed appropriately Zhang
et al. (2020). Failing to do so can potentially lead to
an over-reliance on the model’s decisions Cummings
(2004) or model aversion, where users entirely dis-
miss a model after a few mistakes are made Dietvorst
et al. (2015).
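One simple way to picture sub-population-dependent trust is to compare, per group, how confident a model claims to be with how often it is actually right. The sketch below is only an illustration of that idea, not a method from the works cited above. It assumes that the model reports a confidence score for each prediction and that ground-truth outcomes are available; the records, the group labels and the name `subgroup_report` are hypothetical.

```python
from collections import defaultdict


def subgroup_report(records):
    """Summarise (group, confidence, correct) records per group.

    The `gap` entry is mean confidence minus accuracy; a large positive
    gap suggests the model's stated confidence overstates its reliability
    for that group.
    """
    sums = defaultdict(lambda: [0, 0.0, 0])  # count, summed confidence, hits
    for group, confidence, correct in records:
        entry = sums[group]
        entry[0] += 1
        entry[1] += confidence
        entry[2] += int(correct)

    report = {}
    for group, (n, conf_sum, hits) in sums.items():
        accuracy = hits / n
        mean_confidence = conf_sum / n
        report[group] = {
            "n": n,
            "accuracy": accuracy,
            "mean_confidence": mean_confidence,
            "gap": mean_confidence - accuracy,
        }
    return report


# Hypothetical data: equal stated confidence, different reliability,
# with the second group under-represented.
RECORDS = (
    [("majority", 0.9, True)] * 7
    + [("majority", 0.9, False)]
    + [("minority", 0.9, True)] * 2
    + [("minority", 0.9, False)] * 2
)
REPORT = subgroup_report(RECORDS)

if __name__ == "__main__":
    for group, stats in REPORT.items():
        print(group, stats)
```

A user who sees such a breakdown has grounds to rely on the model for the first group while treating its outputs for the second with more caution, which is the kind of graded trust described above.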
There is already a considerable body of work that
explores the use of XAI generated explanations as
a means to enhance trust Mahbooba et al. (2021);
Guo (2020); Gunning et al. (2019); Gunning and Aha
(2019). This is a fairly reasonable approach, resem-
bling the way that humans justify their decisions by
providing information that is relevant to their decision
making process. Despite that, recent studies provide
evidence both in favour Lai et al. (2020); Lai and Tan
(2019) and against Poursabzi-Sangdeh et al. (2021b);
Chu et al. (2020); Carton et al. (2020) the utility of
explanations in making a model’s internal reasoning
clear to users. This has raised concerns about the way
users perceive XAI explanations overall, calling for
additional surveys to shed some light on this topic
Doshi-Velez and Kim (2017); Vaughan and Wallach
Along this line, there is also alarming evidence
suggesting that practitioners use XAI techniques
incorrectly Kaur et al. (2020). An important observation
here is that misuse may arise both due to an incomplete
technical understanding of XAI, as well as
due to misunderstandings regarding XAI’s intended
use. This situation clearly impedes trust calibration,
and thus achieving transparency in AI’s social integra-
tion. Here are some avenues by means of which we,
as a community, can contribute to the understanding
and embedding of trust:
Clarify Use. We need to establish frameworks
that define the proper use of XAI, while also
developing a framework that can be used
to calibrate trust between human users and AI.
Explicate Limitations. While there is a plethora
of technical XAI contributions, studying the ad-
vantages and limitations of each explanation type,
as well as ways they can be combined to convey a
more complete picture of a model’s decision mak-
ing process, has not received as much attention
by the AI community. We need to identify the
most prominent explanation types and techniques,
discuss the kind of insights each one offers, and
suggest conceptual frameworks to further empha-
size their distinctions. Finally, we need to propose
ways to combine multiple explanations
in order to gain a more well-rounded understanding
of a model. Some recent surveys such as Arri-
eta et al. (2019); Mehrabi et al. (2021); Wu et al.
(2019b); Belle and Papantonis (2020) are starting
to paint such a picture.
Education in XAI. As mentioned earlier, practi-
tioners face various kinds of challenges when ap-
plying XAI techniques, most of them stemming
from their incomplete understanding of the field.
A natural step to address this issue would be to of-
fer the affected parties sufficient education to ap-
propriately understand and apply the right tech-
niques. However, there is a stark lack of aca-
demic resources on XAI, such as university level
courses. Of course, there are online articles discussing
related topics, but these do not amount to a holistic,
systematic approach. In fact, there is only a sin-
gle academic course on XAI, offered by Harvard
University Lakkaraju and Lage (2019), as well
as some tutorials Samek and Montavon (2020);
Camburu and Akata (2021), but they are usu-
ally intended for researchers. We need to pro-
vide guidelines for implementing and delivering
courses, including coding assignments with con-
crete feedback.
Trust Calibration and Model Comprehension.
One of the ultimate goals of XAI is to facilitate
building trusting relationships between users and
AI. While educating people on the technical de-
tails and underlying principles is a step towards
this goal, there are additional factors to be consid-
ered to ensure proper use. A concerning finding
is that data scientists might understand explana-
tions, but instead of using them in order to fur-
ther inspect a model, they use them to construct
narratives to convince themselves that the model
performs as it should Kaur et al. (2020).
We have briefly surveyed the importance of model
transparency and suggested some directions for fu-
ture work. We consider both technical directions, dis-
cussing the computation vs expressiveness tradeoff,
and socio-technical ones. We hope we convey the ur-
gency of the matter to readers, and that they are en-
couraged to come up with novel solutions that pro-
mote the responsible and safe integration of AI into
critical social applications.
This research was partly supported by a Royal Soci-
ety University Research Fellowship, UK, and partly
supported by a grant from the UKRI Strategic Priori-
ties Fund, UK to the UKRI Research Node on Trust-
worthy Autonomous Systems Governance and Regu-
lation (EP/V026607/1, 2020–2024).
