Algorithmic Bias from the Perspectives of Healthcare Professionals
Jennifer Xu (https://orcid.org/0000-0001-5615-9967) and Tamara Babaian (https://orcid.org/0009-0000-4341-042X)
Department of Computer Information Systems, Bentley University, Waltham, MA, U.S.A.
Keywords: Algorithmic Bias, Perceived Fairness, Healthcare Professionals.
Abstract: This paper focuses on algorithmic bias in machine learning (ML) and artificial intelligence (AI) applications in
healthcare information systems. Based on quantitative data and qualitative comments from a survey of and
interviews with healthcare professionals in different job roles (e.g., clinical vs. administrative), this
study provides findings about the relationships between algorithmic bias, perceived fairness, and the intended
acceptance and adoption of ML algorithms and algorithm-generated outcomes. The results suggest that the
opinions of healthcare professionals about the causes of algorithmic bias, the criteria for algorithm assessment,
perceived fairness, and bias mitigation approaches may vary depending on their job roles, perspectives,
tasks, and algorithm characteristics. More research is needed to investigate algorithmic bias to ensure
fairness and equality in healthcare.
1 INTRODUCTION
Artificial intelligence (AI) has been increasingly used
and adopted by individuals, organizations, and
institutions around the globe. AI can be employed to
support decision making and enhance productivity in
a broad spectrum of application domains, such as
business, finance, transportation, and education. AI
and machine learning (ML) algorithms have also been
used in the medical and healthcare domain for clinical
decision support (Rajkomar et al., 2018), such as
predicting hypertension and obesity (Ge et al., 2023;
Gupta et al., 2022), diagnosing cardiovascular
diseases (Litjens et al., 2019), detecting cancerous
tumors (Lehman et al., 2015), identifying high-risk,
high-cost patients (Osawa et al., 2020), and
optimizing clinical workflow (Akkus, 2021).
While AI and ML have found many promising
applications in healthcare, many healthcare
stakeholders (e.g., physicians, hospital managers,
payers, and patients) have been increasingly
concerned with algorithmic bias, which may cause
serious consequences for clinical safety (Challen et
al., 2019) and misalignments with ethical principles
(Morley et al., 2020). A recent comprehensive study
of clinical ML algorithms shows that in several
medical disciplines, such as cardiology, nephrology,
and obstetrics, using patient race and ethnicity in ML
algorithms may lead to the conclusion that Black
patients are in less need of care (Vyas et al., 2020).
There have been many proposals for addressing
ML algorithmic bias and related ethical issues, such
as fairness, equality, and discrimination (Mehrabi et
al., 2021). However, the problem remains challenging
to tackle, especially in healthcare, for several reasons.
First, it can be difficult to identify the sources of
algorithmic bias. There is a wide variety of biases,
which may originate from different sources and be
caused by different factors (Giovanola & Tiribelli,
2023; Kordzadeh & Ghasemaghaei, 2022). For
example, biased outcomes produced by an ML
algorithm may be caused by human bias embedded in
the training data (Gaonkar et al., 2020; Larrazabal et
al., 2020). Algorithms trained on data from one
community may be biased when utilized in another
community with a different patient population (Liu et
al., 2018). The rarity of certain medical conditions and
a lack of clinical expertise may also result in
imbalanced samples, leading to biased, unfair
outcomes (Ktena et al., 2024).
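To make the effect of underrepresentation concrete, the following minimal Python sketch (ours, using fully synthetic data; the feature, group labels, and thresholds are hypothetical and not drawn from any study cited above) trains a single classifier on data dominated by one group and then compares per-group recall:

# Minimal sketch with synthetic data: a classifier trained on data dominated by
# group A largely learns group A's decision threshold and misses cases in group B.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

def make_group(n, threshold):
    # One synthetic risk feature; the condition is present when the feature exceeds
    # a group-specific threshold (i.e., the condition manifests differently in group B).
    X = rng.normal(size=(n, 1))
    y = (X[:, 0] > threshold).astype(int)
    return X, y

X_a, y_a = make_group(2000, threshold=0.0)    # well-represented group
X_b, y_b = make_group(100, threshold=-1.0)    # underrepresented group
model = LogisticRegression().fit(np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))

# Evaluate on fresh samples from each group: the recall gap is the bias signal.
for name, threshold in [("A", 0.0), ("B", -1.0)]:
    X_test, y_test = make_group(1000, threshold=threshold)
    print(f"Group {name} recall: {recall_score(y_test, model.predict(X_test)):.2f}")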
Second, since ML algorithms are used to support
decision making, their performance (e.g., accuracy,
sensitivity) is one of the most important criteria for
algorithm assessment. However, it can be practically
impossible for ML algorithms to satisfy all
performance criteria and ethical principles at the same
time (Giovanola & Tiribelli, 2023). For example, as
fairness is a perception (Kordzadeh & Ghasemaghaei,
2022; Wang et al., 2020), what one social group
perceives as fair may be considered unfair by other
groups (Ochmann et al., 2024). In addition, the
variety of algorithms adds complexity to the
task of assessing and selecting algorithms when
facing algorithmic bias. There are many types of ML
algorithms, ranging from supervised and
unsupervised learning to generative AI.
Even within the same learning category, algorithms
may have heterogeneous characteristics in their
design, architecture, and parameter setting. For
instance, although both decision tree algorithms and
neural networks can be used in supervised learning to
classify data, their design, inner workings, and
learned models are completely different. The
classifiers learned by decision trees are often easy to
interpret and explain, whereas neural networks are
considered “black boxes,” lacking transparency and
explainability.
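To illustrate this contrast, the short Python sketch below (ours; it uses scikit-learn's bundled breast cancer dataset purely as an example and is not part of this study) trains both model types on the same task and prints the decision tree's learned rules, for which the neural network offers no direct counterpart:

# Minimal sketch: a shallow decision tree exposes its learned model as readable
# if/else rules, whereas a neural network of similar accuracy does not.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
nn = make_pipeline(StandardScaler(),
                   MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                                 random_state=0)).fit(X_train, y_train)

print("Decision tree accuracy:", round(tree.score(X_test, y_test), 3))
print("Neural network accuracy:", round(nn.score(X_test, y_test), 3))

# The tree can be inspected directly as human-readable decision rules; the neural
# network stores only weight matrices, which is what the "black box" label refers to.
print(export_text(tree, feature_names=list(data.feature_names)))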
Third, it is unclear how different healthcare
stakeholders view algorithmic bias and mitigation
strategies. Different stakeholders may have different
attitudes and opinions toward many questions, such
as the presence of algorithmic bias in healthcare ML
algorithms, where the biases come from, how to
mitigate the biases, and which algorithms to select
and adopt (Vorisek et al., 2023). Their opinions may
depend on several factors, such as their roles and
perspectives, the tasks they wish to use AI and ML to
accomplish, and the priority of goals. For example,
doctors and physicians may focus on accuracy of
diagnoses and effectiveness of treatments; managers
may prioritize patient safety and hospital operational
efficiency over other aspects; patients may wish to
lower medical costs in addition to receiving timely,
quality care.
This research seeks to study the opinions of
healthcare professionals toward algorithmic bias and
fairness. We intend to explore the following research
questions (RQs):
RQ1: What do healthcare professionals think
about the causes of algorithmic bias?
RQ2: What approaches do they believe would
help address bias and fairness?
RQ3: What criteria and factors do they
consider when selecting algorithms?
RQ4: How would they intend to use and adopt
ML algorithms in their practice and work?
Using a survey and interviews, we gathered
quantitative data and qualitative comments from
healthcare professionals from different hospitals. The
participants have various job roles in their hospitals
ranging from clinical (e.g., physicians, doctors, and
nurses), technical (e.g., radiologists), administrative,
to IT (e.g., information system developers) and
support (e.g., trainers and educators). Our findings
show that healthcare professionals are concerned
about algorithmic bias and fairness, and that their
opinions about using AI and ML algorithms in their
practice of medicine and healthcare management
differ depending on several factors.
The remainder of the paper is organized as
follows. The next section reviews the literature about
algorithmic bias, the application of ML algorithms
and bias identified in the literature. Section 3
describes research methods and data, followed by
reports of analysis and results in Section 4. Section 5
discusses our findings and implications. The last
section outlines plans for future research and
concludes the paper.
2 LITERATURE REVIEW
2.1 Algorithmic Bias
The concept of computer system bias is not new
(Moor, 1985); it was further developed by Friedman
and Nissenbaum (1996) and has since evolved into
what is now commonly referred to as algorithmic bias,
particularly within the scope of AI/ML systems.
Algorithmic bias refers to systematic errors that may
disadvantage specific individuals or groups of the
population without a justified reason (Kordzadeh &
Ghasemaghaei, 2022). Algorithmic bias is a
socio-technical concept: it is rooted in the biases that
exist in society, which make their way into technology,
the use of which, in turn, may help proliferate and
amplify discriminatory practices by humans.
Within the context of ML applications,
algorithmic bias can be a result of inappropriate
choice of training data, model, or inappropriate use of
a system (Giovanola & Tiribelli, 2023), categorized,
respectively, as data-driven bias, model design bias,
and user-interaction bias. Data-driven biases
originate from inadequate representation (minority or
selection bias), missing data, and differences between
the population in the training and deployment data
(domain shift bias). Model bias arises during the
model conceptualization stages, for example, with the
selection and assignment of classification labels
(Zając et al., 2023). Biases arising from the
interaction of users and ML technology, sometimes
referred to as latent (DeCamp & Lindvall, 2020) or
emergent (Friedman & Nissenbaum, 1996) bias,
include automation and feedback loop biases, which
are caused by overreliance on potentially imperfect
decision support algorithms without thorough
questioning of the system-generated predictions.
Fairness, one of the core ethical principles, is
closely related to algorithmic bias. In the healthcare
context, unfair outcomes are often related to the use
of protected attributes, such as gender, age, race,
ethnicity, and socioeconomic status (Abràmoff et al.,
2023). As an ethical value, fairness has multiple
dimensions, including distributive, procedural,
interpersonal, and informational justice (Ochmann et
al., 2024). In our research, we focus on perceived
fairness, which is defined as the extent to which
algorithms are perceived to be fair by people
(Kordzadeh & Ghasemaghaei, 2022).
While ML models are typically developed and
assessed with the goal of optimizing for specific
overall performance measures, such as accuracy,
precision, and sensitivity, it remains a challenge, in
general, for ML algorithms to achieve high
performance while aligning with all ethical principles
at the same time (Giovanola & Tiribelli, 2023).
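To make this tension concrete, the following minimal sketch (ours; the predictions and the protected attribute are synthetic stand-ins, not data from any study discussed here) audits a set of predictions for two common group-fairness notions, demographic parity and equal opportunity, alongside overall accuracy:

# Minimal sketch with synthetic labels: per-group selection rates and true positive
# rates can diverge even when overall accuracy looks acceptable.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
group = rng.choice(["A", "B"], size=n, p=[0.7, 0.3])    # hypothetical protected attribute
y_true = rng.integers(0, 2, size=n)
# Hypothetical model that misses 30% of true cases in group B but none in group A.
missed = (group == "B") & (y_true == 1) & (rng.random(n) < 0.3)
y_pred = np.where(missed, 0, y_true)

print("Overall accuracy:", round(float((y_pred == y_true).mean()), 3))
for g in ("A", "B"):
    m = group == g
    selection_rate = y_pred[m].mean()            # compared under demographic parity
    tpr = y_pred[m][y_true[m] == 1].mean()       # compared under equal opportunity
    print(f"Group {g}: selection rate = {selection_rate:.2f}, true positive rate = {tpr:.2f}")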
2.2 Algorithmic Bias in Healthcare
A family of decision-tree-based algorithms has been
employed in clinical and healthcare management
applications and research, such as predicting survival
in locally advanced rectal cancer (De Felice et al.,
2020), readmission in mental health patients (Morel
et al., 2020), and triage level designation for
emergency room (ER) patients (Levin et al., 2018).
Deep learning algorithms that use neural networks
have also been employed to process medical imagery
data (e.g., X-ray, MRI, CT scans, and ultrasound
images) for detecting, screening, or analyzing various
clinical conditions, such as breast tumors (Lehman et
al., 2015), lung cancers (Ardila et al., 2019), and
cardiovascular complications (Litjens et al., 2019),
and to classify numerical and text data (e.g., visual
signs and clinical notes) for predicting hypertension
(Ge et al., 2023), diagnosing cancers (Fu et al., 2020),
preventing inpatient falls (Cheligeer et al., 2024), and
providing new disease insights (Rajpurkar et al.,
2022).
While ML algorithms are employed in healthcare,
there have been growing concerns about algorithmic
bias that may jeopardize clinical safety (Challen et al.,
2019) and cause healthcare inequality, disparity, and
unfair outcomes toward underrepresented social
groups (Giovanola & Tiribelli, 2023; Mehrabi et al.,
2021; Schrouff et al., 2023).
Investigations of the sources and types of
algorithmic bias in clinical decision support
applications confirm that algorithmic bias in
healthcare may originate from the training data, the
algorithm design, or the interactions between human
users (e.g., physicians and patients) and clinical
support systems (Giovanola & Tiribelli, 2023). The
distribution shift, which occurs when there are
discrepancies between the training data and the data
in real-world settings, can cause the learned model to
perform and generalize poorly and produce biased
outcomes. For example, a study finds that a large
performance drop occurs when an ML model trained
on data from 17 teledermatology services in the U.S.
is applied to teledermatology cases in Colombia
(Schrouff et al., 2023). Similarly, Watson for
Oncology, a system with ML algorithms trained on
Western datasets, is found to perform much worse
when used for Chinese patients (Liu et al., 2018).
Imbalanced data caused by underrepresentation of
some social groups (e.g., gender) may lead to biased
classifiers (Larrazabal et al., 2020). Labelling bias
resulting from physicians' subjective annotations or
from billing- and reimbursement-driven diagnostic
coding of diseases may also cause a trained model to
reflect the bias embedded in the training data (Yu &
Kohane, 2019). Automation bias (a.k.a. confirmatory
bias) may occur when physicians over-rely on
algorithm-generated recommendations and diagnoses
automation bias is shown to cause an increased false
negative rate in radiology diagnoses (Lehman et al.,
2015).
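A common first step in practice is to check, before deployment, whether the data at a new site resembles the data the model was trained on. The sketch below (ours; the feature names and both sites' data are synthetic and purely illustrative) screens for such distribution shift with per-feature two-sample Kolmogorov-Smirnov tests:

# Minimal sketch with synthetic data: flag features whose distributions differ
# between the training site and a prospective deployment site.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
feature_names = ["age", "bmi", "systolic_bp"]    # hypothetical features
site_a = rng.normal(loc=[55, 27, 130], scale=[12, 4, 15], size=(5000, 3))   # training site
site_b = rng.normal(loc=[42, 24, 125], scale=[10, 3, 14], size=(800, 3))    # deployment site

for j, name in enumerate(feature_names):
    statistic, p_value = ks_2samp(site_a[:, j], site_b[:, j])
    print(f"{name}: KS statistic = {statistic:.2f}, p-value = {p_value:.3g}")

# Features with large KS statistics signal a shift; a model trained at site A should then
# be re-validated (e.g., on retrospective site-B data) before its outputs are trusted at site B.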
The discussion of fairness in healthcare often
centers on distributive fairness (Giovanola &
Tiribelli, 2023). A recent study shows that including
patient race and ethnicity information in the data for
ML algorithms may potentially lead to unfair
distribution of clinical resources toward certain
minority groups (Vyas et al., 2020). However, it has
also been reported that in some cases, ML algorithms
may perform well with fair outcomes for different
social groups (Noseworthy et al., 2020) and across
hospitals and sections (Levin et al., 2018).
Researchers propose frameworks and guidelines
for building safer ML-supported clinical decision
systems (Challen et al., 2019) and for promoting trust
(Ema et al., 2020). However, development of
strategies and methods for mitigating bias and
enhancing fairness and trust in ML algorithms in
healthcare remains elusive, due to the complexity of
the issues. It is widely recognized that the demands
for effectiveness, safety, and fairness of AI-based
tools require that healthcare AI developers, users, and
regulating bodies work collaboratively to define
guidelines clarifying the required levels of transparency
about data provenance and quality, as well as assessment
mechanisms for AI tools before their adoption
(Matheny et al., 2023). Furthermore, the risks
associated with the entrenchment and amplification
of existing biases due to the use of black-box decision
support require specific attention to the clinicians’
understanding of the potential risks to patient safety
and equity, which motivates our study.
2.3 Stakeholder Perspectives
Healthcare is an industry with many different
stakeholders, including healthcare organizations,
physicians, administrative and support personnel,
payers and insurance companies, clinical information
system and electronic health record (EHR)
developers, and patients. Different stakeholders may
have diverse perspectives, priorities, and opinions
regarding ML algorithms, algorithmic bias, and
fairness.
Research and reports on stakeholder perspectives
have been limited. A qualitative study (Parikh
et al., 2022) reports findings from interviews with 29
oncology clinicians regarding their perceptions of the
adoption of ML-based predictions of patient mortality
risk to prompt conversations about end-of-life care with
cancer patients. The study found that physicians are
generally positive toward the prospect of using
ML-generated predictions. However, they are concerned
about ethical issues, accuracy, and possible
confirmation and automation biases. Another study
conducted interviews with a group of healthcare
AI experts specializing in system development and
regulation; its findings show that experts' opinions
vary regarding mitigation strategies for
algorithmic bias (Aquino et al., 2023). Specifically,
there are divergent views about whether protected
attributes such as sociocultural identifiers (e.g., race
and gender) should be included in healthcare ML
algorithms (US Department of Health and Human
Services 2024). Similarly, a web-based survey has
found that healthcare AI developers perceive their
algorithms to be moderately fair (Vorisek et al.,
2023).
Our research seeks to explore how healthcare
professionals view and perceive algorithmic bias and
fairness and how they intend to assess, select, and
adopt ML algorithms when facing algorithmic bias.
Our chosen theoretical framework (Kordzadeh &
Ghasemaghaei, 2022) focuses on
algorithmic bias, perceived fairness, and user
behavioral responses. It posits that algorithmic bias
negatively influences perceived fairness, which
further affects the behavioral response of users in
terms of their acceptance of the algorithms and
adoption of ML based decision support systems.
Moreover, these relationships are affected by several
factors including individual, task, and technology
characteristics. For example, individuals with
different education levels and genders may perceive
the outcomes of ML algorithms differently (Wang et
al., 2020), and responses to algorithmic bias may
also vary depending on whether the task is high-impact
or low-impact. Technology characteristics,
such as levels of transparency and explainability, may
also affect the perceptions of fairness, and users’
behavioral responses.
3 METHODS AND DATA
Our research methodology includes a survey and
interviews. Both target healthcare professionals in
various roles in their organizations.
3.1 Survey
The survey is designed to explore opinions and
attitudes of healthcare professionals toward
algorithmic bias, fairness, and intended behavior in
response to bias. To reduce the scope of ML
algorithms referred to in the survey, we focus only on
supervised ML classification algorithms.
The survey consists of two parts. The first part
contains eight questions regarding the participant
demographic information (e.g., age, gender),
background (e.g., job roles in organizations, years of
work experience, and knowledge about ML
algorithms), and their general attitudes toward AI
technology. The second part includes five
questions using a five-point Likert scale ranging from
Strongly Disagree to Strongly Agree. Each of these
questions focuses on the participant’s opinions
toward a specific topic: causes of algorithmic bias
(Q9), fairness (Q10), algorithm assessment and task
characteristics (Q11), technology characteristics
(Q12), and intended behavior (acceptance and
adoption) (Q13). Each question consists of several
sub-questions (Q9: 4, Q10: 3, Q11: 4, Q12: 5, Q13:
6). The complete set of questions is provided in the
Appendix.
The survey questions were developed based on
the review of the literature on algorithmic bias in
healthcare. Major issues related to algorithmic bias
(e.g., distribution shifts, algorithm transparency) are
covered in the questions. Questions about the
intended behaviors were adapted from the survey
instruments used for assessing technology
acceptance and adoption in the information systems
(IS) literature (Venkatesh et al., 2003).
A pilot study was conducted involving four
participants, who were IS researchers with expertise
in algorithmic bias and healthcare decision support
systems. Questions were revised and modified for
several rounds based on participants’ feedback.
The participants of the full-scale survey were
recruited from an executive MBA program offered at
a business university in the northeastern USA. The
executive MBA program was offered specifically to
a cohort of employees in a world-class hospital based
in the Greater Boston area. Students who were taking
a healthcare analytics course in this program in spring
2024 were invited to participate in the survey. The
survey was administered after covering various ML
algorithms during the semester.
Among the 20 students who took the course, 19
responded to the invitation and participated in the
survey. Among the 19 participants, the majority (n =
14) were female, and the rest (n = 5) were male.
Students were from four age groups (i.e., 20-29, 30-
39, 40-49, and 50+), and the number of participants
in the groups was 4, 7, 4, 4, respectively. Their work
experience ranged from 1-5 years (n = 3), 6-10 years
(n = 6), 11-20 years (n = 7), to 20 or more years (n =
3). Their job roles included administrative (e.g.,
manager, director, team leader) (n = 4), clinical (e.g.,
physician, nurse, surgeon) (n = 8), technical (e.g.,
radiology) (n = 1), support (e.g., educator, analyst) (n
= 4), and IT (e.g., developer, clinical system engineer)
(n = 2). Regarding their experience with ML
algorithms, all participants were familiar with
decision tree and neural networks, which were
covered in the course. Three participants were also
familiar with other ML algorithms (e.g., k-nearest
neighbor and naïve Bayes classifier).
3.2 Interviews
In a parallel study, we conducted semi-structured
interviews with eight healthcare professionals in
different positions/roles: doctors, nurses, therapists,
and administrators overseeing medical IT. All
recruited interviewees work in medical facilities in
the northeastern USA; all of them were medically
trained, although some were currently working in
administrative positions, sometimes, in addition to
their clinical work. In terms of type and area of
practice, interviewees spanned inpatient and
outpatient care, emergency room (ER) care, primary
care, and urgent care facilities.
Interview questions addressed the prospective use of
AI within EHRs and the issues surrounding the ethics of
using AI algorithms. The interviews, lasting 30-45
minutes, were conducted over Zoom, recorded,
transcribed, and analyzed using inductive content
analysis, with coding labels derived from the
interview topic questions and emergent themes and
refined in the process.
4 ANALYSIS AND RESULTS
4.1 Survey Results
Using the responses to each close-ended sub-question
as the dependent variable, we performed
two-way ANOVA on three independent variables:
age, gender, and job role. Since years of experience
in healthcare and years of experience in the organization
are highly correlated with age, adding them to the
ANOVA would cause multicollinearity problems.
ANOVA using either experience variable produced
results similar to those using age. Participants'
knowledge about ML algorithms is homogeneous and
does not show individual differences, since the cohort
took the same analytics course where the algorithms
were covered. Their
intended goals of using AI correlate with their job
roles (e.g., enhancing quality of care for clinical roles,
reducing costs and improving operational
productivity for administrative roles). As a result,
these variables were not included in ANOVA. We
summarize the ANOVA findings in the following.
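For readers who wish to reproduce this type of analysis, the sketch below (ours; the column names, factor levels, and responses are hypothetical and randomly generated, not our survey data) fits a two-way ANOVA of a Likert response on two participant factors and their interaction using statsmodels:

# Minimal sketch with hypothetical data: two-way ANOVA of a Likert response on
# age group and job role, including their interaction.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "age_group": rng.choice(["20-29", "30-39", "40-49", "50+"], size=n),
    "job_role": rng.choice(["clinical", "administrative", "technical", "IT", "support"], size=n),
    "response": rng.integers(1, 6, size=n),    # five-point Likert scores for one sub-question
})

model = ols("response ~ C(age_group) * C(job_role)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))    # F statistics and p-values for each factor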
4.1.1 Causes of Algorithmic Bias
The results show that the participants generally agree
with the four statements about the major causes of
algorithmic bias. In particular, they believe that bias
in the training data can cause algorithmic bias (Q9.1
Likert scale mean = 4.6, S.D. = 0.51), and that
distribution shift, which occurs when an algorithm
trained on data from one hospital is used in another
hospital, can also cause algorithmic bias (e.g., the
algorithm may not work well) (Q9.2 mean = 4.5, S.D. = 0.61).
They are not as positive that a larger training dataset
would necessarily mitigate bias (Q9.3 mean = 3.6,
S.D. = 1.0). They tend to believe that bias may also
come from the specific design of algorithms (Q9.4
mean = 3.9, S.D. = 0.88).
None of the three independent variables (age,
gender, and job role) is significant in ANOVA across
the four sub-questions. In other words, there are no
gender, age, or role differences in the participants'
opinions toward the sources of algorithmic bias.
4.1.2 Fairness of Outcomes
Participants strongly agree that if the outcome
generated by an algorithm is biased against certain
social groups, it is unfair for those groups (Q10.1
mean = 4.11, S.D. = 1.0). However, they disagree that
protected demographic attributes (e.g., age, gender,
race) of patients should be excluded from ML algorithms
as a solution to prevent possible unfair outcomes
(Q10.2 mean = 2.4, S.D. = 0.96). Instead, these
attributes, which may potentially lead to unfair
outcomes, can be used in some ML applications
depending on the specific problem under study
(Q10.3 mean = 4.4, S.D. = 0.60).
There are no age, gender, or role differences in the
first two sub-questions in ANOVA. Only gender is
significant in the third sub-question, showing that the
female participants (mean = 4.3) are slightly less
inclined than the male participants (mean = 4.6) to
allow ML algorithms to use demographic
information of patients.
4.1.3 Assessment Criteria and Task
Characteristics
The responses to the sub-question regarding
algorithm assessment are rather similar: the
participants generally believe that ML algorithms
should be assessed based on multiple criteria
including performance, bias, and fairness (Q11.1
mean = 3.5, S.D. = 0.61), and that the prioritization of
the criteria depends on the task characteristics (Q11.2
mean = 4.1, S.D. = 0.71). However, they are relatively
neutral about whether algorithmic bias should be
allowed in high- (e.g., fatal disease prediction) (Q11.3
mean = 3.5, S.D. = 1.0) or low-impact situations (e.g.,
patient satisfaction prediction) (Q11.4 mean = 3.5,
S.D. = 0.84).
The ANOVA result shows that there are significant
differences in age and job role for Q11.4. Among the
four age groups, the 40-49 group gives a significantly
lower score to this question, indicating that they tend
to disagree that algorithmic bias can be tolerated even
in low-impact situations. In terms of job roles,
clinicians also tend to disagree with this statement
more than participants in other roles.
4.1.4 Technology Characteristics
Participants strongly believe that algorithm
transparency (Q12.1 mean = 4.6, S.D. = 0.61) and
explainability (Q12.3 mean = 4.3, S.D. = 0.73) are
important, and they prefer algorithms with high
transparency (Q12.2 mean = 4.3, S.D. = 0.75) and
explainability (Q12.4 mean = 4.5, S.D. = 0.51) over
those with lower transparency and explainability and
similar performance. However, if one algorithm
significantly outperforms another one, participants
are less certain about whether they would choose the
one with higher performance and lower transparency
(Q12.5 mean = 3.2, S.D. = 1.1).
None of the independent variables is significant in
ANOVA.
4.1.5 Intended Behavioral Responses
Participants’ opinions are more divergent in terms of
their intended behaviors (e.g. acceptance and
adoption), given that ML algorithms vary in their
performance (e.g., accuracy and error rate), bias, and
fairness. Participants do not agree that ML algorithms
should be outright abandoned even though their
outcomes may have errors (e.g., false positives and
false negatives) (Q13.1 mean = 2.7, S.D. = 1.0).
However, they are more inclined to avoid using
algorithms with biased (Q13.2 mean = 3.3, S.D. = 1.0)
or unfair outcomes (Q13.3 mean = 4.0, S.D. = 1.1),
but may still use the algorithm if its performance is
high (Q13.5 mean = 3.4, S.D. = 1.1). They strongly
agree that they will treat algorithm generated
outcomes as mere recommendations and will rely on
their own knowledge and experience to make final
decisions (Q13.6 mean = 4.4, S.D. = 1.0).
The ANOVA identifies significant differences in
job role, gender, and the interactions between role and
age, and between gender and age for Q13.6. More
specifically, the participants with administrative roles
(mean = 3.25) are less likely to agree with the
statement than participants in other roles, including
clinical, technical, IT, and support (mean = 4.7).
4.2 Interview Results
In this section we summarize the interviewee
responses regarding issues surrounding the use of AI
and algorithmic bias. Full analysis of the interview
data is beyond the scope of this paper. Here we
present points relevant to our research questions,
illustrated with quotes from the interviews.
4.2.1 Causes and Consequences of Errors
and Algorithmic Bias
All interviewees saw the potential associated with the
application of AI/ML in medical practice, while
recognizing that the technology may not have reached
the level needed to be used in clinical settings.
Algorithm-generated recommendations are part of the
usual workflow for some practitioners. Practitioners
referred to system-generated, typically rule-based
scoring of risks, such as the risk of a patient experiencing
sepsis, falling, or having a heart attack. Interviewees
mentioned that while they use such information, they
typically combine it with other assessments instead of
following it blindly. They emphasized that
transparency in the way the scores are generated
(typically, rule-based) is important to them and they
realize that the scoring is not flawless. Among the
sources of concern for potential inaccuracy and bias
in the automatically generated recommendations,
they mentioned issues of distribution shift, model
bias, and the validity of the knowledge base used by
algorithms to make the assessment. Concerns
regarding the knowledge base include patient
information that cannot be easily put into written
form or easily found, as well as the validity of the
literature.
(INT_7) Mostly, we have a very diverse patient
population and if the model has been designed and
tested on a population that is very different from
our population, certainly bias can be introduced in
that way.
One practitioner recognized that a newly
implemented ML system for recommending whether
to discharge a patient to a rehab facility or to their
home failed to account for differences in the home
environment:
(INT_2) One of the things that we're running into is
the discharge tool that we're using doesn't take into
account a person's home environment. So you know
it might say that a patient can go home not realizing
there's 30 steps to get into their home. But again, in
Massachusetts, our architecture is very different
than you get in Florida, so if this was a tool
developed in Florida where a lot of the houses there
are one level, no steps to get in, versus
Massachusetts where you're dealing with
architecture from the 17 - 18 hundreds in some
cases.
An emergency care physician noted the danger
associated with over-reliance on tools (i.e., the
automation bias), as an additional potential negative
consequence of using inaccurate or biased
algorithms:
(INT_0) So I just worry that there are subtleties in
the human condition that AI may not be able to
analyze and give us that information. And we may
become too reliant on it.
Physicians emphasized that in the end they rely on
their own judgement, as stated, for example, by this
emergency care physician describing the use of
patient acuity score:
(INT_3) We use that as a guide, but we don’t rely
on it.
A different concern is expressed by a family
physician, who is also a chief medical informatics
officer (CMIO), regarding AI potentially proposing
case-relevant literature:
(INT_5) And then the, you know, the concerns
about bias as well. Kind of knowing that the data
that the AI is using is unbiased data. You mentioned
the AI looking at the latest literature and presenting
that data to the clinician. There's a lot of junk in the
literature now as you know, there are journals that
are really not going through the peer review
process, so just because it's the latest data doesn't
mean that it's trustworthy. So, you know, I'd have
concerns about bias as well. Those are all
challenges. I'm sure there are others too.
4.2.2 Assessment and Response to Bias
Approaches to addressing bias and inaccuracies in
models were expressed by two professionals, who are
both involved in assessment and implementation of
ML-based tools. Both emphasized working closely
with vendors to understand the parameters of the
black-box models and data they were trained on.
Detecting and addressing bias issues before a tool
is put into clinical use is a major concern of a chief
medical informatics officer, who listed bias as one of
the many evaluation parameters for the ML-based
tools. In this person’s opinion, there is a wide range
in the level of physicians’ awareness of the strengths
and pitfalls of ML-based tools:
(INT_7) Not many people are sort of thinking about
how do we assess for bias? Is the tool safe to use?
All those kinds of things…. And that's my job. And
that’s why we're not implementing a lot of tools
right now that would provide clinical
recommendations, at least.
A more optimistic view on dealing with some
known biases was expressed by this provider:
(INT_2) I think if it's a known bias, that people
know about, you're gonna automatically correct for
that.
A CMIO points out that the significance of
different factors used in model accuracy evaluation
depends on the specific task, for example, age may be
more important than race in some settings:
(INT_7) So with, something for radiology, I'd be
thinking about age because structure is what
radiology looks on, changes with age... But in other
things, you certainly want to make sure that you,
are looking at race and ethnicity a lot
more...depending on the model you're looking at,
you may be particularly cognizant of certain
demographic characteristics that you want to make
sure are included in in the study populations.
Retrospective data evaluation is used to assess the
impact of the distribution shift:
(INT_7) We are more likely to feel comfortable with
the model if we're going to test it on retrospective
data with our population. And see that it performs
well and if there is a plan for repeated validations
going forward, then if there's a proposal to put it in
with our population and just move forward
prospectively.
4.2.3 Technology Adoption
Regarding adoption and use, transparency of the
reasons for presented recommendations as well as the
data used in model training was mentioned as a
desirable factor by many interviewees. For the
proprietary models, vendors do not necessarily
disclose what data their model is trained on right now,
although regulations, such as HTI-1 (US Department
of Health and Human Services, 2024), are being put
in place requiring more disclosure:
(INT_7) But going forward, they will have to be a
lot more transparent with sharing a lot of
information, not their entire, sort of, secret sauce,
but a lot of information about their algorithm. But
right now we work very closely with them, try to
gain as much information as possible. But yeah, do
we don't have, we don't have a lot of that
information and so, you know, it's hard.
Doctors and nurses emphasized that an ML-based
clinical decision-making support tool’s
recommendation should be one of the points of
information for them to consider. Many expressed the
need for transparency regarding the basis for the
algorithm’s recommendation. The following quote
describes the sentiment expressed by many
interviewees:
(INT_5) There's always the concern of the black
box, you know, you don't know how the AI is
arriving at these decisions. So transparency is
definitely a limitation. I would love AI to make
suggestions and not make decisions. So like, you
know, I always want to make sure that there's some
human reviewing everything, so you know can't
make it completely automatic.
Our interviewees also expressed many
suggestions regarding the tasks for which they see
AI/ML being useful, but this information is beyond
the scope of this paper.
5 DISCUSSION
Recent advancements in AI and ML technologies are
set to revolutionize healthcare and medical practice.
However, the threat that the algorithmic bias inherent
in many ML applications will amplify existing social
biases and cause significant harm to fair and safe
medical care is real. Algorithmic bias requires
focused attention by all stakeholders: developers of
algorithms and healthcare systems, physicians,
patients, insurance and other payers, and regulatory
institutions. In our study, we conducted an initial
investigation of perceptions of healthcare
professionals regarding algorithmic bias, its sources,
and their intended behaviors in responding to the
algorithmic bias and fairness issues.
Healthcare professionals participating in the study
are well aware of the issues of algorithmic bias, its
relationship to the model design and training data, and
the dangers of proliferating the unfair treatment
through unguarded use of biased models. Those
professionals who have worked with or evaluated
ML models for clinical decision making have dealt
with issues of model bias, distribution shift,
imbalanced training data, and non-transparency of
recommendations. The prevailing
attitude of surveyed and interviewed healthcare
personnel, both in patient-facing and administrative
roles, is that algorithmic recommendations should be
treated as a point of information, with the physician’s
judgement applied as the deciding factor. Assessment
of ML models for performance, safety, and fairness is
a major concern for medical informatics officers and
physicians involved in evaluating and implementing
new technology. In assessing the model suitability for
clinical practice, practitioners favor a differentiated
approach to the inclusion of a variety of demographic
and socio-economic factors in the training data. The
differentiation is based on the specific task and
domain of application of the model (e.g. radiology vs
suicide prevention). Regarding the tolerance of bias
and its importance compared to model performance,
practitioners also take a differentiated approach
based on the task; however, their weighing of
performance versus potential bias does not depend on
the impact level of the decision (e.g., high-impact fatal
disease prediction versus low-impact situations).
Practitioners strongly prefer technology that
exhibits transparency and explainability of decisions,
although they are less concerned with transparency
for high-performing algorithms. In terms of the roles,
administrators are less likely to agree to adopt flawed
or biased algorithms into the practice. Administrators
working on assessing and implementing ML-based
technologies stress dialog with the developers,
especially regarding the population characteristics of
the training data, as a way to achieve satisfactory
transparency, performance and fairness results from
applying the ML models. Healthcare workers
recognize that the different stakeholders have
responsibilities in ensuring fairness and safety in the
use of ML-based tools.
5.1 Implications for Research and
Practice
Our study has several implications for both research
and practice related to algorithmic bias. First, based
on the theoretical framework on algorithmic bias
(Kordzadeh & Ghasemaghaei, 2022), our study
explores the relationships between the key constructs
in the framework: algorithmic bias, perceived
fairness, and intended behavioral responses, in the
context of healthcare. Our findings show that
algorithmic bias may have a negative impact on
perceived fairness, which will further affect users’
decisions for accepting or adopting ML algorithms.
Second, using both survey and interview methods, we
have gathered empirical evidence that different
stakeholders have varying opinions toward
algorithmic bias and fairness, which depend on
several factors including the stakeholders’ job roles,
their perspectives, the particular task, and the
technical characteristics of the algorithms and
systems. Third, the findings from this study suggest
that research on algorithmic bias in healthcare should
focus on approaches to developing transparent
solutions and communicating the known uncertainty
in model recommendations, including threats to
model fairness, in clinical settings. Specifically, given
the prominence of distribution shift as a source of
bias, researchers should develop robust
methodologies for assessing it, as well as for
mitigating or overcoming it. As noted by study
participants, it is also important to study the impact of
specific demographic and socio-economic factors on
algorithm fairness in specific domains of application.
Our findings also suggest that the practice of
algorithmic bias mitigation should take the particular
tasks and contexts into serious consideration, and
that algorithms should be assessed and selected based
on multiple performance criteria and ethical
principles. Sometimes, it is necessary to prioritize
these criteria and principles depending on the specific
applications under study and the perspectives of
stakeholders involved in the development of
healthcare information systems. It is still a long
journey to fully address algorithmic bias and ensure
fairness and equity in healthcare.
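As a toy illustration of such task-dependent prioritization (ours; the candidate models and their scores are invented, not results from this study), the sketch below selects among candidate models by combining an accuracy criterion with a fairness criterion whose weight depends on the task:

# Minimal sketch with invented scores: the same candidates ranked under different
# task-dependent weightings of accuracy versus a between-group fairness gap.
candidates = {
    # model name: (overall accuracy, gap in true positive rate between groups)
    "shallow_decision_tree": (0.84, 0.03),
    "deep_neural_network":   (0.91, 0.12),
    "logistic_model":        (0.86, 0.05),
}

def utility(accuracy, tpr_gap, fairness_weight):
    # Higher is better: reward accuracy, penalize the between-group TPR gap.
    return accuracy - fairness_weight * tpr_gap

for task, weight in [("low-impact task", 0.5), ("high-impact clinical task", 3.0)]:
    best = max(candidates, key=lambda name: utility(*candidates[name], fairness_weight=weight))
    print(f"{task}: select {best}")

# With a small fairness weight the most accurate model wins; with a larger weight the
# fairer model is preferred, mirroring the task-dependent prioritization discussed above.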
5.2 Limitations
Limitations of this study include the small sample of
surveyed and interviewed professionals as well as
their limited geographic diversity. In recruiting for
this initial study, we did not consider the race,
socio-economic status, or other demographic
characteristics of the patient population faced by the
participants in their practice. A greater sample size
would have enabled us to also stratify the participants
by the level of knowledge of AI/ML in general.
6 CONCLUDING REMARKS
This research seeks to explore opinions of healthcare
professionals facing rapid advancements of AI and
ML in healthcare and the associated algorithmic bias
and fairness issues. Our future research will extend to
other important healthcare stakeholders. In particular,
it will be an interesting research question to
investigate how patients with different backgrounds,
health conditions, and socioeconomic status view the
prospects of AI in healthcare and the resulting ethical
implications.
ACKNOWLEDGEMENTS
We thank our study participants for their time and
invaluable input.
REFERENCES
Abràmoff, M. D., et al. (2023). Considerations for
addressing bias in artificial intelligence for health
equity. NPJ Digital Medicine, 6(1), 1-7.
Akkus, Z. (2021). Artificial intelligence-powered
ultrasound for diagnosis and improving clinical
workflow. In Machine Learning in Medicine. Chapman
and Hall.
Aquino, Y. S. J., et al. (2023). Practical, epistemic and
normative implications of algorithmic bias in
healthcare artificial intelligence: A qualitative study of
multidisciplinary expert perspectives. Journal of
Medical Ethics, 0, 1-9.
Ardila, D., et al. (2019). End-to-end lung cancer screening
with three-dimensional deep learning on low-dose chest
computed tomography. Nature Medicine, 25(6), 954-
961.
Challen, R., et al. (2019). Artificial intelligence, bias and
clinical safety. BMJ Quality & Safety, 28(3), 231-237.
Cheligeer, C., et al. (2024). BERT-based neural network for
inpatient fall detection from electronic medical records:
Retrospective cohort study. JMIR Medical Informatics,
12, Article e48995.
De Felice, F., et al. (2020). Decision tree algorithm in
locally advanced rectal cancer: An example of over-
interpretation and misuse of a machine learning
approach. Journal of Cancer Research and Clinical
Oncology, 146(3), 761-765.
DeCamp, M., & Lindvall, C. (2020). Latent bias and the
implementation of artificial intelligence in medicine.
Journal of the American Medical Informatics
Association, 27(12), 2020-2023.
Ema, A., et al. (2020). Proposal for type classification for
building trust in medical artificial intelligence systems.
Proceedings of the AAAI/ACM Conference on AI,
Ethics, and Society, NY, USA.
Friedman, B., & Nissenbaum, H. (1996). Bias in computer
systems. ACM Transactions on Information Systems,
14(3), 330-347.
Fu, Y., et al. (2020). Pan-cancer computational
histopathology reveals mutations, tumor composition
and prognosis. Nature Cancer, 1(8), 800-810.
Gaonkar, B., et al. (2020). Ethical issues arising due to bias
in training AI algorithms in healthcare and data sharing
as a potential solution. The AI Ethics Journal, 1(1), 2-11.
Ge, B., et al. (2023). Detection of pulmonary hypertension
associated with congenital heart disease based on time-
frequency domain and deep learning features.
Biomedical Signal Processing and Control, 81, Article
104316.
Giovanola, B., & Tiribelli, S. (2023). Beyond bias and
discrimination: Redefining the AI ethics principle of
fairness in healthcare machine-learning algorithms. AI
& Society, 38(2), 549-563.
Gupta, M., et al. (2022). Obesity prediction with EHR data:
A deep learning approach with interpretable elements.
ACM Transactions on Computing for Healthcare, 3(3),
Article 32.
Kordzadeh, N., & Ghasemaghaei, M. (2022). Algorithmic
bias: review, synthesis, and future research directions.
European Journal of Information Systems, 31(3), 388–
409.
Ktena, I., et al. (2024). Generative models improve fairness
of medical classifiers under distribution shifts. Nature
Medicine, 30(4), 1166-1173.
Larrazabal, A. J., et al. (2020). Gender imbalance in
medical imaging datasets produces biased classifiers
for computer-aided diagnosis. Proc. of the National
Academy of Sciences, 117(23), 12592-12594.
Lehman, C. D., et al. (2015). Diagnostic accuracy of digital
screening mammography with and without computer-
aided detection. JAMA Internal Medicine, 175(11),
1828-1837.
Levin, S., et al. (2018). Machine-learning-based electronic
triage more accurately differentiates patients with
respect to clinical outcomes compared with the
emergency severity index. Annals of Emergency
Medicine, 71(5), 565-574.
Litjens, G., et al. (2019). State-of-the-art deep learning in
cardiovascular image analysis. JACC: Cardiovascular
Imaging, 12(8), 1549-1565.
Liu, C., et al. (2018). Using artificial intelligence (Watson
for Oncology) for treatment recommendations amongst
Chinese patients with lung cancer: Feasibility study.
Journal of Medical Internet Research, 20(9), Article
e11087.
Matheny, M., et al. (2023). Artificial Intelligence in Health
Care: The Hope, the Hype, the Promise, the Peril (Vol.
2019). National Academy of Medicine.
Mehrabi, N., et al. (2021). A survey on bias and fairness in
machine learning. ACM Computing Surveys, 54(6),
Article 115.
Moor, J.H. (1985). What is computer ethics?
Metaphilosophy, 16(4), 266-275.
Morel, D., et al. (2020). Predicting hospital readmission in
patients with mental or substance use disorders: A
machine learning approach. International Journal of
Medical Informatics, 139, Article 104136.
Morley, J., et al. (2020). The ethics of AI in health care: A
mapping review. Social Science & Medicine, 260,
Article 113172.
Noseworthy, P. A., et al. (2020). Assessing and mitigating
bias in medical artificial intelligence: The effects of
race and ethnicity on a deep learning model for ECG
analysis. Circulation: Arrhythmia and
Electrophysiology, 13(3), Article e007988.
Ochmann, J., et al. (2024). Perceived algorithmic fairness:
An empirical study of transparency and
anthropomorphism in algorithmic recruiting.
Information Systems Journal, 34(2), 384-414.
Osawa, I., et al. (2020). Machine-learning-based prediction
models for high-need high-cost patients using
nationwide clinical and claims data. NPJ Digital
Medicine, 3(1), 1-9.
Parikh, R. B., et al. (2022). Clinician perspectives on
machine learning prognostic algorithms in the routine
care of patients with cancer: a qualitative study.
Supportive Care in Cancer, 30(5), 4363-4372.
Rajkomar, A., et al. (2018). Ensuring Fairness in Machine
Learning to Advance Health Equity. Annals of Internal
Medicine, 169(12), 866-872.
Rajpurkar, P., et al. (2022). AI in health and medicine.
Nature Medicine, 28(1), 31-38.
Schrouff, J., et al. (2023). Diagnosing failures of fairness
transfer across distribution shift in real-world medical
settings. https://arxiv.org/abs/2202.01034
US Department of Health and Human Services (2024),
“Health Data, Technology, and Interoperability:
Certification Program Updates, Algorithm
Transparency, and Information Sharing,” Federal
Register. Accessed: May 13, 2024.
https://www.govinfo.gov/content/pkg/FR-2024-01-09/pdf/2023-28857.pdf
Venkatesh, V., et al. (2003). User acceptance of
information technology: Toward a unified view. MIS
Quarterly, 27(3), 425-478.
Vorisek, C., et al. (2023). Artificial intelligence bias in
health care: Web-based survey. Journal of Medical
Internet Research, 25, Article e41089.
Vyas, D. A., et al. (2020). Hidden in plain sight:
Reconsidering the use of race correction in clinical
algorithms. New England Journal of Medicine, 383(9),
874-882.
Wang, R., et al. (2020). Factors influencing perceived
fairness in algorithmic decision-making: Algorithm
outcomes, development procedures, and individual
differences. Proc. of the 2020 CHI Conference on
Human Factors in Computing Systems, NY, USA.
Yu, K.-H., & Kohane, I. S. (2019). Framing the challenges
of artificial intelligence in medicine. BMJ Quality &
Safety, 28(3), 238-241.
Zając, H. D., et al. (2023). Ground truth or dare: Factors
affecting the creation of medical datasets for training
AI. Proc. of the 2023 AAAI/ACM Conference on AI,
Ethics, and Society, Montreal, QC Canada.
APPENDIX: SURVEY QUESTIONS
Q1. What is your gender?
Male; Female; Prefer not to answer
Q2. What is your age?
20-29; 30-39; 40-49; 50+
Q3. What is your job role/function (check all that
apply)?
Clinical (e.g., physician, nurse, surgeon)
Administrative (e.g., director, manager, team
leader)
Technical (e.g., radiologist)
Support (e.g., educator, trainer, analyst)
IT (e.g., developer, system engineer)
Other, please specify ___________
Q4. How long have you been working at your hospital
or organization?
Less than one year; 1-2 years; 3-5 years; 6-9 years;
10-15 years; 15+ years
Q5. How long have you been working in the
healthcare sector?
Less than one year; 1-2 years; 3-5 years; 6-9 years;
10-15 years; 15+ years
Q6. Which of the following analytics and machine
learning algorithms are you familiar with (check all
that apply)?
Regression (linear and logistic)
Decision tree
Neural networks
Support vector machine
K-nearest neighbor
Naïve Bayes classifier
Random Forest
Other, please specify __________
Q7. Which will be the primary goal you wish to
achieve with the use of AI (select the most applicable
options)?
To improve patient care quality
To increase patient satisfaction
To reduce cost
To reduce errors
To improve the performance of my organization
To conduct research and publish papers
Other, please specify__________
Q8. In general, what is your opinion about using AI
technology in healthcare (check all that apply)?
I think that AI has great potential to help improve
my productivity.
I think that AI has great potential to help improve
healthcare quality and performance.
I think the use of AI in healthcare may pose a
tremendous amount of risks (e.g., misdiagnosis)
onto patients.
I think AI is a threat to healthcare professionals' job
opportunity.
I am concerned that the use of AI may cause
privacy breaches of patient data.
I think the adoption of AI technology may cause
healthcare costs to increase.
Other, please specify___________
In this study, we focus on a subset of AI techniques
that use supervised machine learning algorithms to
make decisions (e.g., disease diagnosis, physician
referrals). In other words, the outcomes of the
algorithms are classification labels (e.g., positive vs.
negative). Unless otherwise noted, “algorithms” in the
following statements refer to classification algorithms.
Q9. Please rate how much you agree (or disagree)
with each of the following statements about the
sources of algorithmic bias.
Q9.1 The outcome of an algorithm is likely to be
biased if the training data are biased.
Q9.2 Algorithms trained on data from one hospital
may not necessarily perform well when used in a
different hospital.
Q9.3 The larger the training dataset, the less likely
an algorithm is biased.
Q9.4 Algorithmic bias may result from algorithm
design (e.g., the impurity measure used in decision
trees).
Q10. Please rate how much you agree (or disagree)
with each of the following statements about the
fairness of the outcome produced by algorithms.
Q10.1 If an algorithm’s outcome is biased against
certain social groups, it is unfair for those groups.
Q10.2 To mitigate possible algorithmic bias,
individual characteristics (e.g., race, gender, and
age) should be excluded from all healthcare
applications involving machine learning
algorithms.
Q10.3 Depending on the specific problems under
study, such as those assessing risks for certain
diseases (e.g., diabetes), individual characteristics
(e.g., race, gender, and age) can be used in some
applications involving machine learning
algorithms.
Q11. Now imagine that you need to select algorithms
to assist your decision making in your work. Please
rate how much you agree (or disagree) with each of
the following statements regarding how algorithms
should be selected.
Q11.1 Algorithms should be assessed based on
multiple criteria such as performance, possible
bias, fairness, etc.
Q11.2 I would prioritize algorithm performance
and bias differently depending on the specific
situations.
Q11.3 In high-impact situations (e.g., fatal disease
prediction), algorithmic bias should not be allowed.
Q11.4 In relatively low-impact situations (e.g.,
patient satisfaction prediction), algorithmic bias
may be tolerated to a certain degree.
Q12. Please rate how much you agree (or disagree)
with each of the following statements regarding
algorithm transparency and explainability.
Q12.1 Algorithm transparency is important for me
to assess algorithmic bias.
Q12.2 If two algorithms perform similarly, I prefer
to use algorithms with high transparency (e.g.,
decision tree) over algorithms with low
transparency (e.g., neural networks).
Q12.3 Algorithmic explainability is important for
me to assess algorithmic bias.
Q12.4 If two algorithms perform similarly, I prefer
to use algorithms with high explainability (e.g.,
decision tree) over algorithms with low
explainability (e.g., neural networks).
Q12.5 If one algorithm performs significantly
better than another algorithm, I prefer to use the one
with high performance even if its transparency or
explainability is worse than the other algorithm.
Q13. Please rate how much you agree (disagree) with
each of the following statements about how you treat
algorithmic bias.
Q13.1 If an algorithm's outcome has errors (e.g.,
false positives and false negatives), I will NOT use
that algorithm to assist my decision making.
Q13.2 If an algorithm's outcome is biased, I will
NOT use that algorithm to assist my decision
making.
Q13.3 If an algorithm's outcome leads to unfair
resource allocation among different social groups,
I will NOT use that algorithm to assist my decision
making.
Q13.4 If the outcome of an algorithm is completely
unbiased but its performance is low, I may still use
the algorithm depending on the problem under
study.
Q13.5 If the outcome of an algorithm is biased but
its performance is high, I may still use the
algorithm depending on the problem under study.
Q13.6 Being aware that algorithms may have errors
and bias, I may still use algorithms but will treat the
outcomes only as recommendations and rely on my
own knowledge and experience to make the final
decisions.