Deep Learning and Medical Image Analysis: Epistemology and Ethical Issues

Francesca Lizzi¹ᵃ, Alessandra Retico¹ᵇ and Maria Evelina Fantacci²ᶜ

¹National Institute for Nuclear Physics, Pisa Division, Pisa, Italy
²Department of Physics, University of Pisa, Pisa, Italy

ᵃ https://orcid.org/0000-0003-0900-0421
ᵇ https://orcid.org/0000-0001-5135-4472
ᶜ https://orcid.org/0000-0003-2130-4372
Keywords: Deep Learning, Ethics, Epistemology, Multidisciplinary Science, Medical Imaging.
Abstract: Machine and deep learning methods applied to medicine seem to be a promising way to improve performance in solving many problems, from the diagnosis of a disease to the prediction of personalized therapies, by analyzing many and diverse types of data. However, developing an algorithm with the aim of applying it in clinical practice is a complex task which should take into account the context in which the software is developed and used. The first report of the World Health Organization (WHO) about the ethics and governance of Artificial Intelligence (AI) for health, published in 2021, states that AI may improve healthcare and medicine all over the world only if ethics and human rights are a central part of its development. Involving ethics in technology development means taking into account several issues that should also be discussed inside the scientific community: epistemological changes, population stratification issues, the opacity of deep learning algorithms, data complexity and accessibility, healthcare processes and so on. In this work, some of these issues are discussed in order to open a debate on whether and how it is possible to address them.
1 INTRODUCTION
Machine and deep learning methods applied to medical images are proving to be a promising way to improve performance in solving many problems: the diagnosis of a specific disease, the contouring of organs or lesions and the prediction of the prognosis. Deep learning, in particular, offers the possibility of analyzing many patients' data in a reproducible way and can be applied to carry out follow-up and radiomic studies. The advent of Deep Learning (DL) algorithms in the field of medical image analysis is leading to a change in how physicians are supported in their role. Many different applications have been explored successfully (Litjens et al., 2017).
However, developing a DL algorithm with the aim of applying it in clinical practice is a complex task which should take into account the context wherein the software is developed and used. In 2021, the World Health Organization (WHO) published the first report on the ethics and governance of Artificial Intelligence (AI) for health (WHO, 2021). The report states that AI may improve healthcare and medicine all over the world only if ethics and human rights are a central part of its development. The WHO recognizes that ethical guidance, based on the shared perspectives of the different entities that develop, use or oversee such technologies, is critical to build trust in these technologies, to guard against negative or erosive effects and to avoid the proliferation of contradictory guidelines. Involving ethics in technology development means taking into account several issues.
In this work, some of the most interesting ones will be discussed. First, understanding how the scientific method is changing should at least be taken into account and discussed when developing medical software. In fact, most of the ethical issues related to the application of DL algorithms to clinical practice are directly connected to the shift of the scientific paradigm brought by the intense use of data and data mining. Second, the way we collect data and build data sets is crucial to develop fair AI-based algorithms. This means correctly performing the population sampling in order to prevent social biases, and preserving technical information to avoid technological biases. In fact, the development of AI-based algorithms should pay attention to the process of image production, which involves manufacturers, acquisition parameters and also the interactions with physicians. Moreover, since we should always compare the predicted results with a ground truth, a labeled data set collection should consider the inter- and intra-observer variability as well as the risk of containing a non-negligible label noise (see Figure 1). In this context, the application of AI to medical images needs special care, since its wrong use may harm not only people but also healthcare systems (Stevens et al., 2018).

Figure 1: An example of label noise: on the left, an original lung CT scan (case 0053) from the COVID-19 Challenge data set (An et al., 2020) is shown with a windowing between -1000 and 300 HU. On the right, the reference label mask for a COVID-19 lung lesion published within this data set is shown in red. In the axial view (top row), the label appears as a perfect circle, probably due to the radiologist's use of a support software for image labelling.

Lizzi, F., Retico, A. and Fantacci, M. Deep Learning and Medical Image Analysis: Epistemology and Ethical Issues. DOI: 10.5220/0011983000003497. In Proceedings of the 3rd International Conference on Image Processing and Vision Engineering (IMPROVE 2023), pages 172-179. ISBN: 978-989-758-642-2; ISSN: 2795-4943. Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0).
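The HU windowing mentioned in the Figure 1 caption is a simple clamp-and-rescale of the CT intensities into a display range. A minimal, dependency-free sketch (the window bounds come from the caption; the function name and interface are ours, for illustration only):

```python
def window_hu(values, lo=-1000.0, hi=300.0):
    """Clip HU values to [lo, hi] and rescale to the 8-bit display range [0, 255]."""
    span = hi - lo
    out = []
    for v in values:
        v = min(max(v, lo), hi)  # clamp values outside the window
        out.append(int(round(255.0 * (v - lo) / span)))
    return out

# Air (-1000 HU) maps to black; dense tissue above 300 HU saturates to white.
print(window_hu([-2000.0, -1000.0, -350.0, 300.0, 1000.0]))  # [0, 0, 128, 255, 255]
```

The same windowing choice made by the radiologist at annotation time also shapes what a DL model effectively "sees" if it is applied as preprocessing.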
In this work, the first section is dedicated to the changing scientific paradigm and to how the so-called Fourth Paradigm affects studies that concern deep learning and medical images. In the second section, the many issues connected to the collection of a data set are discussed, including the problems of ground truth reliability, of access to data and their quality, and of improving clinical outcomes in the context of a real hospital protocol.
2 THE IMPORTANCE OF EPISTEMOLOGICAL CLAIMS IN HEALTHCARE
The concept of "scientific paradigm" was introduced by Kuhn in "The Structure of Scientific Revolutions", published in 1962 (Kuhn, 1962), and, despite its limits, it is very useful to frame the simplified scheme of the evolution of scientific paradigms reported by (Hey et al., 2021), as shown in Table 1. According to Kitchin (Kitchin, 2014), we can summarize Kuhn's idea of a paradigm as an accepted way of interrogating the world and producing knowledge which is common to a substantial proportion of researchers in a discipline at any one moment in time.
According to Hey (Hey et al., 2021), epistemology is moving towards a new paradigm called the "fourth paradigm" or "exploration science". In this evolution, some fundamental rules of traditional science are deeply changing, and they should be taken into account since they are useful to establish the limits and the possibilities of these new rising methods. The assumptions made during the development of an algorithm are critical to define the model itself and the boundaries within which it should be applied. According to Gray (Hey et al., 2021), there are two main ways to frame the fourth paradigm: the first, typical of industry, is a pure empiricism wherein machine learning techniques can reveal the inherent truth of data, while the second looks at the fourth paradigm as a new extension of the established scientific paradigm. The fourth paradigm, as the pure inductive empiricism brought by the use of Big Data and by the rise of deep learning methodologies, has the potential to undermine the scientific legitimacy of machine learning (Enni and Herrie, 2021). As an example, Campolo and Crawford (Campolo and Crawford, 2020) draw on Max Weber's theory of disenchantment to describe a broader epistemological diagnosis of modernity. They affirm that not understanding the motivation that leads a deep learning-based model to a decision could produce the effect of considering that algorithm as something magical. These considerations do not come only from the humanities but also from the "hard" sciences. Stuart J. Russell, a well-known professor of computer science at the University of California, Berkeley, in 2018 described deep learning as "a kind of magic", since we cannot understand when and why the deep learning hypothesis is consistent.
For this reason, it is interesting to discuss the process of knowledge generation, evidence and causation, in particular in the healthcare domain. In Stevens et al. (Stevens et al., 2018), a critical and healthcare-centered review of epistemological claims is presented. The healthcare field is characterized by an institutionalized set of epistemological principles and generally accepted scientific methodologies (Stevens et al., 2018; Beltran, 2005) which are challenged by deep learning practices. The language used to describe the applications of algorithms in healthcare can be an interesting way of analyzing whether there are different ways of using data in this specific domain.

Table 1: The evolution of scientific paradigms according to Hey et al. (Hey et al., 2021).

Paradigm               Method                             Dating
Experimental Science   Observation of natural phenomena   Pre-Renaissance
Theoretical Science    Using models, generalization       Pre-Computers
Computational Science  Simulations                        Pre-Big Data
Exploratory Science    Data mining                        Now
Stevens et al., systematically studying editorials on the use of Big Data practices in healthcare, describe five ideal-typical discourses, naming them according to the relations between implicit assumptions about evidence and knowledge and the diverse epistemological positions. The five categories they define are: the modernist, the instrumentalist, the pragmatist, the scientist and the critical-interpretative. Beyond the details of each discourse, there is a significant difference between the first four and the last one in the conceptualization of data: the former consider data inherently truthful and meaningful (natural and pre-existing), while the latter considers data as constructed, hence necessarily emphasizing some aspects and leaving out others. While a simple positivistic, hypothesis- and theory-free, purely empiricist approach seems to make the use of data in clinical practice simpler, it is a trap. With regard to the field of medical image analysis, and consequently the radiological medical domain, we know that data are not given, natural or pre-existing. Medical images are the result of:
1. A traditional scientific process: their production is based on physical studies of the interaction between matter and radiation and between the human body and radiation; on the physical processes of X-ray, magnetic field and ultrasound production as regards radiology; and on radioactivity and all the issues linked to it as regards nuclear imaging;

2. A technological development history: image production deals with detector improvements, the materials used for detecting photons and the electronics which, simplifying, determine for example the spatial resolution, the contrast and other image quality characteristics. When we deal with 3D images such as Computed Tomography (CT) scans, we should also consider the image reconstruction algorithms, which are a mix of traditional scientific research, especially mathematical (for example the Radon transform or Fourier signal analysis), and purely technological improvements such as sliding contacts;

3. An industrial process: medical imaging systems are not equally distributed around the world and their production is highly costly, with the result that few vendors deal with the imaging machinery market. Moreover, as a result of an industrial process, some parts of medical image production are protected by patents, which inhibits complete knowledge of how an image is produced;

4. A function-based process: medical images are made on the basis of their utility and improved following their possible uses in hospitals. The choice of a specific imaging modality depends on the scope (morphology or functionality; diagnosis, follow-up, radiotherapy planning, ...) and on the part of the body that needs to be imaged. They are made to be presented to physicians in a way that medical doctors can interpret, taking into account the specific medical formation process they attended. Moreover, contrarily to natural images, most medical imaging modalities imply delivering a radiation dose to the patients, making their use a dynamic equilibrium between costs and risks, in terms of capital and health, and benefits.
For all these reasons, it is unacceptable to consider medical image data as pre-existing or natural. Moreover, applying Big Data techniques such as machine and deep learning cannot be considered hypothesis-free science. Even if the hypothesis is a complex one, very far from the pre-Renaissance way of formulating it, we are always assuming that, given the constructed data and the context, there is a model which may solve the given task we want to study. This means that we are using data that already contain the solution. Characterizing machine and deep learning techniques as comprehensive and intrinsically unbiased can be misleading rather than helpful in shaping scientific as well as public perceptions of the features, opportunities and dangers associated with data-intensive research (Leonelli, 2014). Finally, medical images can rarely be considered Big Data, and hence the application of techniques developed on Big Data should be even more careful as regards, for example, the generalization goal. What is at stake is our ability to produce knowledge not only in a traditional scientific way but from a critical position, avoiding the accusation of practising "magic" or "alchemy". It means knowing and acknowledging how complex it is to create algorithms, especially deep learning ones, with the scope of applying them in hospitals and, by acknowledging it, proceeding towards a fair, scientific, active and impactful application of machine and deep learning to medicine.
3 BUILDING MEDICAL IMAGE DATA SETS
As discussed in the previous section, medical image data sets cannot be considered inherently truthful, as natural resources. It is interesting to discuss the issues that should be considered in data set collection, which demonstrate the thesis of the previous section. In particular, ground truth production and the accessibility and quality of medical image data sets will be considered in the following.
3.1 Ground Truth Reliability
The ground truth on medical images is usually based on medical doctors' opinions or on a consensus among them. For this reason, it always suffers from a certain grade of variability, which should be kept under control (Bridge et al., 2016; Renard et al., 2020). The use of a particular imaging modality and of a specific imaging system may affect the capability of obtaining a reliable ground truth, and the aggregation of data coming from different sources is a challenge that still needs to be addressed. In fact, publicly available data sets of medical images are usually small and need to be aggregated in order to obtain a set with a sufficient number of samples to train a deep learning algorithm (Lizzi et al., 2021). Even if it is possible to collect private image data sets from hospitals, the process is very time consuming for both the collection and the labeling. Moreover, the publication of such data sets may not be possible, thus reducing the chance of reproducing results obtained by other studies. Publishing the data is not easy because database maintenance is expensive and the privacy of patients has to be managed rightfully.

The ground truth usually depends on the task we want to solve, and its creation is a pivotal step for algorithm development. If the task is a classification task, the ground truth consists in assigning a class to each image or patient in the data set. Images of the same patient taken at different time points may belong to different classes because the human body changes over time. The way the classes are defined is mainly based on medical protocols, but medical protocols do not guarantee that variability is eliminated. Even if this labelling process seems to be very fast, when a huge amount of labelled data is required, the process is very time consuming for doctors. Another way of labelling medical images is to assign a certain class to each pixel or voxel. This kind of labelling is suited for solving segmentation problems. A medical image usually contains many pixels or voxels, and this characteristic makes the labelling extremely time consuming: for a standard lung CT scan of size 512x512x100, the number of elements to be labelled is more than 26 million! There are some tools to help physicians in this task, but they may introduce a bias in the labelling. In order to reduce the cost of labelling, non-expert labellers have been employed in the field of natural images (Kuznetsova et al., 2020), but this kind of labelling process leads to highly noisy data sets. In the medical image domain, in which the objects to be identified are usually small and difficult to identify, this process is even harder. The availability of large labeled data sets of medical images is currently a real challenge even apart from the labelling process: medical image data sets are usually small and their collection is not easy because of privacy issues and institutional policies.
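The inter- and intra-observer variability mentioned above is commonly quantified with overlap scores between two readers' segmentation masks, such as the Dice coefficient. A minimal, dependency-free sketch (the toy masks are illustrative, not taken from any cited data set):

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks, given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # By convention, two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Two readers annotating the same (toy) image: 2 voxels in common out of 3 each.
reader_1 = [1, 1, 1, 0, 0, 0, 0, 0]
reader_2 = [0, 1, 1, 1, 0, 0, 0, 0]
print(dice(reader_1, reader_2))  # 2*2/(3+3) = 0.666...
```

Reporting such an agreement score alongside a released ground truth gives users an estimate of the label noise floor that no model trained on it can meaningfully beat.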
3.2 Data Accessibility and Quality
Data may come from public or private collections. Both modalities have weaknesses and strengths, which are discussed in the following. First of all, public data may be effectively public, i.e. accessible to everyone, or they may be accessed through a specific agreement. Private data are instead data which cannot be freely used or accessed. Data may not be accessible for many reasons; one of the most problematic is related to privacy. In order to better understand the risks of publishing data, it is interesting to discuss the most used image formats. This is important because medical image formats usually contain a header with patient and physician information. There are mainly two data formats typically used for medical images: the Neuroimaging Informatics Technology Initiative (NIfTI) format and the Digital Imaging and Communications in Medicine (DICOM) standard (Standard DICOM, 2021). The NIfTI format was created in the field of neuroimaging and is a standard whose header contains only information about orientation, voxel size and image visualization. 3D images, for example CT or MRI scans, can be stored in this format, which uniquely defines the correct orientation and the physical volume. In Figure 2, an example of a NIfTI header of a CT scan is reported.
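The fixed 348-byte NIfTI-1 header can be handled with nothing more than the standard library. The sketch below writes and reads back only the geometry fields discussed here (sizeof_hdr, dim, pixdim, magic), at the byte offsets given in the public NIfTI-1 specification; the helper names are ours and this is an illustration, not a replacement for a full NIfTI library:

```python
import struct

def make_minimal_nifti1_header(shape, voxel_size_mm):
    """Build a 348-byte NIfTI-1 header carrying only geometry information."""
    hdr = bytearray(348)
    struct.pack_into("<i", hdr, 0, 348)            # sizeof_hdr: always 348
    dim = [len(shape), *shape] + [1] * (7 - len(shape))
    struct.pack_into("<8h", hdr, 40, *dim)         # dim[8]: rank, then extents
    pixdim = [1.0, *voxel_size_mm] + [1.0] * (7 - len(voxel_size_mm))
    struct.pack_into("<8f", hdr, 76, *pixdim)      # pixdim[8]: voxel sizes (mm)
    struct.pack_into("4s", hdr, 344, b"n+1\x00")   # magic for single-file .nii
    return bytes(hdr)

def read_voxel_size(hdr):
    """Return the voxel size (mm) stored in a NIfTI-1 header."""
    ndim = struct.unpack_from("<h", hdr, 40)[0]
    pixdim = struct.unpack_from("<8f", hdr, 76)
    return pixdim[1:1 + ndim]

hdr = make_minimal_nifti1_header((512, 512, 100), (0.7, 0.7, 1.5))
print(read_voxel_size(hdr))  # ~(0.7, 0.7, 1.5), up to float32 rounding
```

Note how little the format can carry: geometry and visualization hints, but no acquisition parameters and no patient identity, which is exactly the trade-off discussed in this section.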
Figure 2: An example of a whole NIfTI header. It can be noticed that this header includes only information about the voxel size and the image orientation.

The Digital Imaging and Communications in Medicine (DICOM) standard (Standard DICOM, 2021) is the global convention used by manufacturers to define and store diagnostic imaging data. DICOM images are encoded as a set of data elements; public elements are defined by the DICOM standard, and private elements are defined on an individual basis by each manufacturer. A DICOM data element, or attribute, is made of three parts:

- a tag that identifies the attribute, usually written in the format (XXXX,YYYY) with hexadecimal numbers, and divided into a DICOM Group Number and a DICOM Element Number;

- a DICOM Value Representation (VR) that describes the data type and format of the attribute value;

- the value field itself, which contains the attribute data, preceded by its length.
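As a small illustration of the tag convention (the helper names are ours; the odd-group rule for private elements comes from the DICOM standard):

```python
def parse_tag(tag_str):
    """Parse a '(GGGG,EEEE)' DICOM tag string into (group, element) integers."""
    group, element = tag_str.strip("()").split(",")
    return int(group, 16), int(element, 16)

def is_private(tag):
    """Private data elements are identified by an odd group number."""
    group, _ = tag
    return group % 2 == 1

print(parse_tag("(0010,0010)"))      # Patient's Name -> (16, 16)
print(is_private((0x0009, 0x0010)))  # odd group 0x0009 -> True
```

The group/element split matters in practice: privacy tooling typically reasons about whole groups (e.g. the 0010 patient group) rather than individual elements.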
The fields of the DICOM header contain much information: the Patient ID, a number that uniquely identifies the patient, the Patient's Birth Name (0010,1005), the Patient's Age (0010,1010), the Patient's Size (0010,1020), the Patient's Address (0010,1040) or even the Patient's Mother's Birth Name (0010,1060). All these data are a problem when we deal with privacy, because they may allow a complete re-identification of subjects. On the other hand, the DICOM format also contains information on the acquisition parameters, such as the reconstruction kernel, the imaging system used, the exposure time, the X-ray tube current, the field of view (FOV) size or the reconstructed FOV.

Figure 3: An example of a part of a DICOM header. Contrarily to the NIfTI one, the DICOM header contains information about the patient, the instrumentation, the acquisition protocol and the clinical personnel.

These characteristics are less problematic with regard to privacy and they are very useful for algorithm development. However, in most published medical image data sets all this information is lost. This is mainly due to the fact that it is not easy to handle privacy within the DICOM standard, since the number of tags that a file may contain is very large. Moreover, carrying out studies on humans implies not only privacy-related issues but ethical ones too. For these reasons, accessing Italian hospital data requires a strict protocol to be followed. Imaging equipment manufacturers use private elements to encode acquisition parameters that are not yet defined by the DICOM standard or that they consider proprietary. They also define and include private elements that contain Protected Health Information (PHI). These PHI private elements can be as obvious as the name of a patient or as subtle as an identifier string that could be traced back to a patient by someone with access to the departmental image archive. A DICOM conformance statement is a document published by a manufacturer that contains technical information concerning data exchange with a specific type of device (e.g. an imaging unit, workstation, printer or image archive). The conformance statement provides the mechanism for a manufacturer to publish the set of private elements that are stored in the DICOM files created by an imaging system. However, manufacturers do not document and publish all of their private elements. For these reasons, the de-identification process should meet two conflicting requirements: (i) no PHI must be included in exported data and (ii) the system must retain all data that describe the acquisition, such as the physical parameters of individual images, as well as other parameters such as the series description. De-identifying a DICOM image is a challenging task that carries the risk of leaving in the header PHI or meta-data that make re-identification possible. On the other hand, the NIfTI format was designed not to carry patient information in the header, but it does not allow storing important technical parameters. It could be interesting to design a new image format standard, suitable for AI and deep learning algorithms, which contains all the technical information while keeping the privacy risk as low as possible. The availability of acquisition parameters is crucial to obtain a high-quality data set, and the search for a good trade-off between accessibility and quality is an urgent challenge to be tackled.
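The two conflicting de-identification requirements suggest an allow-list approach: rather than trying to enumerate every element that might carry PHI (impossible for undocumented private elements), keep only the acquisition attributes the analysis needs and drop everything else. A minimal sketch over a header represented as a plain dictionary (the tag numbers are real DICOM attributes mentioned in this section; the function and list names are ours):

```python
# Acquisition attributes worth retaining for algorithm development.
ACQUISITION_ALLOWLIST = {
    "(0018,1150)",  # Exposure Time
    "(0018,1151)",  # X-Ray Tube Current
    "(0018,1210)",  # Convolution (reconstruction) Kernel
    "(0028,0030)",  # Pixel Spacing
}

def deidentify(header):
    """Keep only allow-listed elements; everything else, including any
    undocumented private element, is dropped rather than risk leaking PHI."""
    return {tag: value for tag, value in header.items()
            if tag in ACQUISITION_ALLOWLIST}

header = {
    "(0010,0010)": "DOE^JANE",   # Patient's Name: must not be exported
    "(0018,1210)": "B30f",       # reconstruction kernel: keep
    "(0009,0010)": "vendor-id",  # private element: dropped by default
}
print(deidentify(header))  # {'(0018,1210)': 'B30f'}
```

An allow-list fails safe on unknown tags, at the cost of discarding potentially useful information, which is exactly the accessibility/quality trade-off described above; real pipelines would follow the standard's de-identification profiles rather than a hand-rolled list.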
3.3 How to Gain Impact?
Over the last 10 years, publications on AI in radiology have increased from 100–150 per year to 700–800 per year (Pesapane et al., 2018), and interest in the field of medicine is continuously increasing. AI and deep learning studies mainly focus on increasing the accuracy of diagnosis compared to physicians' performance.
However, high accuracy does not necessarily mean that an AI algorithm improves clinical outcomes. It is, in fact, important to assess whether its use in clinical practice can be integrated in the hospital workflow and how large the impact is, not only on the outcomes but also on the physicians' training. In order to perform this kind of analysis, a clinical trial is needed, and clinical trial studies are usually time consuming and expensive. It is pivotal to question what the role of AI could be in the medical and clinical workflow, especially in radiology, which seems to be the most explored field of medicine. It is also interesting to discuss the role of a radiologist in the hospital workflow and whether they can be replaced by an artificial intelligence or supported by it. In Pesapane et al. (Pesapane et al., 2018), a group of radiologists reflect on what it means to let an AI make a diagnosis and what the differences are between the human evaluation and the AI one.
The functioning of AI, and especially of deep learning, in radiology is based on a principle that is very similar to the clinical one: "the more images you see, the more examinations you report, the better you get", and this may be the reason why AI is successfully applied to radiology. Since the comparison between the radiologist's and the AI's performance depends on the radiologist's experience and also on the quality of the developed AI, it is not straightforward to state whether and when one is better than the other. When image analysis takes too much time with respect to the necessity of the patient, i.e. when a very urgent clinical evaluation is necessary, AI may be very helpful in a hospital workflow. As an example, in (Kim et al., 2021), the application of a deep learning-based assistive technology in the Emergency Department (ED) context has been studied on Chest Radiographs (CRs). CR interpretation is a difficult task that requires both experience and expertise, because various anatomical structures tend to overlap when captured on a single two-dimensional image, different diseases may have a similar presentation and specific diseases may present with different characteristics. For these reasons, CR interpretation suffers from a significant possibility of misinterpretation, reaching 22% according to (Donald and Barnard, 2012). ED physicians perform worse than trained radiologists in reading images. However, radiologists may not be available, especially during nights and weekends, and CR interpretation in the ED setting is left to ED physicians. For all these reasons, Kim et al. (Kim et al., 2021) studied whether an ED physician supported by a deep learning-based algorithm for CR interpretation performs better than the ED physician alone. They found that ED departments may benefit from the use of AI, even if this experiment needs at least an external validation study. This example clearly shows how important it is to know the healthcare domain and practice in order to structure a deep learning experiment.
Despite the improvements deep learning may bring to healthcare, another pivotal question concerns the problem of accountability. When an AI is used to make a decision in clinical practice, it is not trivial to understand who is responsible for the diagnosis. In (Neri et al., 2020), a radiologist supported by an AI is depicted as responsible for the diagnosis if they are trained in the use of AI, since they are responsible for the actions of machines. Moreover, it is necessary to deepen the research field of explainability in order to let radiologists understand the AI behaviour. Furthermore, the use of AI may bias the radiologist's decisions. Lastly, even the public discussion on the introduction of AI systems as possible substitutes of the physicians themselves can be dangerous and produce a paradoxical effect: if radiologists are expected to be replaced by AI, there will be a lack of motivation for young doctors to pursue a career in radiology.
For all these reasons, building deep learning models to be applied to radiology, and even bringing them into the public debate, is a delicate task, and claiming clinical advancements without clinical trials is unfair. As described above, medicine and healthcare are complex domains, and the advancement that comes from the introduction of deep learning-based technologies should be carefully evaluated inside that context to assess the real impact. Including information on the context from the beginning of the development of a DL algorithm could help to gain clinical impact.
4 CONCLUSIONS AND DISCUSSION
Some of the many aspects involved in the creation of a deep learning algorithm applied to medicine and, specifically, to imaging have been discussed. The difficulty of taking all the issues into account is clear, and they relate to many fields of knowledge.

In section 2, the changing scientific paradigm has been discussed. How researchers pose their research questions and which epistemological assumptions they embrace are fundamental to understand the kind of research they are doing. This process cannot be done without looking at the social processes that lead to data collection and data generation.
In section 3, the process of collecting a data set and subsequently building a ground truth on medical images has been discussed, with its potentialities and limitations. Typically, the ground truth on medical images is made by a single physician's opinion or by a consensus among many medical doctors. When made in the second modality, the ground truth always suffers from a variability that is difficult to erase. The quality of an algorithm strictly depends on the quality of the ground truth, but involving a large number of physicians is economically expensive and requires a high grade of coordination and collaboration among research and health institutes. The quality of the algorithms also depends on the quality of data, which can be private or public. Public data guarantee the possibility of testing different algorithms on the same data set but, in order to make them publishable, important information on, for example, acquisition protocols or scanners may be lost. Private data have the advantage of being designed for the specific experiment and collected following inclusion criteria decided by the collector. When released, this kind of data can be designed to contain the information on acquisition that could be useful and meaningful for the analysis. In any case, medical image data are scarce and they may lack label quality. This issue is one of the most limiting in deep learning algorithm development. Lastly, when an algorithm is developed to be used in clinical practice, it has to be validated not only statistically but also clinically. The performance on a test set is not sufficient to claim clinical advancements, since the test set belongs to the same data set used for training and validation. The algorithm, in fact, needs to be tested also on at least one independent external data set to evaluate its generalization capability. The external data should be taken from another medical center and should contain the information on acquisition and scanners, in order to make possible the analysis of the image characteristics that may confuse the algorithm. This process can be done in two modalities, case-control and clinical trial studies, and both of them suffer from the difficulty of correctly representing the population.
Once the algorithm has been externally validated, it should be integrated in the hospital workflow and its performance should be evaluated in this context as well. It has been established that the capacity of an algorithm to outperform a physician is strictly connected to the experience of the physician in solving that specific task. For this reason, there exist situations in which the application of an algorithm may be really helpful both to increase performance and to save time. In this context, it is interesting to question who is responsible for the diagnosis when an algorithm is used to support physicians or directly to diagnose a certain disease. In order to solve this issue, we need juridical instruments that help the application of algorithms in clinical practice. Building responsibility also means training physicians in the use of AI, in order to make them mindful of its use and to produce an informed consent that patients can really understand.
All these issues suggest that the development of an algorithm for clinical applications needs deep and widespread knowledge in all of the cited fields: medicine, radiology, healthcare processes, law, computer science, computer engineering, physics, social sciences, philosophy and so on. We believe that future research on the mentioned issues should proceed along two paths: on one hand, we need to design and implement controlled experiments in order to systematically understand, for example, how the physical parameters of images, such as kVp or reconstruction algorithms, affect the feature learning of DL models; on the other hand, we should study how to build high-quality, shareable, inclusive, privacy-preserving data sets, not only as benchmarks for performance comparison but also for the whole algorithm development. This could be done, for example, by creating a new medical image standard that treats the use of data for scientific analysis as an intrinsic property of the standard itself.
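The idea of making scientific usability an intrinsic property of a shared record can be sketched as follows. This is a minimal illustration, not a proposal for the actual standard: the field names are our own assumptions and are not part of DICOM.

```python
# Hypothetical sketch of a shareable image record in which acquisition
# parameters and the permitted scientific uses travel with the data itself,
# in the spirit of extending a DICOM-like standard.
# All field names are illustrative assumptions, not actual DICOM attributes.
from dataclasses import dataclass

@dataclass(frozen=True)
class SharedImageRecord:
    pseudonymous_id: str        # no direct patient identifiers
    modality: str               # e.g. "CT"
    scanner_model: str          # needed to study scanner-related biases
    kvp: float                  # tube voltage, a physical parameter of interest
    reconstruction_kernel: str  # reconstruction algorithm identifier
    permitted_uses: tuple = ()  # e.g. ("algorithm-development", "benchmarking")

    def allows(self, use: str) -> bool:
        """Check whether a given scientific use is covered by consent."""
        return use in self.permitted_uses

record = SharedImageRecord(
    pseudonymous_id="case-0001",
    modality="CT",
    scanner_model="VendorX-64",
    kvp=120.0,
    reconstruction_kernel="B30f",
    permitted_uses=("algorithm-development", "benchmarking"),
)
print(record.allows("benchmarking"))       # True
print(record.allows("commercial-resale"))  # False
```

The design choice worth noting is that the acquisition metadata and the consent scope are mandatory parts of the record rather than optional side files, so a data set built from such records supports both bias analysis and usage auditing by construction.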
What is at stake is to develop a high-performance,
inclusive and trustworthy AI.
IMPROVE 2023 - 3rd International Conference on Image Processing and Vision Engineering
REFERENCES
An, P., Xu, S., Harmon, S. A., Turkbey, E. B., Sanford,
T. H., Amalou, A., Kassin, M., Varble, N., Blain,
M., Anderson, V., Patella, F., Carrafiello, G., Turkbey,
B. T., and Wood, B. J. (2020). CT Images in COVID-
19.
Beltran, R. A. (2005). The Gold Standard: The Challenge
of Evidence-Based Medicine and Standardization in
Health Care. Journal of the National Medical Associ-
ation, 97(1):110.
Bridge, P., Fielding, A., Rowntree, P., and Pullar, A.
(2016). Intraobserver Variability: Should We Worry?
Journal of Medical Imaging and Radiation Sciences,
47(3):217–220.
Campolo, A. and Crawford, K. (2020). Enchanted Deter-
minism: Power without Responsibility in Artificial In-
telligence. Engaging Science, Technology, and Soci-
ety, 6:1–19.
Donald, J. J. and Barnard, S. A. (2012). Common patterns
in 558 diagnostic radiology errors. Journal of Medical
Imaging and Radiation Oncology, 56(2):173–178.
Enni, S. A. and Herrie, M. B. (2021). Turning biases into
hypotheses through method: A logic of scientific dis-
covery for machine learning. Big Data and Society,
8(1).
Hey, T., Tansley, S., and Tolle, K. M. (2021). The Fourth Paradigm: Data-Intensive Scientific Discovery.
Kim, J. H., Han, S. G., Cho, A., Shin, H. J., and Baek,
S.-E. (2021). Effect of deep learning-based assistive
technology use on chest radiograph interpretation by
emergency department physicians: a prospective in-
terventional simulation-based study. BMC Medical
Informatics and Decision Making, 21(1):1–9.
Kitchin, R. (2014). Big Data, new epistemologies and
paradigm shifts. Big Data and Society, 1(1):1–12.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions.
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin,
I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M.,
Kolesnikov, A., Duerig, T., and Ferrari, V. (2020). The
Open Images Dataset V4: Unified Image Classifica-
tion, Object Detection, and Visual Relationship De-
tection at Scale. International Journal of Computer
Vision, 128(7):1956–1981.
Leonelli, S. (2014). What difference does quantity make?
On the epistemology of Big Data in biology. Big Data
and Society, 1(1):1–11.
Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., van der Laak, J. A., van Ginneken, B., and Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88.
Lizzi, F., Brero, F., Cabini, R. F., Fantacci, M. E., Piffer, S., Postuma, I., Rinaldi, L., and Retico, A. (2021). Making data big for a deep-learning analysis: Aggregation of public COVID-19 datasets of lung computed tomography scans. Proceedings of the 10th International Conference on Data Science, Technology and Applications, DATA 2021, pages 316–321.
Neri, E., Coppola, F., Miele, V., Bibbolino, C., and Grassi,
R. (2020). Artificial intelligence: Who is responsible
for the diagnosis? Radiologia Medica, 125(6):517–
521.
Pesapane, F., Codari, M., and Sardanelli, F. (2018). Artifi-
cial intelligence in medical imaging: threat or oppor-
tunity? Radiologists again at the forefront of innova-
tion in medicine. European Radiology Experimental,
2(1).
Renard, F., Guedria, S., Palma, N. D., and Vuillerme, N.
(2020). Variability and reproducibility in deep learn-
ing for medical image segmentation. Scientific Re-
ports, 10(1):1–16.
Standard DICOM (2021). DICOM standard.
Stevens, M., Wehrens, R., and de Bont, A. (2018). Con-
ceptualizations of Big Data and their epistemological
claims in healthcare: A discourse analysis. Big Data
and Society, 5(2):1–21.
WHO (2021). Ethics and Governance of Artificial Intelligence for Health.