Mapping Cost-Sensitive Learning for Imbalanced Medical Data:
Research Trends and Applications
Imane Araf (1, a), Ali Idri (1, 2, b) and Ikram Chairi (1, c)
(1) Mohammed VI Polytechnic University, Ben Guerir, Morocco
(2) Mohammed V University, Rabat, Morocco
Keywords: Machine Learning, Data Imbalance, Cost-Sensitive Learning, Medical Data, Systematic Mapping Study.
Abstract: Incorporating Machine Learning (ML) in medicine has opened up new avenues for leveraging complex
medical data to enhance patient outcomes and advance the field. However, the imbalanced nature of medical
data poses a significant challenge, resulting in biased ML models that perform poorly on the minority class
of interest. To address this issue, researchers have proposed various approaches, among which Cost-Sensitive
Learning (CSL) stands out as a promising technique to improve the performance of ML models on the minority class. To the best of
our knowledge, this paper presents the first systematic mapping study on CSL for imbalanced medical data.
To comprehensively investigate the scope of existing literature, papers published from January 2010 to
December 2022 and sourced from five major digital libraries were thoroughly explored. A total of 173 papers
were selected and analyzed according to three classification criteria: publication years, channels and sources;
medical disciplines; and CSL approaches. This study provides a valuable resource for researchers seeking to
explore the current state of research and advance the application of CSL for imbalanced data in medicine.
1 INTRODUCTION
Medicine is a dynamic and intricate field that has
witnessed remarkable progress in recent decades,
owing to advances in technology and medical
imaging (Johnson et al., 2018). These developments
have endowed healthcare providers with powerful
tools, improving patient outcomes and extending life
expectancies. Nevertheless, with the escalating
complexity and abundance of medical data, medical
practitioners now face new challenges in accurately
diagnosing and treating patients.
To address these challenges, Machine Learning
(ML), a branch of artificial intelligence, has emerged
as a promising solution in recent years. ML
techniques make it possible to analyze massive amounts
of data, recognize patterns, and predict outcomes.
Consequently, ML has opened up new perspectives on
fundamental disease mechanisms, ultimately
facilitating improved healthcare delivery systems and
more effective treatments and therapies.
Additionally, ML holds tremendous potential to
transform medical research, potentially unlocking
novel discoveries and revolutionizing the field.

(a) https://orcid.org/0000-0001-7278-6848
(b) https://orcid.org/0000-0002-4586-4158
(c) https://orcid.org/0000-0001-9175-0074
However, medical data is often imbalanced,
meaning one class is underrepresented compared to
the other. For instance, in cancer screening, the
number of patients with cancer is typically much
smaller than that of healthy patients. This data
imbalance can lead to biased ML models that perform
poorly on the minority class, which is more often than
not the class of interest. ML researchers have proposed
various approaches to address this issue, including
resampling (Khushi et al., 2021) and Cost-Sensitive
Learning (CSL) (Elkan, 2001).
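To illustrate why accuracy alone can hide this bias, the following minimal Python sketch uses a hypothetical screening cohort (1,000 patients, 5% with cancer; the numbers are illustrative and not drawn from the selected studies) and a trivial model that always predicts "healthy". Its accuracy is high, while its recall (sensitivity) on the minority class is zero.

```python
# Hypothetical screening cohort: 1 = cancer (minority), 0 = healthy (majority).
y_true = [1] * 50 + [0] * 950

# A degenerate model that always predicts the majority class.
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Sensitivity: fraction of actual positives that are predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    pos = sum(t == positive for t in y_true)
    return tp / pos if pos else 0.0

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.95
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
```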
Resampling techniques aim to balance the data
either by oversampling the minority class or
undersampling the majority class. While resampling
can enhance models' performance, it may result in
overfitting or information loss (Hu et al., 2021).

Araf, I., Idri, A. and Chairi, I.
Mapping Cost-Sensitive Learning for Imbalanced Medical Data: Research Trends and Applications.
DOI: 10.5220/0012176000003598
In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2023) - Volume 1: KDIR, pages 265-272
ISBN: 978-989-758-671-2; ISSN: 2184-3228
Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

In contrast, CSL tackles the class imbalance problem
by assigning different misclassification costs to each
class rather than by adding or removing examples. In
particular, cost-sensitive methods assign higher costs
to misclassifying examples of the minority class and
seek to minimize the high-cost errors (López,
Fernández, García, Palade, & Herrera, 2013). This
approach is advantageous in many real-world
applications, including medical ones, where certain
misclassifications can have more severe
consequences (Sterner, Goretzko, & Pargent, 2021).
For example, misclassifying a patient with cancer as
healthy is more detrimental than the opposite, as it
can delay treatment and lead to further complications.
The misclassification costs are often specified as cost
matrices, which can be expert-defined or estimated
from training data (Fernández et al., 2018).
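As a minimal sketch of how a cost matrix can drive decisions, the Python example below assumes a binary problem with hypothetical costs (a false negative costs 10, a false positive costs 1, correct predictions cost 0) and indexes the matrix by predicted class and then actual class, in the spirit of Elkan (2001). For each patient, it predicts the class with the lowest expected cost, computed from the model's estimated class probabilities. The function names are illustrative.

```python
# Hypothetical cost matrix, indexed COST[predicted][actual];
# classes: 0 = healthy, 1 = cancer.
COST = [
    [0.0, 10.0],  # predict healthy: 0 if healthy, 10 if cancer (false negative)
    [1.0, 0.0],   # predict cancer: 1 if healthy (false positive), 0 if cancer
]

def expected_costs(probs, cost=COST):
    """Expected cost of each possible prediction, given class probabilities P(j | x)."""
    return [sum(p_j * cost[i][j] for j, p_j in enumerate(probs)) for i in range(len(cost))]

def min_cost_prediction(probs, cost=COST):
    """Return the class index with the lowest expected misclassification cost."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)

# A patient the model considers only 20% likely to have cancer.
probs = [0.8, 0.2]
print(expected_costs(probs))       # [2.0, 0.8]
print(min_cost_prediction(probs))  # 1
```

With these costs, a patient whose estimated probability of cancer is only 0.2 is still flagged, because missing a cancer case is ten times as costly as a false alarm.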
CSL techniques can be broadly classified into
direct approaches and meta-learning approaches
(Fernández et al., 2018; Liu et al., 2021). The former
modify the learning algorithms by incorporating
misclassification costs during the model training
phase (Fernández et al., 2018). Conversely, the latter
do not alter the learning algorithms per se (Liu et al.,
2021). Instead, meta-learning approaches adjust the
training data (preprocessing) or the model's outputs
(postprocessing) to ensure cost sensitivity. Popular
preprocessing techniques include instance weighting
based on a cost matrix and MetaCost, which relabels
the training data according to misclassification costs
(Fernández et al., 2018). Postprocessing techniques,
meanwhile, often involve adjusting the decision
thresholds based on the pre-defined costs (Fernández
et al., 2018; Liu et al., 2021).
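A minimal sketch of cost-based instance weighting, one of the preprocessing options mentioned above, is given below. It assumes a binary cost matrix with hypothetical values and one simple weighting rule chosen for illustration (not the scheme of any particular cited study): each example is weighted by the cost of misclassifying its true class, and the weights are rescaled to average one. The function name `cost_based_weights` is hypothetical. The resulting weights can be passed to any learner that accepts per-example weights, such as the `sample_weight` argument of `fit` in many scikit-learn estimators, without changing the learning algorithm itself.

```python
def cost_based_weights(labels, cost):
    """Weight each example by the cost of misclassifying its true class.

    cost[i][j] is the cost of predicting class i when the true class is j.
    For class j, the largest off-diagonal cost in column j is used as its
    misclassification cost (a simplifying assumption of this sketch).
    """
    n_classes = len(cost)
    class_cost = [
        max(cost[i][j] for i in range(n_classes) if i != j) for j in range(n_classes)
    ]
    raw = [class_cost[y] for y in labels]
    scale = len(labels) / sum(raw)  # rescale so the weights average to 1
    return [w * scale for w in raw]

# Hypothetical costs: missing a cancer case (FN) costs 10, a false alarm costs 1.
COST = [[0.0, 10.0],
        [1.0, 0.0]]
labels = [0, 0, 0, 0, 1]  # four healthy patients, one cancer patient
weights = cost_based_weights(labels, COST)
print([round(w, 3) for w in weights])  # [0.357, 0.357, 0.357, 0.357, 3.571]
```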
Despite the potential of CSL in medical research,
existing reviews on the topic (Freitas, Brazdil, &
Costa-Pereira, 2009; Sterner et al., 2021) suffer from
limitations, including a lack of systematic approach,
limited scope or outdatedness. As such, a Systematic
Mapping Study (SMS) was conducted to address CSL
for imbalanced medical data, which, to the best of our
knowledge, is the first of its kind. The contributions
of this paper are two-fold. Firstly, a systematic and
comprehensive overview of the current state of
research on CSL for imbalanced data in the medical
field is presented. Secondly, the existing literature's
strengths and limitations are critically evaluated, and
potential future research directions are suggested. To
comprehensively investigate the scope of existing
literature, materials from January 2010 to December
2022 were extensively explored. The materials were
sourced from five major digital libraries: PubMed,
ScienceDirect, IEEE Xplore, SpringerLink, and
Google Scholar. The 173 selected papers were
subsequently analyzed to answer three Mapping
Questions (MQs): (i) publication years, channels and
sources, (ii) medical disciplines, and (iii) CSL
approaches.
The remainder of this paper is structured as
follows. Section 2 details the research methodology.
Section 3 reports the results of this study and provides
an in-depth discussion of the findings, highlighting
trends, strengths and gaps in the existing literature.
Finally, Section 4 concludes the paper by
summarising the main findings and outlining future
work.
2 METHODOLOGY
An SMS systematically categorizes and classifies
existing research in a particular field and often gives
a visual summary of its results (Petersen, Feldt,
Mujtaba, & Mattsson, 2008). It aims to determine the
scope and extent of existing research on a topic,
identify gaps and trends, and provide a foundation for
future research. The present study follows the
mapping process proposed by Petersen, Vakkalanka,
and Kuzniarz (2015). This process covers: (i) clearly
defining the research questions, (ii) developing a
comprehensive search strategy to identify relevant
papers, (iii) screening the identified papers based on
inclusion and exclusion criteria, (iv) designing a
classification scheme, and (v) data extraction and
analysis, resulting in a systematic map.
2.1 Mapping Questions
This study aims to provide an overview and a
structured understanding of the existing literature on
using CSL for imbalanced medical data by addressing
three MQs:
MQ1: In which years, publication channels and
sources were the selected papers published?
MQ2: In which disciplines of medicine was CSL
mainly employed?
MQ3: Which CSL approaches were most
frequently used in medicine?
2.2 Search Strategy
The search was conducted in five digital libraries:
PubMed, ScienceDirect, IEEE Xplore, SpringerLink
and Google Scholar from January 2010 until
December 2022. These libraries were chosen based
on their extensive coverage of peer-reviewed
publications in medicine and health sciences, as well
as computer science and engineering.
The search string was formulated based on the
principal terms from the MQs, as well as the PICO
(Population, Intervention, Comparison and
Outcomes) framework (Kitchenham & Charters,
2007). Note that the Comparison and Outcomes
components of PICO were not included in the search
string since neither empirical comparison nor
measurable outcomes were considered in this study.
Additionally, the search string was expanded to
include alternative spellings and synonyms of the
derived terms to ensure a comprehensive search.

KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval
The main search terms were initially linked with
their substitutes using the Boolean operator "OR" and
were joined using "AND" afterwards. The complete
search string was defined as follows:
(Health* OR Medic* OR Disease OR Clinic*) AND
("Machine Learning" OR "Deep Learning" OR Intelligen* OR Classif* OR Predict* OR Diagnos* OR Prognos*) AND
(Technique OR Method OR Tool OR Model OR Algorithm OR Approach OR Framework) AND
("Cost sensitive" OR Cost-sensitive OR "weighted cost function" OR "weighted loss function" OR "class weighting" OR re-weighting) AND
(Imbalance* OR unbalance* OR "skewed class distribution" OR under-represented OR "majority class" OR "minority class").
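The OR-within-groups, AND-between-groups construction described above can be made explicit in a few lines. The Python sketch below (the `build_query` helper is hypothetical and not part of the original protocol) rebuilds the search string from its five term groups; keeping the groups as data makes it easier to adapt the string to the query syntax of each digital library, which may differ, for example, in wildcard support.

```python
# Term groups from the search string; terms inside a group are alternatives
# (joined with OR), and the parenthesised groups are joined with AND.
TERM_GROUPS = [
    ["Health*", "Medic*", "Disease", "Clinic*"],
    ['"Machine Learning"', '"Deep Learning"', "Intelligen*", "Classif*",
     "Predict*", "Diagnos*", "Prognos*"],
    ["Technique", "Method", "Tool", "Model", "Algorithm", "Approach", "Framework"],
    ['"Cost sensitive"', "Cost-sensitive", '"weighted cost function"',
     '"weighted loss function"', '"class weighting"', "re-weighting"],
    ["Imbalance*", "unbalance*", '"skewed class distribution"',
     "under-represented", '"majority class"', '"minority class"'],
]

def build_query(groups):
    """OR the synonyms within each group, then AND the parenthesised groups."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)

print(build_query(TERM_GROUPS))
```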
2.3 Study Selection
The Inclusion Criteria (IC) and Exclusion Criteria
(EC) used to identify the relevant papers are
presented below.
IC1: Studies developing new or using existing
cost-sensitive techniques in medicine.
IC2: Papers focusing mainly on cost-sensitive
models in medicine, whether or not they compare them
to other balancing techniques.
IC3: Papers presenting fair comparisons of
several balancing techniques in medicine, including
cost-sensitive methods.
IC4: Papers presenting comparisons between
CSL methods in medicine without proposing any
newly developed techniques.
IC5: Papers providing an overview of studies
investigating cost-sensitive methods in medicine.
IC6: Papers combining cost-sensitive methods
with other balancing techniques in medicine.
EC1: Papers published earlier than January 2010
or later than December 2022.
EC2: Papers using several datasets from multiple
domains, in which medical datasets play only a marginal role.
EC3: Papers using cost-sensitive techniques in
public health, biology, pharmacology or genomics.
EC4: Papers available as abstracts, posters, book
chapters, or presentations.
EC5: Non-peer-reviewed papers.
EC6: Duplicate publications of the same study.
EC7: Studies published in languages other than
English.
EC8: Short papers.
EC9: Papers for which the full texts are not
available.
The suitability of a study for inclusion was
determined by examining its title, abstract, and
keywords. All the articles were further screened by
reviewing their introduction, discussion, and
conclusion sections. Full-text reading was conducted
in case of doubt. Initially, one author examined the
papers, and the remaining authors subsequently
evaluated the final selection.
Furthermore, each paper was evaluated by two
authors based on a set of Quality Assessment (QA)
criteria to ensure that the selected studies are of
sufficient quality and provide reliable and valid
evidence to address the MQs. The criteria included
clear empirical results, justified empirical design,
performance evaluation, comparison with other
methods, explicit presentation of benefits and
limitations, and publication in a recognized source.
2.4 Data Extraction Strategy and Synthesis
During this phase, a data extraction form was used for
each selected paper to answer the MQs.
MQ1: Publication years, channels (journal,
conference or workshop), and sources were extracted
to address this question.
MQ2: Each paper was examined to determine its
specific medical focus, encompassing disciplines
such as oncology, cardiology, ophthalmology, and
others, following the list of specialties in
("Specialty Profiles | Careers in Medicine," 2023).
MQ3: The proposed cost-sensitive methods in the
selected studies were identified. These methods can
be classified as either direct or meta-learning
approaches. The latter could further be classified as
preprocessing or postprocessing methods (Fernández
et al., 2018).
3 RESULTS AND DISCUSSION
This section provides an overview of the study
selection. It also presents and discusses the mapping
results according to the proposed MQs.
3.1 Study Selection
Figure 1 displays the number of articles at each stage
of the selection process. Initially, 49325 candidate
papers were identified, from which 49124 studies
were discarded according to the IC and EC, leaving
201 papers. A further 28 studies that did not fulfil
the QA criteria were then excluded. Eventually, 173 papers were retained
to answer the MQs. Given space limitations, the list
of selected papers and their extracted data can be
obtained through an email request to the authors.
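The selection counts reported above are internally consistent, as the short Python check below shows; it recomputes, from the figures given in the text only, the intermediate number of papers that passed the IC/EC screening before the QA step.

```python
identified = 49325            # candidate papers retrieved from the five libraries
excluded_by_criteria = 49124  # discarded according to the IC and EC
excluded_by_quality = 28      # failed the QA criteria

after_screening = identified - excluded_by_criteria
retained = after_screening - excluded_by_quality
print(after_screening, retained)  # 201 173
```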
3.2 MQ1: In Which Years, Publication Channels and Sources Were the Selected Papers Published?
Figure 2 shows the number of selected studies per
publication channel from January 2010 to December
2022. Three main channels were identified: journals,
conferences and workshops. Out of the 173 selected
studies, the majority, precisely 69.9% (121 papers),
were published in journals, 27.2% (47 papers) were
published in conference proceedings, and only 2.9%
(five papers) were published in workshops. Table 1
outlines the publication sources that have published
more than two papers.
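The channel percentages follow directly from the counts; the Python snippet below recomputes them from the 121 journal, 47 conference, and 5 workshop papers as a simple illustrative check.

```python
channels = {"journal": 121, "conference": 47, "workshop": 5}
total = sum(channels.values())  # 173 selected studies

# Share of each channel, rounded to one decimal place as in the text.
shares = {name: round(100 * count / total, 1) for name, count in channels.items()}
for name, count in channels.items():
    print(f"{name}: {count} papers ({shares[name]}%)")
# journal: 121 papers (69.9%)
# conference: 47 papers (27.2%)
# workshop: 5 papers (2.9%)
```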
Figure 1: Selection process.
Figure 2: Distribution of the selected papers per publication
year and channel.
Table 1: Publication sources.
Journal source | No. papers | Percentage
Computer Methods and Programs in Biomedicine | 9 | 5.2%
Computers in Biology and Medicine | 8 | 4.6%
BMC Medical Informatics and Decision Making | 5 | 2.9%
Neurocomputing | 5 | 2.9%
Multimedia Tools and Applications | 5 | 2.9%
Medical Image Analysis | 4 | 2.3%
Biomedical Signal Processing and Control | 4 | 2.3%
Artificial Intelligence in Medicine | 3 | 1.7%
Applied Soft Computing | 3 | 1.7%
Other | 75 | 43.4%

Conference source | No. papers | Percentage
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) | 5 | 2.9%
Other | 42 | 24.3%

Workshop source | No. papers | Percentage
International Workshop on Machine Learning in Medical Imaging (MLMI) | 3 | 1.7%
Other | 2 | 1.2%
The findings indicate that Computer Methods and
Programs in Biomedicine was the most commonly
targeted journal venue, while the International
Conference on Medical Image Computing and
Computer-Assisted Intervention (MICCAI) and the
International Workshop on Machine Learning in
Medical Imaging (MLMI) emerged as the most
frequently occurring sources for conference and
workshop papers, respectively.
Chronologically speaking, conference papers
were the dominant publication type in 2012 and 2013.
However, the trend shifted in 2014 as the journal
publication frequency surpassed that of conference
papers in subsequent years. A key observation is that
the gap between the two types of publications became
increasingly pronounced from 2020 onwards. The
analysis further revealed a growing trend of
publications, particularly from 2020 onwards, when the
annual count rose sharply. Notably, no study was
published in 2010, and only one workshop paper was
published in 2011.
(Figure 2 plot: number of papers, from 0 to 50, per year; series: Journal, Conference, Workshop.)
The dearth of published papers in 2010-2011 and
the dominance of conference papers until 2013
suggest that CSL research in the medical field was in
its early stages. As the field progressed, researchers
appear to have increasingly targeted journals, whose
stricter review processes and higher publication
standards tend to demand more rigorous research. This
shift towards journal publications began in 2014,
when the number of journal articles first surpassed
that of conference papers, and the gap continued to
widen in subsequent years. This trend suggests a
maturing field, with researchers increasingly meeting
the demanding standards of high-quality journals.
The growing interest in CSL and the abundance of
publications on it may be attributed to several key
factors. Firstly, the development of high-throughput
technologies has resulted in massive amounts of
medical data (Johnson et al., 2018), including clinical
data, electronic health records, and data from
wearable devices. These advancements in data
collection have created an urgent need for novel
methods to analyze and leverage this data for
improved medical outcomes. Secondly, the inherent
imbalanced nature of this collected data poses a
critical challenge that impacts the accuracy and
reliability of ML models in medical applications.
Thirdly, the significant advances in CSL algorithms
(Khan, Hayat, Bennamoun, Sohel, & Togneri, 2018)
and their success in other fields (Sahin, Bulkan, &
Duman, 2013) have encouraged researchers to apply
these techniques in the medical domain, where they
are much needed. Additionally, the advances in deep
learning have been a significant catalyst for progress
in medical data analysis (Esteva et al., 2019). Finally,
the increasing availability of public datasets and tools
for analyzing medical data has facilitated the
dissemination and replication of research findings. As
a result, the research community has become more
aware of the importance of addressing the class
imbalance problem, leading to a surge in publications
on this topic, particularly in recent years.
Besides, the findings revealed diverse publication
sources covering various disciplines such as
medicine, medical informatics, computer science, and
artificial intelligence. This diversity reflects the
interdisciplinary nature of the research topic,
requiring a multi-faceted approach that draws on
expertise from different fields.
3.3 MQ2: In Which Disciplines of Medicine Was CSL Mainly Used?
The 173 selected studies collectively explored 21
distinct medical disciplines. Interestingly, 17 papers
addressed more than one discipline, either by
investigating a topic at the intersection of two medical
sub-fields (e.g., Sung, Hung, & Hu, 2021) or by
testing their methods on a diverse range of disciplines
(e.g., Gan, Shen, An, Xu, & Liu, 2020). Figure 3
showcases the distribution of studies per medical sub-
field, focusing solely on sub-fields addressed by at
least 2% of the selected papers.
The findings revealed that oncology is the most
extensively studied discipline, accounting for 31.2%
(54 papers) of the selected studies. As per the World
Health Organization (WHO), cancer is a leading
cause of mortality globally, accounting for
approximately 10 million deaths in 2020 alone
("Cancer," 2020). Accurate and timely diagnosis and
treatment are therefore paramount, and ML techniques
hold great promise in this regard. However, cancer is
a highly heterogeneous disease that can manifest
differently in each patient. Additionally, patients
often present with complex medical histories and
comorbidities, which can complicate diagnosis and
treatment. These factors, together with the low
prevalence of cancer cases relative to healthy
individuals in screening populations, contribute to
imbalanced medical data, making CSL an attractive
approach to address these challenges and improve
cancer care.
Cardiology and neurology ranked second and third,
accounting for 15% (26 papers) and 12.7% (22 papers)
of the investigated literature, respectively. In these
studies, CSL has been reported to benefit the
prediction of cardiovascular and neurological
diseases, which are widely recognized as major health
concerns. This emphasis is in line with the WHO's
report ("Cardiovascular Diseases (CVDs)," 2021),
which identifies cardiovascular diseases as the
leading cause of mortality globally, responsible for
17.9 million deaths in 2019. Additionally, the WHO
acknowledges that neurological disorders such as
stroke, Alzheimer's disease, and other dementias are
among the leading causes of disability and death
worldwide ("Mental Health: Neurological Disorders,"
2016). Given the high mortality associated with these
diseases, accurate predictions are imperative.
However, data imbalance can lead to biased models
that fail to capture important patterns in the
minority class. By adopting CSL, researchers aim to
improve predictive performance on these
underrepresented cases and contribute to preserving
human life.
Infectious diseases occupied the fourth position,
representing 8.7% (15 papers) of the total studies.
Notable attention has been dedicated to researching
this sub-field since 2020. This trend is not surprising,
considering the urgency and global impact of the
COVID-19 pandemic, which first emerged in 2019
and has since garnered substantial research attention.
Figure 3: Distribution of the selected papers per medical
discipline.
Additionally, imbalanced data is a common issue in
COVID-19 studies due to various factors such as
differences in testing availability and criteria,
variations in reporting standards, differences in
demographics, healthcare infrastructure, and
compliance with public health measures. Besides,
there may be a publication bias towards COVID-19
studies due to the pandemic's global impact, and
funding agencies may have prioritized research on
this topic. Lastly, data availability may have
contributed to the popularity of COVID-19 as a
research subject matter.
Other medical sub-fields, such as ophthalmology,
endocrinology, and hepatology, were investigated by
11 papers (6.4%) each, demonstrating the relevance
of cost-sensitive methods in these domains. Galdran
and colleagues (Galdran, Dolz, Chakor, Lombaert, &
Ben Ayed, 2020) highlighted the value of
cost-sensitive classifiers in addressing two critical
challenges in diabetic retinopathy grading: such
classifiers can effectively model the complex
structure of a heterogeneous label space, and they
are well suited to severely class-imbalanced
scenarios. Fan et al. (Fan, Xie, Cheng, & Li, 2022)
pointed out that conventional models fail to account
for the imbalanced distribution of diabetic datasets
and for the varying misclassification costs across
distinct patient categories. Yang et al. (2021)
compared the predictive accuracy of traditional ML
methods and cost-sensitive models for predicting
hepatic encephalopathy in cirrhotic patients. Their
results demonstrated the superiority of the
cost-sensitive models, underscoring their suitability
and potential for future prognosis studies.
Pulmonology was featured in eight articles (4.6%),
and nephrology, dermatology, and medical and health
services were each investigated by six studies (3.5%).
In contrast, emergency medicine, radiology, and
obstetrics & gynecology (five papers, or 2.9%, each)
received relatively little attention, as did
orthopaedics, which was addressed by only 2.3% of
the selected studies (four papers).
Disciplines that received the least amount of
attention in the selected studies were classified as
"other", which included geriatric psychiatry and
neonatology, each addressed by two papers (1.2%),
as well as intensive care, radiomics, urology, and
podiatry, which were each the focus of only one study
(0.6%). This may be explained by factors such as
limited data availability and researchers prioritizing
other research areas deemed more crucial and
pertinent to patient care.
3.4 MQ3: Which CSL Approaches Were Most Frequently Used in Medicine?
This study seeks to categorize the selected papers
according to the CSL approaches they have
employed, with the goal of obtaining a thorough
understanding of the distribution and prevalence of
these approaches within the medical literature.
Figure 4 illustrates the distribution of cost-
sensitive approaches used in the selected studies.
Direct approaches account for the largest share of
papers, representing 76% (133 papers) of the
qualified studies. Some researchers modified the
objective function of the model to minimize the
expected cost of misclassification (e.g., Al-Sawwa &
Ludwig, 2019), while others incorporated the cost
matrix directly into the loss function (e.g., Ben
naceur, Akil, Saouli, & Kachouri, 2020). Ease of
implementation is likely a major factor behind
this trend, since most ML libraries offer readily
available implementations (Sterner et al., 2021).
Figure 4: Distribution of the selected studies per CSL
approach.
Figure 3 values (percentage of papers per medical discipline): Oncology 31.2%; Cardiology 15.0%; Neurology 12.7%; Infectious Diseases 8.7%; Ophthalmology 6.4%; Endocrinology 6.4%; Hepatology 6.4%; Pulmonology 4.6%; Nephrology 3.5%; Dermatology 3.5%; Medical and health services 3.5%; Emergency Medicine 2.9%; Radiology 2.9%; Obstetrics & Gynecology 2.9%; Orthopaedics 2.3%; Other 4.6%.
Moreover, certain packages offer the flexibility to
apply custom loss functions directly to the algorithm,
allowing users to employ cost-sensitive loss functions
tailored to their specific applications.
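As a minimal sketch of a direct approach, the Python function below implements a class-weighted binary cross-entropy, a common way of embedding misclassification costs into the training loss. The weights `w_pos` and `w_neg` and the toy batch are hypothetical; in practice, libraries expose similar options, for example the `class_weight` parameter of scikit-learn classifiers or the `weight` argument of PyTorch's `CrossEntropyLoss`.

```python
import math

def weighted_binary_cross_entropy(y_true, p_pred, w_pos, w_neg, eps=1e-12):
    """Mean class-weighted binary cross-entropy.

    w_pos scales the loss on positive (minority) examples and w_neg the loss
    on negative ones; w_pos > w_neg makes missed positives costlier in training.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical batch: one cancer case predicted with low confidence, three healthy cases.
y_true = [1, 0, 0, 0]
p_pred = [0.3, 0.1, 0.2, 0.1]

plain = weighted_binary_cross_entropy(y_true, p_pred, w_pos=1.0, w_neg=1.0)
costed = weighted_binary_cross_entropy(y_true, p_pred, w_pos=10.0, w_neg=1.0)
print(f"unweighted: {plain:.3f}, cost-weighted: {costed:.3f}")
```

Raising `w_pos` leaves the loss on healthy cases unchanged but makes the poorly predicted cancer case dominate the average, so gradient-based training is pushed harder towards reducing false negatives.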
A considerable share of the selected studies
(16.6%) adopted meta-learning approaches.
Precisely, preprocessing was applied in 24 papers
(13.7%), and postprocessing was employed in 5
papers (2.9%). Preprocessing was carried out using
weighting (e.g., K. J. Wang, Makond, & Wang, 2013)
or MetaCost (e.g., Afzal et al., 2013), while
postprocessing relied on thresholding (e.g., Zhao,
Wong, & Tsui, 2018). Researchers adopt
preprocessing techniques because they alter the
training data instead of the underlying algorithm
(Fernández et al., 2018), making them applicable to
different types of classifiers. Thresholding is less
frequently employed in the selected studies, possibly
because selecting the most suitable threshold from a
large pool of candidates is an arduous task (Liu et
al., 2021).
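The postprocessing route can be sketched in two ways, both assuming a binary problem in which correct predictions cost nothing. If the model's scores are well-calibrated probabilities, Elkan (2001) shows that the cost-minimizing threshold is p* = C_FP / (C_FP + C_FN); otherwise, a threshold can be chosen empirically by scanning candidate values on a validation set and keeping the one with the lowest total cost, which is precisely the search burden noted above. The function names, costs, and validation data below are hypothetical.

```python
def theoretical_threshold(c_fp, c_fn):
    """Cost-minimising threshold on a calibrated P(positive), assuming zero cost for correct predictions."""
    return c_fp / (c_fp + c_fn)

def total_cost(y_true, scores, threshold, c_fp, c_fn):
    """Total misclassification cost when predicting positive for score >= threshold."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 0:
            cost += c_fp  # false positive
        elif pred == 0 and y == 1:
            cost += c_fn  # false negative
    return cost

def best_threshold(y_true, scores, c_fp, c_fn):
    """Scan the observed scores (plus +inf, i.e. never predict positive) for the lowest-cost threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, c_fp, c_fn))

# Hypothetical costs and validation data (1 = positive/minority class).
C_FP, C_FN = 1.0, 10.0
y_val = [0, 0, 0, 1, 0, 1, 0, 0]
s_val = [0.05, 0.12, 0.30, 0.35, 0.40, 0.60, 0.08, 0.20]

print(round(theoretical_threshold(C_FP, C_FN), 3))  # 0.091
t = best_threshold(y_val, s_val, C_FP, C_FN)
print(t, total_cost(y_val, s_val, t, C_FP, C_FN))   # 0.35 1.0
```

Here, the closed-form threshold is 1/11 ≈ 0.09, whereas the empirical search on the toy validation set selects 0.35, illustrating how the two can diverge when scores are not calibrated.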
Note that the direct and preprocessing approaches
were utilized together in two papers, resulting in
double counting in these categories; the percentages
in this section are therefore computed over 175
approach assignments (173 papers plus the two
double-counted ones). Moreover, 13
articles (7.4%) did not provide information on the
cost-sensitive approach they adopted and were thus
categorized as "unspecified". Incomplete reporting
may hinder the reproducibility and comparability of
results and the identification of effective methods for
dealing with imbalanced medical data. Given the
importance of transparency in medical research,
future studies should provide a clear and detailed
description of the implemented cost-sensitive
techniques, including any modifications made to the
model, to allow for better understanding, comparison
and replication of findings.
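A quick recomputation from the reported counts, given below as an illustrative Python check, makes the denominator explicit: dividing by 175 approach assignments (173 papers plus the two double-counted ones) reproduces the reported 76%, 13.7%, 2.9%, 7.4%, and 16.6%, whereas dividing by the 173 papers gives slightly higher values.

```python
# Counts reported in this subsection; two papers used both a direct and a
# preprocessing approach and are therefore counted in both categories.
counts = {"direct": 133, "preprocessing": 24, "postprocessing": 5, "unspecified": 13}
n_papers = 173
n_assignments = sum(counts.values())  # 175

for approach, n in counts.items():
    print(f"{approach}: {100 * n / n_assignments:.1f}% of assignments, "
          f"{100 * n / n_papers:.1f}% of papers")

meta = counts["preprocessing"] + counts["postprocessing"]
print(f"meta-learning: {100 * meta / n_assignments:.1f}% of assignments")
# direct: 76.0% of assignments, 76.9% of papers
# preprocessing: 13.7% of assignments, 13.9% of papers
# postprocessing: 2.9% of assignments, 2.9% of papers
# unspecified: 7.4% of assignments, 7.5% of papers
# meta-learning: 16.6% of assignments
```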
4 CONCLUSION AND FUTURE WORK
This SMS aimed to provide a thorough overview of
the current state of research on CSL for imbalanced
medical data. A total of 173 papers published between
January 2010 and December 2022 were selected from five
digital libraries and classified according to
publication years, channels and sources, medical
disciplines, and CSL approaches. The main findings
per MQ are: (MQ1) The use of CSL for imbalanced
medical data has garnered increasing interest,
particularly since 2020, with most papers (69.9%)
published in journals. (MQ2) Oncology was the most
extensively investigated discipline. (MQ3) Most
papers (76%) employed direct CSL approaches. This
SMS lays the groundwork for our forthcoming
research, which will involve a more targeted and
comprehensive review of CSL for imbalanced
medical data.
ACKNOWLEDGEMENTS
This work was conducted under the research project
"Machine Learning based Breast Cancer Diagnosis
and Treatment", 2020-2023. The authors would like
to thank the Moroccan Ministry of Higher Education
and Scientific Research, Digital Development
Agency (ADD), and CNRST for their support.
REFERENCES
Afzal, Z., Schuemie, M. J., Van Blijderveen, J. C., Sen, E.
F., Sturkenboom, M. C., & Kors, J. A. (2013).
Improving sensitivity of machine learning methods for
automated case identification from free-text electronic
medical records. BMC Medical Informatics and
Decision Making, 13(1), 1–11. https://doi.org/
10.1186/1472-6947-13-30
Al-Sawwa, J., & Ludwig, S. A. (2019). A Cost-Sensitive
Centroid-based Differential Evolution Classification
Algorithm applied to Cancer Data Sets. 2019 IEEE
Symposium Series on Computational Intelligence, SSCI
2019, 2514–2521. https://doi.org/10.1109/SSCI44817.2019.9002660
Ben naceur, M., Akil, M., Saouli, R., & Kachouri, R.
(2020). Fully automatic brain tumor segmentation with
deep learning-based selective attention using
overlapping patches and multi-class weighted cross-
entropy. Medical Image Analysis, 63, 101692.
https://doi.org/10.1016/J.MEDIA.2020.101692
Cancer. (2020). Retrieved March 16, 2023, from
https://www.who.int/news-room/fact-
sheets/detail/cancer
Cardiovascular diseases (CVDs). (2021). Retrieved March
16, 2023, from https://www.who.int/news-room/fact-
sheets/detail/cardiovascular-diseases-(cvds)
Elkan, C. (2001). The Foundations of Cost-Sensitive
Learning. 17th International Joint Conference on
Artificial Intelligence (IJCAI'01), 973–978.
Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V.,
DePristo, M., Chou, K., … Dean, J. (2019). A guide to
deep learning in healthcare. Nature Medicine,
25(1), 24–29. https://doi.org/10.1038/s41591-018-0316-z
Fan, B., Xie, Z., Cheng, H., & Li, P. (2022). Risk Prediction
of Diabetic Readmission Based on Cost Sensitive
Convolutional Neural Network. Communications in
Computer and Information Science, 1563 CCIS, 299–
311. https://doi.org/10.1007/978-981-19-0852-1_23
Fernández, A., García, S., Galar, M., Prati, R. C.,
Krawczyk, B., & Herrera, F. (2018). Cost-Sensitive
Learning. Learning from Imbalanced Data Sets, 63–78.
https://doi.org/10.1007/978-3-319-98074-4_4
Freitas, A., Brazdil, P., & Costa-Pereira, A. (2009). Cost-
sensitive learning in medicine. In Data Mining and
Medical Knowledge Management: Cases and
Applications (pp. 57–75). IGI Global.
https://doi.org/10.4018/978-1-60566-218-3.ch003
Galdran, A., Dolz, J., Chakor, H., Lombaert, H., & Ben
Ayed, I. (2020). Cost-Sensitive Regularisation for
Diabetic Retinopathy Grading from Eye Fundus
Images. Lecture Notes in Computer Science (Including
Subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), 12265 LNCS, 665–
674. https://doi.org/10.1007/978-3-030-59722-1_64
Gan, D., Shen, J., An, B., Xu, M., & Liu, N. (2020).
Integrating TANBN with cost sensitive classification
algorithm for imbalanced data in medical diagnosis.
Computers & Industrial Engineering, 140, 106266.
https://doi.org/10.1016/J.CIE.2019.106266
Hu, K., Huang, Y., Huang, W., Tan, H., Chen, Z., Zhong,
Z., Gao, X. (2021). Deep supervised learning using
self-adaptive auxiliary loss for COVID-19 diagnosis
from imbalanced CT images. Neurocomputing, 458,
232–245. https://doi.org/10.1016/J.NEUCOM.2021.06.012
Johnson, K. W., Torres Soto, J., Glicksberg, B. S., Shameer,
K., Miotto, R., Ali, M., Dudley, J. T. (2018).
Artificial Intelligence in Cardiology. Journal of the
American College of Cardiology, 71(23), 2668–2679.
https://doi.org/10.1016/J.JACC.2018.03.521
Khan, S. H., Hayat, M., Bennamoun, M., Sohel, F. A., &
Togneri, R. (2018). Cost-sensitive learning of deep
feature representations from imbalanced data. IEEE
Transactions on Neural Networks and Learning
Systems, 29(8), 3573–3587. https://doi.org/10.1109/
TNNLS.2017.2732482
Khushi, M., Shaukat, K., Alam, T. M., Hameed, I. A.,
Uddin, S., Luo, S., Reyes, M. C. (2021). A
Comparative Performance Analysis of Data
Resampling Methods on Imbalance Medical Data.
IEEE Access, 9, 109960–109975. https://doi.org/
10.1109/ACCESS.2021.3102399
Kitchenham, B., & Charters, S. (2007, April). Guidelines
for performing Systematic Literature Reviews in
Software Engineering. Technical Report, Ver. 2.3
Technical Report EBSE.
Liu, Y., Li, Q., Wang, K., Liu, J., He, R., Yuan, Y., &
Zhang, H. (2021). Automatic Multi-Label ECG
Classification with Category Imbalance and Cost-
Sensitive Thresholding. Biosensors, 11(11), 453.
https://doi.org/10.3390/BIOS11110453
López, V., Fernández, A., García, S., Palade, V., & Herrera,
F. (2013). An insight into classification with
imbalanced data: Empirical results and current trends
on using data intrinsic characteristics. Information
Sciences, 250, 113–141.
https://doi.org/10.1016/J.INS.2013.07.007
Mental health: neurological disorders. (2016). Retrieved
March 16, 2023, from https://www.who.int/news-
room/questions-and-answers/item/mental-health-
neurological-disorders
Petersen, K., Feldt, R., Mujtaba, S., & Mattsson, M. (2008).
Systematic Mapping Studies in Software Engineering.
12th International Conference on Evaluation and
Assessment in Software Engineering, EASE 2008.
https://doi.org/10.14236/EWIC/EASE2008.8
Petersen, K., Vakkalanka, S., & Kuzniarz, L. (2015).
Guidelines for conducting systematic mapping studies
in software engineering: An update. Information and
Software Technology, 64, 1–18. https://doi.org/
10.1016/j.infsof.2015.03.007
Sahin, Y., Bulkan, S., & Duman, E. (2013). A cost-sensitive
decision tree approach for fraud detection. Expert
Systems with Applications, 40(15), 5916–5923.
https://doi.org/10.1016/J.ESWA.2013.05.021
Specialty Profiles | Careers in Medicine. (2023). Retrieved
March 15, 2023, from
https://careersinmedicine.aamc.org/explore-options/specialty-profiles
Sterner, P., Goretzko, D., & Pargent, F. (2021). Everything
has its price: Foundations of cost-sensitive learning and
its application in psychology [Preprint]. PsyArXiv.
https://doi.org/10.31234/osf.io/7asgz
Sung, S. F., Hung, L. C., & Hu, Y. H. (2021). Developing
a stroke alert trigger for clinical decision support at
emergency triage using machine learning. International
Journal of Medical Informatics, 152. https://doi.org/
10.1016/J.IJMEDINF.2021.104505
Wang, K. J., Makond, B., & Wang, K. M. (2013). An
improved survivability prognosis of breast cancer by
using sampling and feature selection technique to solve
imbalanced patient classification data. BMC Medical
Informatics and Decision Making, 13(1), 124.
https://doi.org/10.1186/1472-6947-13-124
Yang, H., Li, X., Cao, H., Cui, Y., Luo, Y., Liu, J., &
Zhang, Y. (2021). Using machine learning methods to
predict hepatic encephalopathy in cirrhotic patients
with unbalanced data. Computer Methods and
Programs in Biomedicine, 211. https://doi.org/10.1016/
J.CMPB.2021.106420
Zhao, Y., Wong, Z. S. Y., & Tsui, K. L. (2018). A
Framework of Rebalancing Imbalanced Healthcare
Data for Rare Events' Classification: A Case of Look-
Alike Sound-Alike Mix-Up Incident Detection. Journal
of Healthcare Engineering, 2018, 6275435.
https://doi.org/10.1155/2018/627543