Mapping Cost-Sensitive Learning for Imbalanced Medical Data:

Research Trends and Applications

Imane Araf, Ali Idri and Ikram Chairi

Mohammed VI Polytechnic University, Ben Guerir, Morocco

Mohammed V University, Rabat, Morocco

Keywords: Machine Learning, Data Imbalance, Cost-Sensitive Learning, Medical Data, Systematic Mapping Study.

Abstract: Incorporating Machine Learning (ML) in medicine has opened up new avenues for leveraging complex

medical data to enhance patient outcomes and advance the field. However, the imbalanced nature of medical

data poses a significant challenge, resulting in biased ML models that perform poorly on the minority class

of interest. To address this issue, researchers have proposed various approaches, among which Cost-Sensitive

Learning (CSL) stands out as a promising technique to improve the accuracy of ML models. To the best of

our knowledge, this paper presents the first systematic mapping study on CSL for imbalanced medical data.

To comprehensively investigate the scope of existing literature, papers published from January 2010 to

December 2022 and sourced from five major digital libraries were thoroughly explored. A total of 173 papers

were selected and analyzed according to three classification criteria: publication years, channels and sources;

medical disciplines; and CSL approaches. This study provides a valuable resource for researchers seeking to

explore the current state of research and advance the application of CSL for imbalanced data in medicine.

1 INTRODUCTION

Medicine is a dynamic and intricate field that has

witnessed remarkable progress in recent decades,

attributed to the advances in technology and medical

imaging (Johnson et al., 2018). These developments

have endowed healthcare providers with powerful

tools, improving patient outcomes and extending life

expectancies. Nevertheless, with the escalating

complexity and abundance of medical data, medical

practitioners now face new challenges in accurately

diagnosing and treating patients.

To address these challenges, Machine Learning

(ML), a branch of artificial intelligence, has emerged

as a promising solution in recent years. ML

techniques enable the analysis of massive amounts of

data, recognizing patterns, and predicting outcomes.

Consequently, it has opened up new perspectives into

the fundamental disease mechanisms, ultimately

facilitating improved healthcare delivery systems and

more effective treatments and therapies.

Additionally, ML harbours a tremendous potential to transform medical research, potentially unlocking novel discoveries and revolutionizing the field.

https://orcid.org/0000-0001-7278-6848

https://orcid.org/0000-0002-4586-4158

https://orcid.org/0000-0001-9175-0074

However, medical data is often imbalanced,

meaning one class is underrepresented compared to

the other. For instance, in cancer screening, the

number of patients with cancer is typically much

smaller than that of healthy patients. This data

imbalance can lead to biased ML models that perform

poorly on the minority class, which is more often than

not the class of interest. ML researchers have proposed

various approaches to address this issue, including

resampling (Khushi et al., 2021) and Cost-Sensitive

Learning (CSL) (Elkan, 2001).

Resampling techniques aim to balance the data

either by oversampling the minority class or

undersampling the majority class. While resampling

can enhance models' performance, it may result in

overfitting or information loss (Hu et al., 2021). On

the other hand, CSL tackles the class imbalance

problem without any data modifications by assigning

different misclassification costs to each class. In

particular, cost-sensitive methods assign higher costs

Araf, I., Idri, A. and Chairi, I.

Mapping Cost-Sensitive Learning for Imbalanced Medical Data: Research Trends and Applications.

DOI: 10.5220/0012176000003598

In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2023) - Volume 1: KDIR, pages 265-272

ISBN: 978-989-758-671-2; ISSN: 2184-3228


for misclassifying examples of the minority class and

seek to minimize the high-cost errors (López,

Fernández, García, Palade, & Herrera, 2013). This

approach is advantageous in many real-world

applications, including medical ones, where certain

misclassifications can have more severe

consequences (Sterner, Goretzko, & Pargent, 2021).

For example, misclassifying a patient with cancer as

healthy is more detrimental than the opposite, as it

can delay treatment and lead to further complications.

The misclassification costs are often specified as cost

matrices, which can be expert-defined or estimated

from training data (Fernández et al., 2018).
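To make the role of a cost matrix concrete, the following minimal sketch selects, for one patient, the class with the lowest expected misclassification cost. The cost values, the class encoding and the function names are illustrative assumptions, not values taken from any of the reviewed studies.

```python
# Hypothetical cost matrix for a binary screening task (illustrative values only).
# COST[true_class][predicted_class]; class 0 = healthy, class 1 = cancer.
COST = [
    [0.0, 1.0],   # healthy patient: correct = 0, false positive = 1
    [10.0, 0.0],  # cancer patient: false negative = 10, correct = 0
]


def expected_costs(probs, cost=COST):
    """Expected cost of predicting each class, given class probabilities."""
    n_classes = len(cost)
    return [
        sum(probs[true] * cost[true][pred] for true in range(n_classes))
        for pred in range(n_classes)
    ]


def min_cost_prediction(probs, cost=COST):
    """Return the class whose expected misclassification cost is lowest."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)


# A patient with only a 20% estimated cancer probability is still flagged,
# because a missed cancer is assumed to cost ten times as much as a false alarm.
print(expected_costs([0.8, 0.2]))       # [2.0, 0.8]
print(min_cost_prediction([0.8, 0.2]))  # 1
```

Under these assumed costs, a plain 0.5 probability cut-off would label the same patient as healthy, which illustrates why the choice of costs matters more than the classifier's raw output.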

CSL techniques can be broadly classified into

direct approaches and meta-learning approaches

(Fernández et al., 2018; Liu et al., 2021). The former

modify the learning algorithms by incorporating

misclassification costs during the model training

phase (Fernández et al., 2018). Conversely, the latter

do not alter the learning algorithms per se (Liu et al.,

2021). Instead, meta-learning approaches adjust the

training data (preprocessing) or the model's outputs

(postprocessing) to ensure cost sensitivity. Popular

preprocessing techniques include instance weighting

based on a cost matrix and MetaCost (Fernández et

al., 2018), which relabels the training data according

to misclassification costs. Postprocessing techniques,

meanwhile, often involve adjusting the decision

thresholds based on the pre-defined costs (Fernández

et al., 2018; Liu et al., 2021).
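As an illustration of the postprocessing idea, the sketch below derives a decision threshold from two misclassification costs, following the standard binary result discussed by Elkan (2001) under the assumption that correct predictions cost nothing. The cost values and function names are hypothetical.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold above which predicting the positive class has
    the lower expected cost (correct predictions are assumed to cost 0)."""
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("misclassification costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def apply_threshold(positive_probs, threshold):
    """Turn positive-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in positive_probs]


# Hypothetical costs: a false negative is ten times as costly as a false positive.
threshold = cost_sensitive_threshold(cost_fp=1.0, cost_fn=10.0)
print(round(threshold, 3))                             # 0.091
print(apply_threshold([0.05, 0.20, 0.60], threshold))  # [0, 1, 1]
```

The trained model is left untouched; only the cut-off applied to its probability estimates moves, which is why this family is grouped with meta-learning rather than direct approaches.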

Despite the potential of CSL in medical research,

existing reviews on the topic (Freitas, Brazdil, &

Costa-Pereira, 2009; Sterner et al., 2021) suffer from

limitations, including a lack of systematic approach,

limited scope or outdatedness. As such, a Systematic

Mapping Study (SMS) was conducted to address CSL

for imbalanced medical data, which, to the best of our

knowledge, is the first of its kind. The contributions

of this paper are two-fold. Firstly, a systematic and

comprehensive overview of the current state of

research on CSL for imbalanced data in the medical

field is presented. Secondly, the existing literature's

strengths and limitations are critically evaluated, and

potential future research directions are suggested. To

comprehensively investigate the scope of existing

literature, materials from January 2010 to December

2022 were extensively explored. The materials were

sourced from five major digital libraries: PubMed,

ScienceDirect, IEEE Xplore, SpringerLink, and

Google Scholar. The 173 selected papers were

subsequently analyzed to answer three Mapping

Questions (MQs): (i) publication years, channels and

sources, (ii) medical disciplines, and (iii) CSL

approaches.

The remainder of this paper is structured as

follows. Section 2 details the research methodology.

Section 3 reports the results of this study and provides

an in-depth discussion of the findings, highlighting

trends, strengths and gaps in the existing literature.

Finally, Section 4 concludes the paper by

summarising the main findings and outlining future

work.

2 METHODOLOGY

An SMS systematically categorizes and classifies

existing research in a particular field and often gives

a visual summary of its results (Petersen, Feldt,

Mujtaba, & Mattsson, 2008). It aims to determine the

scope and extent of existing research on a topic,

identify gaps and trends, and provide a foundation for

future research. The present study follows the

mapping process proposed by Petersen, Vakkalanka,

and Kuzniarz (2015). This process covers: (i) clearly

defining the research questions, (ii) developing a

comprehensive search strategy to identify relevant

papers, (iii) screening the identified papers based on

inclusion and exclusion criteria, (iv) designing a

classification scheme, and (v) data extraction and

analysis, resulting in a systematic map.

2.1 Mapping Questions

This study aims to provide an overview and a

structured understanding of the existing literature on

using CSL for imbalanced medical data by addressing

three MQs:

MQ1: In which years, publication channels and

sources were the selected papers published?

MQ2: In which disciplines of medicine was CSL

mainly employed?

MQ3: Which CSL approaches were most

frequently used in medicine?

2.2 Search Strategy

The search was conducted in five digital libraries (PubMed, ScienceDirect, IEEE Xplore, SpringerLink and Google Scholar) and covered January 2010 to December 2022. These libraries were chosen based

on their extensive coverage of peer-reviewed

publications in medicine and health sciences, as well

as computer science and engineering.

The search string was formulated based on the

principal terms from the MQs, as well as the PICO

(Population, Intervention, Comparison and

KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval


Outcomes) framework (Kitchenham & Charters,

2007). Note that the third and fourth letters of PICO

were not included in the search string formulation

since neither empirical comparison nor measurable

outcomes were considered in this study. Additionally,

the search string was expanded to include alternative

spellings and synonyms of the derived terms to ensure

a comprehensive search.

The main search terms were initially linked with

their substitutes using the Boolean operator "OR" and

were joined using "AND" afterwards. The complete

search string was defined as follows:

(Health* OR Medic* OR Disease OR Clinic*)
AND ("Machine Learning" OR "Deep Learning" OR Intelligen* OR Classif* OR Predict* OR Diagnos* OR Prognos*)
AND (Technique OR Method OR Tool OR Model OR Algorithm OR Approach OR Framework)
AND ("Cost sensitive" OR Cost-sensitive OR "weighted cost function" OR "weighted loss function" OR "class weighting" OR re-weighting)
AND (Imbalance* OR unbalance* OR "skewed class distribution" OR under-represented OR "majority class" OR "minority class").
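The OR-then-AND construction described above can be reproduced programmatically, which may help when the string has to be adapted to the syntax of each digital library. The sketch below is only an illustration; the term groups are copied from the search string, while `TERM_GROUPS` and `build_search_string` are hypothetical names.

```python
# Term groups copied from the search string; each inner list holds a main
# term together with its alternative spellings and synonyms.
TERM_GROUPS = [
    ["Health*", "Medic*", "Disease", "Clinic*"],
    ['"Machine Learning"', '"Deep Learning"', "Intelligen*", "Classif*",
     "Predict*", "Diagnos*", "Prognos*"],
    ["Technique", "Method", "Tool", "Model", "Algorithm", "Approach",
     "Framework"],
    ['"Cost sensitive"', "Cost-sensitive", '"weighted cost function"',
     '"weighted loss function"', '"class weighting"', "re-weighting"],
    ["Imbalance*", "unbalance*", '"skewed class distribution"',
     "under-represented", '"majority class"', '"minority class"'],
]


def build_search_string(groups):
    """Join alternatives with OR inside each group, then join groups with AND."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


QUERY = build_search_string(TERM_GROUPS)
print(QUERY)
```

Keeping the terms in a list of groups makes it straightforward to add a synonym to one group without touching the Boolean structure.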

2.3 Study Selection

The Inclusion Criteria (IC) and Exclusion Criteria

(EC) used to identify the relevant papers are

presented below.

IC1: Studies developing new or using existing

cost-sensitive techniques in medicine.

IC2: Papers focusing mainly on cost-sensitive

models in medicine, whether or not comparing them

to other balancing techniques.

IC3: Papers presenting fair comparisons of

several balancing techniques in medicine, including

cost-sensitive methods.

IC4: Papers presenting comparisons between

CSL methods in medicine without proposing any

newly developed techniques.

IC5: Papers providing an overview of studies

investigating cost-sensitive methods in medicine.

IC6: Papers combining cost-sensitive methods

with other balancing techniques in medicine.

EC1: Papers published earlier than January 2010

or later than December 2022.

EC2: Papers using several datasets from multiple

areas with a mere presence of medical ones.

EC3: Papers using cost-sensitive techniques in

public health, biology, pharmacology or genomics.

EC4: Papers available as abstracts, posters, book

chapters, or presentations.

EC5: Non-peer-reviewed papers.

EC6: Duplicate publications of the same study.

EC7: Studies published in languages other than

English.

EC8: Short papers.

EC9: Papers for which the full texts are not

available.

The suitability of a study for inclusion was

determined by examining its title, abstract, and

keywords. All the articles were further screened by

reviewing their introduction, discussion, and

conclusion sections. Full-text reading was conducted

in case of doubt. Initially, one author examined the

papers, and the remaining authors subsequently

evaluated the final selection.

Furthermore, each paper was evaluated by two

authors based on a set of Quality Assessment (QA)

criteria to ensure that the selected studies are of

sufficient quality and provide reliable and valid

evidence to address the MQs. The criteria included

clear empirical results, justified empirical design,

performance evaluation, comparison with other

methods, explicit presentation of benefits and

limitations, and publication in a recognized source.

2.4 Data Extraction Strategy and Synthesis

During this phase, a data extraction form was used for

each selected paper to answer the MQs.

MQ1: Publication years, channels (journal,

conference or workshop), and sources were extracted

to address this question.

MQ2: Each paper was examined to determine its

specific medical focus, encompassing disciplines

such as oncology, cardiology, ophthalmology, and

others, as detailed exhaustively in ("Specialty Profiles

| Careers in Medicine," 2023).

MQ3: The proposed cost-sensitive methods in the

selected studies were identified. These methods can

be classified as either direct or meta-learning

approaches. The latter could further be classified as

preprocessing or postprocessing methods (Fernández

et al., 2018).

3 RESULTS AND DISCUSSION

This section provides an overview of the study

selection. It also presents and discusses the mapping

results according to the proposed MQs.

3.1 Study Selection

Figure 1 displays the number of articles at each stage

of the selection process. Initially, 49325 candidate


papers were identified, from which 49124 studies

were discarded according to the IC and EC.

28 studies that did not fulfil the QA criteria were

later excluded. Eventually, 173 papers were retained

to answer the MQs. Given space limitations, the list

of selected papers and their extracted data can be

obtained through an email request to the authors.
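The counts reported for the selection process can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative.

```python
identified = 49325       # candidate papers returned by the search
discarded_ic_ec = 49124  # discarded according to the IC and EC
excluded_qa = 28         # excluded for not fulfilling the QA criteria

after_screening = identified - discarded_ic_ec
retained = after_screening - excluded_qa
print(after_screening, retained)  # 201 173
```

The 201 papers remaining after screening are therefore the pool to which the QA criteria were applied, leaving the 173 retained studies.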

3.2 MQ1: In Which Years, Publication Channels and Sources Were the Selected Papers Published?

Figure 2 shows the number of selected studies per

publication channel from January 2010 to December

2022. Three main channels were identified: journals,

conferences and workshops. Out of the 173 selected

studies, the majority, precisely 69.9% (121 papers),

were published in journals, 27.2% (47 papers) were

published in conference proceedings, and only 2.9%

(five papers) were published in workshops. Table 1

outlines the publication sources that have published

more than two papers.

Figure 1: Selection process.

Figure 2: Distribution of the selected papers per publication

year and channel.

Table 1: Publication sources.

Journal source                                No. Papers  Percentage
Computer Methods and Programs in Biomedicine  9           5.2%
Computers in Biology and Medicine             8           4.6%
BMC Medical Informatics and Decision Making   5           2.9%
Neurocomputing                                5           2.9%
Multimedia Tools and Applications             5           2.9%
Medical Image Analysis                        4           2.3%
Biomedical Signal Processing and Control      4           2.3%
Artificial Intelligence in Medicine           3           1.7%
Applied Soft Computing                        3           1.7%
Other                                         75          43.4%

Conference source                                                                                No. Papers  Percentage
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)  5           2.9%
Other                                                                                            42          24.3%

Workshop source                                                       No. Papers  Percentage
International Workshop on Machine Learning in Medical Imaging (MLMI)  3           1.7%
Other                                                                 2           1.2%

The findings indicate that Computer Methods and

Programs in Biomedicine was the most commonly

targeted journal venue, while the International

Conference on Medical Image Computing and

Computer-Assisted Intervention (MICCAI) and the

International Workshop on Machine Learning in

Medical Imaging (MLMI) emerged as the most

frequently occurring sources for conference and

workshop papers, respectively.

Chronologically speaking, conference papers

were the dominant publication type in 2012 and 2013.

However, the trend shifted in 2014 as the journal

publication frequency surpassed that of conference

papers in subsequent years. A key observation is that

the gap between the two types of publications became

increasingly pronounced from 2020 onwards. The

analysis further revealed a growing trend of

publications, particularly since 2020, when the count

peaked significantly. Notably, no study was

published in 2010, and only one workshop paper was

published in 2011.

[Figure 2: number of papers per year, grouped by publication channel (Journal, Conference, Workshop).]

The dearth of published papers in 2010-2011 and

the dominance of conference papers until 2013

suggest that CSL research in the medical field was in

its early stages. However, as the field progressed,

researchers started prioritizing top-tier journals due to

their strict review processes and higher publication

standards, resulting in more rigorous research. This

shift towards journal publications began in 2014

when the number of journal articles surpassed

conference papers and continued to widen in

subsequent years. This trend indicates a maturing

field and researchers increasingly meeting the

demanding standards of high-quality journals.

The growing interest and abundance of

publications on CSL can be attributed to several key

factors. Firstly, the development of high-throughput

technologies has resulted in massive amounts of

medical data (Johnson et al., 2018), including clinical

data, electronic health records, and data from

wearable devices. These advancements in data

collection have created an urgent need for novel

methods to analyze and leverage this data for

improved medical outcomes. Secondly, the inherent

imbalanced nature of this collected data poses a

critical challenge that impacts the accuracy and

reliability of ML models in medical applications.

Thirdly, the significant advances in CSL algorithms

(Khan, Hayat, Bennamoun, Sohel, & Togneri, 2018)

and their success in other fields (Sahin, Bulkan, &

Duman, 2013) have encouraged researchers to apply

these techniques in the medical domain, where they

are much needed. Additionally, the advances in deep

learning have been a significant catalyst for progress

in medical data analysis (Esteva et al., 2019). Finally,

the increasing availability of public datasets and tools

for analyzing medical data has facilitated the

dissemination and replication of research findings. As

a result, the research community has become more

aware of the importance of addressing the class

imbalance problem, leading to a surge in publications

on this topic, particularly in recent years.

Besides, the findings revealed diverse publication

sources covering various disciplines such as

medicine, medical informatics, computer science, and

artificial intelligence. This diversity reflects the

interdisciplinary nature of the research topic,

requiring a multi-faceted approach that draws on

expertise from different fields.

3.3 MQ2: In Which Disciplines of Medicine Was CSL Mainly Used?

The 173 selected studies collectively explored 21

distinct medical disciplines. Interestingly, 17 papers

addressed more than one discipline, either by

investigating a topic at the intersection of two medical

sub-fields (e.g., (Sung, Hung, & Hu, 2021)) or by

testing their methods on a diverse range of disciplines

(e.g., (Gan, Shen, An, Xu, & Liu, 2020)). Figure 3

showcases the distribution of studies per medical sub-

field, focusing solely on sub-fields addressed by at

least 2% of the selected papers.

The findings revealed that oncology is the most

extensively studied discipline, accounting for 31.2%

(54 papers) of the selected studies. As per the World

Health Organization (WHO), cancer is a leading

cause of mortality globally, accounting for

approximately 10 million deaths in 2020 alone

("Cancer," 2020). The significance of accurate and

timely diagnosis and treatment is paramount, and ML

techniques hold great promise in this regard.

However, cancer is a highly heterogeneous disease

that can manifest differently in each patient.

Additionally, patients often present with complex

medical histories and comorbidities, which can

complicate diagnosis and treatment. These factors can

contribute to imbalanced medical data, making CSL

an attractive approach to address these challenges and

improve cancer care.

Cardiology and neurology ranked second and third, accounting for 15% (26 papers) and 12.7% (22 papers) of the investigated literature, respectively. CSL has demonstrated

significant benefits in addressing cardiovascular and

neurological diseases, widely recognized as

significant health concerns. This finding is in line

with the WHO's report ("Cardiovascular Diseases

(CVDs)," 2021), which identifies cardiovascular

diseases as the primary cause of mortality globally,

responsible for 17.9 million deaths in 2019.

Additionally, the WHO acknowledges that

neurological disorders such as stroke, Alzheimer's

disease, and other dementias are among the leading

causes of disability and death worldwide ("Mental Health: Neurological Disorders," 2016). Given the

high mortality rate associated with these diseases,

accurate predictions are imperative. However, data

imbalance can lead to biased models that fail to

capture important patterns in the data. By adopting

CSL, researchers aim to improve prediction accuracy

and contribute to preserving human life.

Infectious diseases occupied the fourth position,

representing 8.7% (15 papers) of the total studies.

Notable attention has been dedicated to researching

this sub-field since 2020. This trend is not surprising,

considering the urgency and global impact of the

COVID-19 pandemic, which first emerged in 2019

and has since garnered substantial research attention.


Figure 3: Distribution of the selected papers per medical

discipline.

Additionally, imbalanced data is a common issue in

COVID-19 studies due to various factors such as

differences in testing availability and criteria,

variations in reporting standards, differences in

demographics, healthcare infrastructure, and

compliance with public health measures. Besides,

there may be a publication bias towards COVID-19

studies due to the pandemic's global impact, and

funding agencies may have prioritized research on

this topic. Lastly, data availability may have

contributed to the popularity of COVID-19 as a

research subject matter.

Other medical sub-fields, namely ophthalmology, endocrinology, and hepatology, were investigated by 11 papers (6.4%) each, demonstrating the relevance of cost-sensitive methods in these domains. Galdran

and colleagues (Galdran, Dolz, Chakor, Lombaert, &

Ben Ayed, 2020) highlighted the value of cost-

sensitive classifiers in addressing two critical

challenges in diabetic retinopathy grading. These

classifiers can effectively model the complex

structure of a heterogeneous label space and are also

advantageous in addressing severely class-

imbalanced scenarios. Fan et al. (Fan, Xie, Cheng, &

Li, 2022) pointed out the inadequacy of conventional

models in considering the imbalanced distribution of

diabetic datasets and the varying misclassification

costs across distinct patient categories. Yang et al. (2021) compared the predictive accuracy of traditional ML methods and cost-sensitive models for predicting hepatic encephalopathy in cirrhotic patients. The study's results demonstrated

the superiority of cost-sensitive models, underscoring

their high suitability and potential for future

prognosis studies.

Pulmonology was featured in 8 articles (4.6%),

and nephrology, dermatology, and medical and health

services were each investigated by six studies (3.5%).

On the other hand, emergency medicine (2.9%),

radiology (2.9%), and obstetrics & gynecology

(2.9%) received relatively little attention, as did

orthopaedics, which was addressed by only 2.3% of

the selected studies (four papers).

Disciplines that received the least amount of

attention in the selected studies were classified as

"other", which included geriatric psychiatry and

neonatology, each addressed by two papers (1.2%),

as well as intensive care, radiomics, urology, and

podiatry, which were each the focus of only one study

(0.6%). This may be explained by factors such as

limited data availability and researchers prioritizing

other research areas deemed more crucial and

pertinent to patient care.

3.4 MQ3: Which CSL Approaches Were Most Frequently Used in Medicine?

This study seeks to categorize the selected papers

according to the CSL approaches they have

employed, with the goal of obtaining a thorough

understanding of the distribution and prevalence of

these approaches within the medical literature.

Figure 4 illustrates the distribution of cost-

sensitive approaches used in the selected studies.

Direct approaches account for the largest share of

papers, representing 76% (133 papers) of the

qualified studies. Some researchers modified the

objective function of the model to minimize the

expected cost of misclassification (e.g., (Al-Sawwa &

Ludwig, 2019)), while others incorporated the cost

matrix directly into the loss function (e.g., (Ben

naceur, Akil, Saouli, & Kachouri, 2020)). The ease of

implementation is the primary factor contributing to

this trend since most ML libraries offer readily

available implementations (Sterner et al., 2021).

Figure 4: Distribution of the selected studies per CSL

approach.

Figure 3 data (percentage of papers per discipline): Oncology 31.2%; Cardiology 15.0%; Neurology 12.7%; Infectious Diseases 8.7%; Ophthalmology 6.4%; Endocrinology 6.4%; Hepatology 6.4%; Pulmonology 4.6%; Nephrology 3.5%; Dermatology 3.5%; Medical and health services 3.5%; Emergency Medicine 2.9%; Radiology 2.9%; Obstetrics & Gynecology 2.9%; Orthopaedics 2.3%; Other 4.6%.

Moreover, certain packages offer the flexibility to

apply custom loss functions directly to the algorithm,

allowing users to employ cost-sensitive loss functions

tailored to their specific applications.
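One common way of placing costs directly in the loss function is a class-weighted cross-entropy. The sketch below is a plain-Python illustration of that idea and not the exact loss used in any of the cited studies; the weights, the batch and the simple mean normalisation are assumptions.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Mean cross-entropy in which each example is scaled by the weight
    (cost) of its true class."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= class_weights[y] * math.log(max(p[y], eps))
    return total / len(labels)


# Hypothetical batch: one majority-class (0) and one minority-class (1) example.
probs = [[0.9, 0.1], [0.6, 0.4]]
labels = [0, 1]

plain = weighted_cross_entropy(probs, labels, class_weights=[1.0, 1.0])
costly = weighted_cross_entropy(probs, labels, class_weights=[1.0, 5.0])
print(plain < costly)  # True: the minority-class error now dominates the loss
```

Because the penalty enters the objective itself, gradient-based training is pushed towards reducing errors on the heavily weighted class, which is the defining trait of a direct approach.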

A considerable share of the selected studies

(16.6%) adopted meta-learning approaches.

Precisely, preprocessing was applied in 24 papers

(13.7%), and postprocessing was employed in 5

papers (2.9%). Preprocessing was carried out using

weighting (e.g., (K. J. Wang, Makond, & Wang,

2013)) or MetaCost (e.g., (Afzal et al., 2013)), while

postprocessing relied on thresholding (e.g., (Zhao,

Wong, & Tsui, 2018)). Preprocessing techniques are

adopted by researchers as they alter the training data

instead of the underlying algorithm (Fernández et al.,

2018), rendering them a suitable approach for

different types of classifiers. Thresholding is less

frequently employed in the selected studies due to the

arduous task of selecting the most suitable threshold

from a large pool of possibilities (Liu et al., 2021).
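Preprocessing by weighting can be sketched as follows: every training example receives the misclassification cost of its class as a weight, and any learner that accepts per-example weights can then be trained unchanged. The class costs, the rescaling convention and the function names are illustrative assumptions rather than a procedure taken from the reviewed papers.

```python
def instance_weights(labels, class_costs):
    """Give each example the cost of misclassifying its class, rescaled so
    that the weights sum to the number of examples."""
    raw = [class_costs[y] for y in labels]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_error(y_true, y_pred, weights):
    """Share of the total weight that falls on misclassified examples."""
    wrong = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    return wrong / sum(weights)


# Hypothetical data: four majority-class examples, one minority-class example,
# and a trivial model that always predicts the majority class.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
weights = instance_weights(y_true, {0: 1.0, 1: 4.0})

print(weights)                                    # [0.625, 0.625, 0.625, 0.625, 2.5]
print(weighted_error(y_true, y_pred, [1.0] * 5))  # 0.2
print(weighted_error(y_true, y_pred, weights))    # 0.5
```

With uniform weights the trivial model looks acceptable at 20% error, whereas the cost-derived weights expose that it fails on the examples that matter most.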

Note that the direct and preprocessing approaches

were utilized together in two papers, resulting in

double counting in these categories. Moreover, 13

articles (7.4%) did not provide information on the

cost-sensitive approach they adopted and were thus

categorized as "unspecified". Incomplete reporting

may hinder the reproducibility and comparability of

results and the identification of effective methods for

dealing with imbalanced medical data. Given the

importance of transparency in medical research,

future studies should provide a clear and detailed

description of the implemented cost-sensitive

techniques, including any modifications made to the

model, to allow for better understanding, comparison

and replication of findings.
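The reported shares are consistent with percentages computed over 175 approach assignments (173 papers plus the two counted in both the direct and preprocessing categories). This reading is an inference from the reported figures rather than something the text states explicitly, and the short check below, with illustrative variable names, reproduces it.

```python
counts = {
    "direct": 133,
    "preprocessing": 24,
    "postprocessing": 5,
    "unspecified": 13,
}
n_papers = 173
n_double_counted = 2  # papers using both direct and preprocessing approaches

total_assignments = sum(counts.values())
shares = {name: round(100 * n / total_assignments, 1) for name, n in counts.items()}
meta_learning_share = round(
    100 * (counts["preprocessing"] + counts["postprocessing"]) / total_assignments, 1
)

print(total_assignments == n_papers + n_double_counted)  # True
print(shares)
print(meta_learning_share)  # 16.6
```

Dividing by 173 instead would give 76.9% for direct approaches, so the denominator of 175 is the one that matches the published 76%.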

4 CONCLUSION AND FUTURE WORK

This SMS aimed to provide a thorough overview of

the current state of research on CSL for imbalanced

medical data. A total of 173 papers published between January

2010 and December 2022 were selected from five

digital libraries and classified according to

publication years, channels and sources, medical

disciplines, and CSL approaches. The main findings

per MQ are: (MQ1) The use of CSL for imbalanced

medical data has garnered increasing interest,

particularly since 2020, with most papers (69.9%)

published in journals. (MQ2) Oncology was the most

extensively investigated discipline. (MQ3) Most

papers (76%) employed direct CSL approaches. This

SMS lays the groundwork for our forthcoming

research, which will involve a more targeted and

comprehensive review of CSL for imbalanced

medical data.

ACKNOWLEDGEMENTS

This work was conducted under the research project

"Machine Learning based Breast Cancer Diagnosis

and Treatment", 2020-2023. The authors would like

to thank the Moroccan Ministry of Higher Education

and Scientific Research, Digital Development

Agency (ADD), and CNRST for their support.

REFERENCES

Afzal, Z., Schuemie, M. J., Van Blijderveen, J. C., Sen, E. F., Sturkenboom, M. C., & Kors, J. A. (2013). Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records. BMC Medical Informatics and Decision Making, 13(1), 1–11. https://doi.org/10.1186/1472-6947-13-30

Al-Sawwa, J., & Ludwig, S. A. (2019). A Cost-Sensitive Centroid-based Differential Evolution Classification Algorithm applied to Cancer Data Sets. 2019 IEEE Symposium Series on Computational Intelligence, SSCI 2019, 2514–2521. https://doi.org/10.1109/SSCI44817.2019.9002660

Ben naceur, M., Akil, M., Saouli, R., & Kachouri, R.

(2020). Fully automatic brain tumor segmentation with

deep learning-based selective attention using

overlapping patches and multi-class weighted cross-

entropy. Medical Image Analysis, 63, 101692.

https://doi.org/10.1016/J.MEDIA.2020.101692

Cancer. (2020). Retrieved March 16, 2023, from

https://www.who.int/news-room/fact-

sheets/detail/cancer

Cardiovascular diseases (CVDs). (2021). Retrieved March

16, 2023, from https://www.who.int/news-room/fact-

sheets/detail/cardiovascular-diseases-(cvds)

Elkan, C. (2001). The Foundations of Cost-Sensitive

Learning. 17th International Joint Conference on

Artificial Intelligence (IJCAI'01), 973–978.

Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., … Dean, J. (2019). A guide to deep learning in healthcare. Nature Medicine, 25(1), 24–29. https://doi.org/10.1038/s41591-018-0316-z

Fan, B., Xie, Z., Cheng, H., & Li, P. (2022). Risk Prediction of Diabetic Readmission Based on Cost Sensitive Convolutional Neural Network. Communications in Computer and Information Science, 1563 CCIS, 299–311. https://doi.org/10.1007/978-981-19-0852-1_23


Fernández, A., García, S., Galar, M., Prati, R. C.,

Krawczyk, B., & Herrera, F. (2018). Cost-Sensitive

Learning. Learning from Imbalanced Data Sets, 63–78.

https://doi.org/10.1007/978-3-319-98074-4_4

Freitas, A., Brazdil, P., & Costa-Pereira, A. (2009). Cost-

sensitive learning in medicine. In Data Mining and

Medical Knowledge Management: Cases and

Applications (pp. 57–75). IGI Global.

https://doi.org/10.4018/978-1-60566-218-3.ch003

Galdran, A., Dolz, J., Chakor, H., Lombaert, H., & Ben Ayed, I. (2020). Cost-Sensitive Regularisation for Diabetic Retinopathy Grading from Eye Fundus Images. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12265 LNCS, 665–674. https://doi.org/10.1007/978-3-030-59722-1_64

Gan, D., Shen, J., An, B., Xu, M., & Liu, N. (2020).

Integrating TANBN with cost sensitive classification

algorithm for imbalanced data in medical diagnosis.

Computers & Industrial Engineering, 140, 106266.

https://doi.org/10.1016/J.CIE.2019.106266

Hu, K., Huang, Y., Huang, W., Tan, H., Chen, Z., Zhong,

Z., … Gao, X. (2021). Deep supervised learning using

self-adaptive auxiliary loss for COVID-19 diagnosis

from imbalanced CT images. Neurocomputing, 458,

232–245. https://doi.org/10.1016/J.NEUCOM.2021.06.012

Johnson, K. W., Torres Soto, J., Glicksberg, B. S., Shameer,

K., Miotto, R., Ali, M., … Dudley, J. T. (2018).

Artificial Intelligence in Cardiology. Journal of the

American College of Cardiology, 71(23), 2668–2679.

https://doi.org/10.1016/J.JACC.2018.03.521

Khan, S. H., Hayat, M., Bennamoun, M., Sohel, F. A., &

Togneri, R. (2018). Cost-sensitive learning of deep

feature representations from imbalanced data. IEEE

Transactions on Neural Networks and Learning

Systems, 29(8), 3573–3587.

https://doi.org/10.1109/TNNLS.2017.2732482

Khushi, M., Shaukat, K., Alam, T. M., Hameed, I. A.,

Uddin, S., Luo, S., … Reyes, M. C. (2021). A

Comparative Performance Analysis of Data

Resampling Methods on Imbalance Medical Data.

IEEE Access, 9, 109960–109975.

https://doi.org/10.1109/ACCESS.2021.3102399

Kitchenham, B., & Charters, S. (2007, April). Guidelines

for performing Systematic Literature Reviews in

Software Engineering (Ver. 2.3). EBSE Technical Report.

Liu, Y., Li, Q., Wang, K., Liu, J., He, R., Yuan, Y., &

Zhang, H. (2021). Automatic Multi-Label ECG

Classification with Category Imbalance and Cost-

Sensitive Thresholding. Biosensors, 11(11), 453.

https://doi.org/10.3390/BIOS11110453

López, V., Fernández, A., García, S., Palade, V., & Herrera,

F. (2013). An insight into classification with

imbalanced data: Empirical results and current trends

on using data intrinsic characteristics. Information

Sciences, 250, 113–141.

https://doi.org/10.1016/J.INS.2013.07.007

Mental health: neurological disorders. (2016). Retrieved

March 16, 2023, from

https://www.who.int/news-room/questions-and-answers/item/mental-health-neurological-disorders

Petersen, K., Feldt, R., Mujtaba, S., & Mattsson, M. (2008).

Systematic Mapping Studies in Software Engineering.

12th International Conference on Evaluation and

Assessment in Software Engineering, EASE 2008.

https://doi.org/10.14236/EWIC/EASE2008.8

Petersen, K., Vakkalanka, S., & Kuzniarz, L. (2015).

Guidelines for conducting systematic mapping studies

in software engineering: An update. Information and

Software Technology, 64, 1–18.

https://doi.org/10.1016/j.infsof.2015.03.007

Sahin, Y., Bulkan, S., & Duman, E. (2013). A cost-sensitive

decision tree approach for fraud detection. Expert

Systems with Applications, 40(15), 5916–5923.

https://doi.org/10.1016/J.ESWA.2013.05.021

Specialty Profiles | Careers in Medicine. (2023). Retrieved

March 15, 2023, from

https://careersinmedicine.aamc.org/explore-options/specialty-profiles

Sterner, P., Goretzko, D., & Pargent, F. (2021). Everything

has its price: Foundations of cost-sensitive learning and

its application in psychology [Preprint]. PsyArXiv.

https://doi.org/10.31234/osf.io/7asgz

Sung, S. F., Hung, L. C., & Hu, Y. H. (2021). Developing

a stroke alert trigger for clinical decision support at

emergency triage using machine learning. International

Journal of Medical Informatics, 152.

https://doi.org/10.1016/J.IJMEDINF.2021.104505

Wang, K. J., Makond, B., & Wang, K. M. (2013). An

improved survivability prognosis of breast cancer by

using sampling and feature selection technique to solve

imbalanced patient classification data. BMC Medical

Informatics and Decision Making, 13(1), 124.

https://doi.org/10.1186/1472-6947-13-124

Yang, H., Li, X., Cao, H., Cui, Y., Luo, Y., Liu, J., &

Zhang, Y. (2021). Using machine learning methods to

predict hepatic encephalopathy in cirrhotic patients

with unbalanced data. Computer Methods and

Programs in Biomedicine, 211.

https://doi.org/10.1016/J.CMPB.2021.106420

Zhao, Y., Wong, Z. S. Y., & Tsui, K. L. (2018). A

Framework of Rebalancing Imbalanced Healthcare

Data for Rare Events' Classification: A Case of Look-

Alike Sound-Alike Mix-Up Incident Detection. Journal

of Healthcare Engineering, 2018, 6275435.

https://doi.org/10.1155/2018/627543

KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval
