The Advancements and Applications of Artificial Intelligence in Gastric Cancer Diagnosis

Zilong Xu

Computer Science and Software Engineering, Fuzhou University, Fuzhou, China

Keywords: Gastric cancer, Artificial Intelligence (AI), Machine learning, Deep learning, Convolutional Neural Network (CNN).

Abstract: Gastric cancer is a common and deadly malignancy that is difficult to detect early. Artificial
Intelligence (AI)-assisted diagnosis can offer faster and more cost-effective solutions through advanced
imaging and data analysis. This study provides a comprehensive review of AI-based gastric cancer detection.
The methods reviewed cover both machine learning and deep learning, including random forest, Support
Vector Machine (SVM) and Convolutional Neural Network (CNN). For traditional machine learning, this
paper details the application of random forest and SVM in gastric cancer detection. Random forest has been
used to predict patient survival status, with weighting methods improving the generalization ability of the
algorithm. SVM has been used to identify Microsatellite Instability (MSI) and Lymph Node Metastasis (LNM),
providing doctors with important information to guide treatment decisions. For deep learning, this paper
focuses on the application of CNNs to gastric cancer detection. One reviewed study developed a model for
the detection and depth prediction of Early Gastric Cancer (EGC), which improved the detection accuracy of
EGC by dividing endoscopic images into EGC and non-EGC with the VGG-16 model. The discussion section
examines the shortcomings of AI models in gastric cancer detection, such as poor interpretability,
insufficient data diversity, difficulties in coordination between physicians and models, and privacy and
ethical issues. Corresponding suggestions are made, including injecting more domain knowledge, enhancing
data diversity, optimizing models for real-time use, strengthening collaboration between doctors and
models, and adopting privacy protection technologies.

1 INTRODUCTION

Gastric cancer is a malignant tumor originating from gastric tissue. It usually develops relatively
slowly and may show no obvious symptoms at the initial stage. A major harm of gastric cancer is that it
is difficult to detect early: the initial symptoms are atypical, and patients may feel only slight
discomfort or indigestion, which is easily ignored. As a result, the disease has often progressed to an
advanced stage by the time it is detected. At present, gastric cancer is diagnosed mainly in hospitals,
which results in poor timeliness and high labor costs. It is therefore necessary to find a better
auxiliary method. Artificial intelligence has a powerful capability to extract high-level
representations and make predictions, with applications in image analysis, data integration and
analysis, and surgical assistance, and can therefore be considered for this purpose.

Machine learning is an important branch of artificial intelligence. It uses mathematical methods to
build and fit models from data in order to solve real-world problems. Its mainstream algorithms include
random forest and Support Vector Machine (SVM), as well as the neural network methods used in deep
learning. Random forest is a classifier built from many decision trees, in which the class predicted by
the most trees is selected as the output (Le Gall, 2005). The SVM is an algorithm for binary
classification whose goal is to find the separating boundary that maximizes the margin, that is, the
distance from the boundary to the nearest elements of the two classes (Cortes & Vapnik, 1995). A neural
network is a computational model that simulates the way neurons in the human brain work (Domingos, 2012).
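The two ideas described above, majority voting in a random forest and the margin of a linear separator, can be illustrated with a minimal sketch. The function names `majority_vote` and `geometric_margin` are hypothetical and chosen only for this illustration; the sketch covers the voting and margin concepts, not a full training procedure.

```python
import math
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labelled point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A positive result means every point lies on its
    correct side; an SVM looks for the (w, b) that makes this value largest.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Three of four hypothetical trees vote "cancer", so the forest outputs "cancer".
forest_output = majority_vote(["cancer", "normal", "cancer", "cancer"])

# The vertical line x = 0 separates the two toy points with a margin of 2.
margin = geometric_margin([1.0, 0.0], 0.0, [(2.0, 0.0), (-3.0, 1.0)], [1, -1])
```

In practice a library implementation would be used for both algorithms; the sketch only makes the two definitions concrete.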

Xu, Z.

The Advancements and Applications of Artificial Intelligence in Gastric Cancer Diagnosis.

DOI: 10.5220/0012838200004547

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Data Science and Engineering (ICDSE 2024), pages 339-343

ISBN: 978-989-758-690-3


In recent years, many studies have combined artificial intelligence algorithms with medicine, and in
particular with gastric cancer detection. For instance, Ikenoyama et al constructed a Convolutional
Neural Network (CNN) using over 13,000 esophagogastroduodenoscopy (EGD) images and tested it by
comparing its ability to diagnose early gastric cancer with that of many endoscopists (Ikenoyama et al,
2021). Yoon et al built an optimized Early Gastric Cancer (EGC) detection and depth prediction model
and studied the factors influencing Artificial Intelligence (AI) diagnosis (Yoon et al, 2019). Xu et al
applied an improved weighted random forest algorithm to data on 110,697 gastric cancer patients (Xu et
al, 2022). Zhu et al used Gradient Boosting Decision Tree (GBDT), a machine learning method, to
construct a predictive model for the diagnosis of gastric cancer and evaluated the accuracy of the
model (Zhu et al, 2020). Because this field is developing rapidly and is of great importance, a
comprehensive review is necessary.

The rest of this article consists of three sections: methods, discussion and conclusion. Section 2
reviews some classic machine learning and deep learning algorithms as applied to gastric cancer
detection and introduces how they were implemented. Section 3 discusses the shortcomings of current
algorithms and future development directions, and Section 4 summarizes the full text.

2 METHOD

2.1 Framework of Machine Learning Model

The development of a machine learning model for gastric cancer prediction mainly consists of six
procedures: data collection, data preprocessing, feature extraction, model building, model testing and
evaluation, and model deployment, as shown in Fig. 1. More details are given below.

Figure 1: The detailed workflow of developing machine learning models (Morris).

Collecting data: Multiple types of patient-related data are collected, including clinical medical
records, imaging data (e.g. CT and MRI scans) and laboratory findings. Large-scale patient data are
required to cover different cases, disease stages and treatment options.

Data preprocessing: Patient data are processed by handling missing values and outliers and by
standardizing the data. Data from different sources are integrated to ensure consistency and
availability. Some processing is specific to gastric cancer, such as labeling tumor size, location and
pathological type.

Feature Extraction: Key features are extracted from the raw data, such as texture and shape features
from images or key indicators from clinical data.

Model building: An appropriate machine learning model is selected, such as a support vector machine, a
deep neural network or a decision tree. Depending on the task, classification models (e.g. for cancer
detection) or regression models (e.g. for predicting survival) can be constructed. The model is trained
on labeled training data so that it learns the underlying patterns and associations, and its parameters
are optimized to improve performance.

Model Testing and Evaluation: The performance of the model is evaluated on independent test datasets
using metrics such as accuracy, sensitivity and specificity. The robustness of the model is validated
using techniques such as cross-validation.

Model Deployment: Trained models are deployed in real clinical settings for the diagnosis and
prediction of gastric cancer. The model can be integrated into a medical information system to support
physicians' decision-making.
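As a rough illustration of the preprocessing and evaluation procedures listed above, the following sketch imputes missing values, standardizes a feature, and computes accuracy, sensitivity and specificity. It is a minimal example under stated assumptions: mean imputation and z-score standardization are only one possible choice, and the function names and toy values are hypothetical rather than taken from any of the reviewed studies.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = cancer, 0 = no cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical tumor sizes in cm with one missing measurement.
tumor_size = standardize(impute_mean([1.0, None, 3.0]))

# Hypothetical test-set labels and model predictions.
metrics = evaluate([1, 1, 0, 0], [1, 0, 0, 0])
```

Sensitivity is the fraction of true cancer cases that the model flags, and specificity is the fraction of non-cancer cases that it correctly clears, which is why both are reported alongside accuracy in diagnostic studies.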

2.2 Traditional Machine Learning-Based Detection Approaches

2.2.1 Random Forest for Gastric Cancer Detection

To predict patient survival status for prognosis and help doctors evaluate treatment decisions, Xu et
al proposed a new random forest weighting method and applied it to gastric cancer patient data from the
Surveillance, Epidemiology, and End Results (SEER) program. The generalization ability of this weighted
random forest algorithm was evaluated on 10 public healthcare datasets. Furthermore, for the same
weighting scheme, the difference between computing the weights on out-of-bag (OOB) data and on the
entire training set was explored. The specific approach is as follows. First, 110,697 patient cases
from 1975 to 2016 in the SEER database were used as the sample, and a modified bagging version of
random forest was used to build decision trees (DTs) from the data samples. Each DT was then weighted
using its OOB data, which reflects the generalization performance of that tree. A tree-level weighted
random forest (TLWRF) was also used to address the problem that the OOB data differ from tree to tree.
With this optimization, the survival status of the patients can be predicted (Xu et al, 2022).
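The general idea of weighting trees by their out-of-bag performance can be sketched as follows. This is a generic illustration that assumes OOB accuracy is used directly as the weight; it is not the specific weighting scheme or the TLWRF variant of Xu et al, and the toy trees and data are hypothetical.

```python
from collections import defaultdict


def oob_accuracy(tree, oob_samples):
    """Fraction of a tree's out-of-bag samples that it classifies correctly."""
    if not oob_samples:
        return 0.0
    correct = sum(1 for x, y in oob_samples if tree(x) == y)
    return correct / len(oob_samples)


def weighted_forest_predict(trees, weights, x):
    """Weighted vote: each tree adds its weight to the class it predicts."""
    scores = defaultdict(float)
    for tree, weight in zip(trees, weights):
        scores[tree(x)] += weight
    return max(scores, key=scores.get)


# Toy "trees": one informative threshold rule and two that always predict 0.
trees = [lambda x: 1 if x > 5 else 0, lambda x: 0, lambda x: 0]

# Each tree has its own OOB set of (feature, label) pairs.
oob_sets = [
    [(6, 1), (2, 0)],
    [(6, 1), (7, 1), (1, 0), (8, 1)],
    [(9, 1), (3, 0), (7, 1)],
]
weights = [oob_accuracy(t, s) for t, s in zip(trees, oob_sets)]

# An unweighted majority vote would output 0 here; the weighted vote follows
# the tree that generalizes best on its OOB data.
prediction = weighted_forest_predict(trees, weights, 7)
```

Because every tree is scored on a different OOB set, the weights are not strictly comparable across trees, which is the issue the tree-level variant described above is intended to address.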

Random survival forests (RSF) were used by Adham et al to identify important factors affecting gastric
cancer outcomes. Data from 128 patients with gastric cancer were first collected through a historical
cohort study conducted in Hamadan district, Iran from 2007 to 2013. RSF can identify the influential
variables within a set of covariates. The authors used this algorithm to grow survival trees that split
on different influencing factors, and then calculated the cumulative hazard function (CHF) for each
tree. The variable importance (VIMP) and Harrell's concordance index (C-index) were calculated, and the
R software was used to analyze the data. The final mean C-index was calculated from the results
obtained on 1000 bootstrap datasets (Adham et al, 2017).
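Harrell's C-index mentioned above measures how often a model ranks pairs of patients in the correct order of survival. The sketch below is a simplified implementation for right-censored data that ignores pairs with tied survival times; the function name and the toy data are hypothetical and are not taken from Adham et al.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up time for each patient.
    events: 1 if death was observed, 0 if the patient was censored.
    risk_scores: higher score means higher predicted risk (shorter survival).
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has the shorter time
            # and that shorter time ended in an observed event.
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four hypothetical patients; the third is censored at time 6.
c_index = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Averaging the index over many bootstrap resamples, as described above, gives a more stable estimate than a single computation.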

2.2.2 Support Vector Machine for Gastric Cancer Detection

To classify Microsatellite Instability (MSI) status in cancer patients, Chen et al developed a model
for MSI-related prediction in gastric cancer patients. The researchers first used gastric
adenocarcinoma data from The Cancer Genome Atlas (TCGA) as the sample and normalized the data. RMS
software and MATLAB software were then used to extract features and reduce redundancy. MSI was then
classified using an SVM model based on lncRNAs, and its accuracy was evaluated by the C-index (Chen et
al, 2019).

Lymph Node Metastasis (LNM) is an important factor affecting the survival of patients, but several
commonly used gastric imaging methods cannot achieve high sensitivity and specificity, so they cannot
evaluate the lymph node status of gastric cancer well. Zhang et al used support vector machine
technology to address this problem. The researchers first selected 175 gastric cancer patients who
underwent multidetector computed tomography (MDCT) before surgery and used univariate analysis to
examine the relationship between different characteristics of gastric cancer and lymph node metastasis.
These indicators were used as the input to the support vector machine, whose output was the patient's
lymph node metastasis status. A 5-fold cross-validation was then used to train and test the SVM model
(Zhang et al, 2011).
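The 5-fold cross-validation procedure mentioned above can be sketched as an index split. The example uses 175 samples to mirror the cohort size, but the shuffling seed and the function name `k_fold_indices` are assumptions made for illustration only.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, split into k nearly equal folds, and each
    fold is used exactly once as the test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 175 patients and 5 folds, each round trains on 140 and tests on 35.
fold_sizes = [(len(train), len(test)) for train, test in k_fold_indices(175, k=5)]
```

In each round the classifier would be fitted on the training indices and scored on the held-out fold, and the five scores would then be averaged.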

2.3 Deep Learning-based Detection Approaches

2.3.1 Convolutional Neural Network for Gastric Cancer Detection

Yoon et al developed an EGC detection and depth prediction model to help in the diagnosis of EGC. The
study team first classified endoscopic images as EGC or non-EGC using the Visual Geometry Group
(VGG)-16 model. A loss function was then used to measure classification and localization errors
simultaneously. A total of 11,539 collected endoscopic images were used as the experimental sample to
evaluate the model on EGC detection and depth prediction (Yoon et al, 2019).
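The text above does not give the form of the loss function, so the following sketch only illustrates the general idea of penalizing classification and localization errors in a single objective. It assumes a weighted sum of binary cross-entropy and a box-overlap term (1 minus intersection over union); this is a generic construction, not the loss used by Yoon et al, and all names and the weight `lam` are hypothetical.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Classification error for predicted probability p and label y (0 or 1)."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def combined_loss(p, y, pred_box, true_box, lam=1.0):
    """Classification loss plus a localization penalty for lesion images.

    The localization term is added only when the image contains a lesion
    (y == 1) and an annotated box is available.
    """
    loss = binary_cross_entropy(p, y)
    if y == 1 and true_box is not None:
        loss += lam * (1.0 - iou(pred_box, true_box))
    return loss


# A confident, well-localized prediction yields a smaller loss than a
# confident prediction whose box misses half of the lesion.
good = combined_loss(0.9, 1, (0, 0, 2, 2), (0, 0, 2, 2))
poor = combined_loss(0.9, 1, (1, 0, 3, 2), (0, 0, 2, 2))
```

The weight `lam` controls the trade-off between the two error types; how it is set would depend on the task and is left open here.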

To automatically detect gastric cancer in endoscopic images, Hirasawa et al developed a CNN. The team
collected 13,584 images of 2,639 cancer lesions and used them to train a CNN based on a deep neural
network architecture called the Single Shot MultiBox Detector (SSD). To evaluate the accuracy of the
CNN, the team also collected 2,296 images as an independent test set and applied the CNN to them.
Finally, where multiple input images showed the same lesion, the outputs for those images were checked
for consistency, and a consistent detection was regarded as a correct answer (Hirasawa et al, 2018).

3 DISCUSSION

Although significant progress has been achieved in the past, there are still some deficiencies in the
application of AI models to gastric cancer detection:

1) Poor interpretability. In the medical field, the interpretability of a model is crucial (Cheng et
al, 2020, Zhang et al, 2017). Physicians need to understand the decision-making process of the model in
order to trust and accept its suggestions. AI models may sometimes attend to features that differ from
those humans consider important, so their outputs are questioned by doctors and patients. To solve this
problem, future AI models need to incorporate more domain knowledge and expert advice at training time,
while directing the model toward the key parts of the input.

2) Lack of sufficient diversity and

representativeness. AI models may lack

representation of various populations and different

cases due to insufficient training data. This may lead

to decreased performance of the model on specific

patient populations or in rare cases. To address this

issue, it should be ensured that the training data cover

different populations, disease stages and disease

types to improve the robustness and applicability of

the model (Qiu et al, 2022).

3) Lack of real-time and immediate feedback: In

gastric cancer detection, timely results are crucial for

the treatment and decision-making process. Some AI

models may have problems with slow processing and

an inability to provide real-time feedback, which may

affect their practical application in the clinical setting.

To solve this problem, the inference speed of the model can be optimized, and techniques such as
lightweight model structures or hardware acceleration can be adopted so that the model can respond in
real time.

4) Collaborative difficulties between doctors and

AI models: In practical clinical scenarios, the

collaboration between doctors and AI models may

face communication barriers and operational

difficulties. There may be uncertainty among doctors

about how to understand, interpret, and integrate the

output of the AI model, leading to the model's

recommendations not being fully utilized. To address

this problem, physician understanding of AI models

can be strengthened through regular training and

educational activities, and closer collaborative

mechanisms can be established to ensure that doctors

can fully use the information provided by the model

to make more accurate diagnosis and treatment

decisions.

5) Privacy and ethical issues: When using the AI

model for gastric cancer detection, the personal health

information and medical data of the patients are

involved (Kaissis et al, 2021, Ziller et al, 2021).

Protecting patient privacy is a serious challenge,

especially in the context of data sharing and model

deployment. To address this issue, privacy protection

technologies, such as differential privacy, can be used

to ensure the security of patient data. In addition, a transparent ethical framework and regulations
should be established to govern the use of AI models in the medical field, balancing technological
innovation against patients' rights and interests and improving public trust in AI models.
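As one concrete example of the privacy protection technologies mentioned above, differential privacy can be applied to a simple counting query with the Laplace mechanism. The sketch below is a minimal illustration: the record fields, the query and the privacy budget `epsilon` are hypothetical, and a real deployment would also need to track the cumulative budget across queries.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.

    Adding or removing one patient changes a count by at most 1
    (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: how many patients have a positive biopsy?
records = [{"biopsy_positive": i % 4 == 0} for i in range(100)]
noisy = private_count(
    records, lambda r: r["biopsy_positive"], epsilon=1.0, rng=random.Random(42)
)
```

A smaller `epsilon` gives stronger privacy at the cost of a noisier answer, which is the trade-off between data utility and patient protection.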

4 CONCLUSION

This paper reviewed the detection of gastric cancer using AI models. It focused mainly on machine
learning and deep learning methods, including CNN and random forest. It also discussed the shortcomings
of AI prediction models in depth and put forward corresponding suggestions. The application of AI to
gastric cancer identification has had a significant impact on the field. First, through automated and
intelligent methods, AI technology can improve the diagnosis rate and analyze large amounts of medical
imaging data faster, providing doctors with timely and accurate auxiliary diagnosis. Second, the
application of AI can reduce medical costs, reduce the burden on medical staff and improve work
efficiency, while reducing the likelihood of repeated examination and misdiagnosis. However, the paper
also points out the shortcomings of current models. Models based mainly on CNN and random forest may
have limitations in specific aspects, such as poor interpretability and insufficient ability to deal
with uncertainty. To address these problems, future work could expand the research direction of the
models, e.g. by exploring dynamic detection in video and introducing recurrent neural networks, which
may further improve the performance and applicability of the models.

REFERENCES

J. F. Le Gall, Random trees and applications, (2005)

C. Cortes, V. Vapnik, Machine Learning 20, 273-297 (1995)

P. Domingos, Commun. ACM 55(10), 78-87 (2012)

Y. Ikenoyama, T. Hirasawa, M. Ishioka, K. Namikawa, S. Yoshimizu, Y. Horiuchi, T. Tada, Dig. Endosc. 33(1), 141-150 (2021)

H.J. Yoon, S. Kim, J.H. Kim, J.S. Keum, S.I. Oh, J. Jo, S.H. Noh, J. Clin. Med. 8(9), 1310 (2019)

C. Xu, J. Wang, T. Zheng, Y. Cao, F. Ye, Arch. Med. Sci. 18(5), 1208 (2022)

Morris. Building a Machine Learning Platform https://mlops.community/building-a-machine-learning-platform/

S. L. Zhu, J. Dong, C. Zhang, Y.B. Huang, W. Pan, PLoS One 15(12), e0244869 (2020)

D. Adham, N. Abbasgholizadeh, M. Abazari, Asian Pac. J. Cancer Prev. 18(1), 129 (2017)

T. Chen, C. Zhang, Y. Liu, Y. Zhao, D. Lin, Y. Hu, G. Li, BMC Genomics 20(1), 1-7 (2019)

X.P. Zhang, Z.L. Wang, L. Tang, Y.S. Sun, K. Cao, Y. Gao, BMC Cancer 11(1), 1-6 (2011)

T. Hirasawa, K. Aoyama, T. Tanimoto, S. Ishihara, S. Shichijo, T. Ozawa, T. Tada, Gastric Cancer 21, 653-660 (2018)

K. Cheng, N. Wang, M. Li, Interpretability of deep learning: A survey, in The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, 475-486. Cham: Springer International Publishing (2020)

Z. Zhang, Y. Xie, F. Xing, M. McGough, L. Yang, Mdnet: A semantically and visually interpretable medical image diagnosis network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6428-6436 (2017)

Y. Qiu, J. Wang, Z. Jin, H. Chen, M. Zhang, L. Guo, Biomed. Signal Process. Control 72, 103323 (2022)

G. Kaissis, A. Ziller, J. Passerat-Palmbach, T. Ryffel, D. Usynin, A. Trask, I. Lima Jr, J. Mancuso, F. Jungmann, M.M. Steinborn, A. Saleh, Nat. Mach. Intell. 3(6), 473-484 (2021)

A. Ziller, D. Usynin, R. Braren, M. Makowski, D. Rueckert, G. Kaissis, Sci. Rep. 11(1), 13524 (2021)
