The Advancements and Applications of Artificial Intelligence in
Gastric Cancer Diagnosis
Zilong Xu
Computer Science and Software Engineering, Fuzhou University, Fuzhou, China
Keywords: Gastric cancer, Artificial Intelligence (AI), Machine learning, Deep learning, Convolutional Neural Network
(CNN).
Abstract: Gastric cancer, a common and deadly malignancy with early detection challenges, benefits from Artificial
Intelligence (AI)-enhanced diagnosis, offering faster and cost-effective solutions through advanced imaging
and data analysis. This study provides a comprehensive review of AI-based detection of gastric cancer. The
methods reviewed span machine learning and deep learning, including the Convolutional Neural Network
(CNN), random forest, and related methods. For traditional machine learning, this paper details the
application of random forest and Support Vector Machine (SVM) to the detection of gastric cancer. Random
forest is used to predict patient survival status, with a weighting method improving the generalization
ability of the algorithm. The SVM is used to identify Microsatellite Instability (MSI) and Lymph Node
Metastasis (LNM), giving doctors important information to guide treatment decisions. For deep learning,
this paper focuses on the application of CNNs to gastric cancer detection. One research team developed a
model for the detection and depth prediction of Early Gastric Cancer (EGC), which improved the detection
accuracy of EGC by segmenting endoscopic images and classifying them using the VGG-16 model. The
discussion section examines in detail the shortcomings
of AI models in gastric cancer detection, such as poor interpretability, insufficient data diversity, difficulties
with physician and model coordination, and privacy and ethical issues. Relevant suggestions are made,
including injecting more domain knowledge, enhancing data diversity, optimizing real-time models,
enhancing collaboration between doctors and models, and adopting privacy protection technologies.
1 INTRODUCTION
Gastric cancer is a malignant tumor originating from
gastric tissue. It usually develops relatively slowly
and may cause no obvious symptoms at the initial
stage. It is challenging to detect early because the
initial symptoms are atypical: patients may feel only
slight discomfort or indigestion, which is easily
ignored. As a result, the disease has often progressed
to an advanced stage by the time it is detected. At
present, the management of gastric cancer relies
mainly on in-hospital diagnosis, which limits
timeliness and incurs high labor costs. A better
auxiliary method is therefore needed. Artificial
intelligence can extract high-level representations
and make predictions through image analysis, data
integration and analysis, and surgical assistance,
which makes it a candidate for this role.
Machine learning is an important branch of
artificial intelligence that fits mathematical models
to data in order to solve real-world problems. Its
mainstream algorithms include random forest and
Support Vector Machine (SVM), alongside the neural
network methods of deep learning. Random forest is
an ensemble of tree-structured classifiers whose
output is the class that receives the most votes
across the trees (Le Gall, 2005). The SVM is an
algorithm for binary classification whose goal is to
find the separating boundary that maximizes the
margin, that is, the distance from the boundary to
the nearest elements of the two classes (Cortes &
Vapnik, 1995). A neural network is a computational
model that simulates the way nerves in the human
brain work (Domingos, 2012).
Xu, Z.
The Advancements and Applications of Artificial Intelligence in Gastric Cancer Diagnosis.
DOI: 10.5220/0012838200004547
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Data Science and Engineering (ICDSE 2024), pages 339-343
ISBN: 978-989-758-690-3
Proceedings Copyright © 2024 by SCITEPRESS – Science and Technology Publications, Lda.
In recent years, many studies have combined
artificial intelligence algorithms with medicine,
including for gastric cancer detection. For instance,
Ikenoyama et al constructed a Convolutional Neural
Network (CNN) using over 13,000
esophagogastroduodenoscopy (EGD) images and
tested it by comparing its ability to diagnose early
gastric cancer with that of many endoscopists
(Ikenoyama et al, 2021). Yoon et al built an optimized
Early Gastric Cancer (EGC) detection and depth
prediction model and studied the factors influencing
Artificial Intelligence (AI) diagnosis (Yoon et al,
2019). Xu et al applied an improved weighted random
forest algorithm to data from 110,697 gastric cancer
patients (Xu et al, 2022). Zhu et al used the Gradient
Boosting Decision Tree (GBDT), a machine learning
method, to construct a predictive model for the
diagnosis of gastric cancer and evaluated its accuracy
(Zhu et al, 2020). Given the rapid development and
importance of this field, a comprehensive review is
warranted.
The rest of this article contains three sections:
methods, discussion, and conclusion. The second
section surveys classic machine learning and deep
learning algorithms applied to gastric cancer
detection and introduces their implementation
process. The third section discusses the shortcomings
of current algorithms and future development
directions, and the fourth section summarizes the
paper.
2 METHOD
2.1 Framework of Machine Learning
Model
A machine learning model for gastric cancer
prediction is mainly developed in six steps: data
collection, data preprocessing, feature extraction,
model building, model testing and evaluation, and
model deployment, as shown in Fig. 1. Each step is
detailed below.
Figure 1: The detailed workflow of developing machine learning models (Morris).
Collecting data: Multiple types of patient-related
data are collected, including clinical medical records,
imaging data (e.g. CT, MRI scans), and laboratory
findings. Large-scale patient data are also required
to cover different cases, disease stages, and
treatment options.
Data preprocessing: Patient data are processed by
handling missing values and outliers and by
standardizing the data. Data from different sources
are integrated to ensure consistency and availability.
Some processing is specific to gastric cancer, such as
labeling tumor size, location, and pathological type.
Feature Extraction: Key features are extracted from
the raw data, such as texture and shape features from
images or key indicators from clinical data.
Model building: An appropriate machine learning
model is selected, such as a support vector machine,
a deep neural network, or a decision tree. Depending
on the task, classification models (for cancer
detection) or regression models (for predicting
survival, etc.) can be constructed. The model is
trained on labeled training data so that it learns
underlying patterns and associations, and its
parameters are optimized to improve performance.
Model Testing and Evaluation: The performance of
the model is evaluated on independent test datasets
using metrics including accuracy, sensitivity, and
specificity. The robustness of the model is validated
using techniques such as cross-validation.
Model Deployment: Trained models are deployed in
real clinical settings for the diagnosis and prediction
of gastric cancer. The model can be integrated into a
medical information system to support physician
decision-making.
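The evaluation metrics named in the testing step can be made concrete with a short example. The following is a minimal sketch, assuming binary labels (1 = cancer, 0 = no cancer); the function names and the toy labels are illustrative and do not come from any of the reviewed studies.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity (recall on positives) and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels for eight hypothetical test cases (not real patient data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Sensitivity measures how many true cancer cases are found, while specificity measures how many cancer-free cases are correctly cleared; reporting both avoids the misleading picture that accuracy alone can give on imbalanced data.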
2.2 Traditional Machine
Learning-Based Detection
Approaches
2.2.1 Random Forest for Gastric Cancer
Detection
To predict patient survival status for prognosis and
help doctors evaluate treatment decisions, Xu et al
proposed a new random forest weighting method and
applied it to data on gastric cancer patients from the
Surveillance, Epidemiology, and End Results (SEER)
program. The generalization ability of this weighted
random forest algorithm was evaluated on 10 public
healthcare datasets. Furthermore, for the same
weighting scheme, the difference between using the
out-of-bag (OOB) data and using the whole training
set was explored. The specific approach is as follows.
First, they took 110,697 patient cases from 1975 to
2016 in the SEER database as the sample and used a
modified bagging-tree version of random forest to
build decision trees (DTs) that extract and process
the data samples. Each DT was then weighted
according to its performance on the OOB data, which
estimates the generalization performance of that
tree. A tree-level weighted random forest (TLWRF)
was also used to address the problem that the OOB
data differ from tree to tree. With this optimization,
it is possible to predict the survival status of the
patients (Xu et al, 2022).
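The idea of weighting trees by their out-of-bag (OOB) performance can be illustrated as follows. This is a minimal sketch, not the method of Xu et al: it assumes each tree's weight is simply its accuracy on its own OOB samples, and the tree outputs, labels, and function names are hypothetical. The tree-level correction (TLWRF) is not reproduced here.

```python
from collections import defaultdict


def oob_accuracy(oob_true, oob_pred):
    """Fraction of a tree's out-of-bag samples that it classifies correctly."""
    if not oob_true:
        return 0.0
    correct = sum(1 for t, p in zip(oob_true, oob_pred) if t == p)
    return correct / len(oob_true)


def weighted_vote(tree_predictions, tree_weights):
    """Return the class with the largest total weight among the trees' votes."""
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical OOB results (true labels, predicted labels) for three trees.
# Each tree has a different OOB set, as in bagging.
oob_results = [
    ([1, 0, 1, 1], [1, 0, 1, 1]),  # tree 0: 4 of 4 correct
    ([0, 0, 1], [1, 0, 0]),        # tree 1: 1 of 3 correct
    ([1, 1, 0, 0], [0, 1, 1, 0]),  # tree 2: 2 of 4 correct
]
weights = [oob_accuracy(t, p) for t, p in oob_results]

# Votes of the three trees for one new patient (assumed class names).
votes = ["deceased", "alive", "alive"]
weighted = weighted_vote(votes, weights)
unweighted = weighted_vote(votes, [1.0, 1.0, 1.0])
print(weighted, unweighted)
```

In this toy case the plain majority vote and the weighted vote disagree, because the single tree with the best OOB accuracy outweighs two weaker trees. That is the behavior a weighting scheme is meant to produce.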
Random survival forests (RSF) were used by
Adham et al to identify important factors affecting
gastric cancer outcomes. Data from 128 patients with
gastric cancer were first collected through a historical
cohort study conducted in Hamadan district, Iran,
from 2007 to 2013. The RSF can identify influential
variables within a set of covariates. They used this
algorithm to grow trees capturing different
influencing factors, and then calculated the
cumulative hazard function (CHF) for each tree. The
variable importance (VIMP) and Harrell's
concordance index (C-index) were calculated, and the
R software was used to analyze the data. The final
mean C-index was calculated from the results
obtained from 1000 sets of bootstrap data (Adham et
al, 2017).
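Harrell's C-index can be computed directly from its definition: among all comparable patient pairs, it is the fraction in which the patient with the shorter observed survival time also received the higher predicted risk. The sketch below is an illustrative pure-Python version, not the R code used in the study. The toy data are invented, and the bootstrap helper only resamples the evaluation set (an assumption), whereas a full analysis might refit the forest on each resample.

```python
import random


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    events[i] is 1 if the event was observed for subject i, 0 if censored.
    A pair (i, j) is comparable when i had an observed event before time j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_mean_c_index(times, events, risk_scores, n_boot=1000, seed=0):
    """Mean C-index over bootstrap resamples of the evaluation set."""
    rng = random.Random(seed)
    n = len(times)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            values.append(harrell_c_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [risk_scores[k] for k in idx],
            ))
        except ValueError:
            continue  # skip resamples without any comparable pair
    return sum(values) / len(values)


# Toy data: survival time in months, event indicator, predicted risk.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.1, 0.2]
c_index = harrell_c_index(times, events, risk)
mean_c = bootstrap_mean_c_index(times, events, risk, n_boot=1000)
print(c_index, mean_c)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.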
2.2.2 Support Vector Machine for Gastric
Cancer Detection
To classify Microsatellite Instability (MSI) status in
cancer patients, Chen et al developed a model for
MSI-related prediction in gastric cancer patients. The
researchers first took gastric adenocarcinoma data
from The Cancer Genome Atlas (TCGA) as the sample
and normalized the data. RMS software and MATLAB
software were then used to extract features and reduce
redundancy. MSI was then classified using an
SVM-based lncRNA model, and its accuracy was
evaluated by the C-index (Chen et al, 2019).
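To show how an SVM classifier of this kind separates two classes, the following sketch trains a linear SVM by stochastic subgradient descent on the regularized hinge loss. It is an assumed, simplified stand-in for the software used in the study. The two-dimensional feature vectors, the labels (+1 for MSI, -1 for microsatellite stable), and the hyperparameters `lam`, `lr`, and `epochs` are all hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Minimize lam/2 * ||w||^2 + hinge loss by stochastic subgradient descent.

    Labels in y must be +1 or -1. Returns the weight vector w and bias b.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wk * xk for wk, xk in zip(w, X[i])) + b
            if y[i] * score < 1:
                # Sample violates the margin: regularizer plus hinge subgradient.
                w = [wk - lr * (lam * wk - y[i] * xk) for wk, xk in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Margin satisfied: only the regularizer shrinks the weights.
                w = [wk - lr * lam * wk for wk in w]
    return w, b


def predict(w, b, x):
    """Classify x by the sign of the decision function w.x + b."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1


# Toy, already normalized features for six hypothetical samples.
X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9],
     [-1.5, -2.0], [-2.2, -1.1], [-1.9, -2.4]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, x) for x in X]
print(w, b, train_predictions)
```

The hinge loss penalizes only samples that fall inside the margin, which is why the learned boundary is determined by the samples closest to it.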
Lymph Node Metastasis (LNM) is an important
factor affecting patient survival, but several
commonly used gastric imaging methods cannot
achieve high sensitivity and specificity, so they
cannot evaluate the lymph node status in gastric
cancer well. Zhang et al used support vector machine
technology to address this problem. The researchers
first selected 175 gastric cancer patients who
underwent multidetector computed tomography
(MDCT) before surgery and used univariate analysis
to identify the relationship between different
characteristics of gastric cancer and lymph node
metastasis. These indicators were used as input to
the support vector machine, which output the
patient's lymph node metastasis status. A 5-fold
cross-validation was then used to train and test the
SVM model (Zhang et al, 2011).
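The 5-fold cross-validation procedure can be sketched as an index split: the patients are shuffled once, divided into five folds, and each fold serves as the test set exactly once while the other four are used for training. The sketch below assumes a simple non-stratified split; the function name and the random seed are illustrative, and only the cohort size of 175 is taken from the study.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# 175 patients, as in the cohort described above.
splits = list(k_fold_indices(175, k=5))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    # A model would be fitted on train_idx and scored on test_idx here.
    print(fold_number, len(train_idx), len(test_idx))
```

Averaging the test scores over the five folds gives a performance estimate in which every patient is used for testing exactly once.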
2.3 Deep Learning-Based Detection
Approaches
2.3.1 Convolutional Neural Network for
Gastric Cancer Detection
Yoon et al developed an EGC detection and depth
prediction model to help in the diagnosis of EGC.
The study team first classified endoscopic images as
EGC or non-EGC using the Visual Geometry Group
(VGG)-16 model. A loss function was then used to
measure classification and localization errors
simultaneously, and 11,539 collected endoscopic
images were used as experimental samples to obtain
the probabilities of EGC detection and depth
prediction (Yoon et al, 2019).
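A loss that measures classification and localization errors at the same time can be written as a weighted sum of two terms. The exact form used by Yoon et al is not given in this review, so the sketch below is purely an assumed illustration: it combines binary cross-entropy on the EGC probability with the mean squared error between a predicted activation map and a reference lesion mask. The weight `lam`, the function names, and the 2x2 toy maps are hypothetical.

```python
import math


def binary_cross_entropy(p, label, eps=1e-12):
    """Classification error for predicted probability p and label in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def localization_error(pred_map, ref_mask):
    """Mean squared difference between predicted map and reference mask."""
    flat_pred = [v for row in pred_map for v in row]
    flat_ref = [v for row in ref_mask for v in row]
    return sum((a - b) ** 2 for a, b in zip(flat_pred, flat_ref)) / len(flat_pred)


def combined_loss(p, label, pred_map, ref_mask, lam=1.0):
    """Assumed combined loss: classification term + lam * localization term."""
    return binary_cross_entropy(p, label) + lam * localization_error(pred_map, ref_mask)


# Toy example: an EGC image (label 1) predicted with probability 0.8,
# with a 2x2 activation map compared against the annotated lesion mask.
pred_map = [[0.9, 0.1], [0.2, 0.0]]
ref_mask = [[1.0, 0.0], [0.0, 0.0]]
loss = combined_loss(0.8, 1, pred_map, ref_mask, lam=1.0)
print(round(loss, 4))
```

Minimizing such a sum pushes the network both to assign the right class and to attend to the lesion region, with `lam` controlling the trade-off between the two goals.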
To automatically detect gastric cancer in endoscopic
images, Hirasawa et al developed a CNN. The team
collected 13,584 images of 2,639 cancer lesions and
built the CNN on a deep neural network architecture
called the Single Shot MultiBox Detector (SSD). To
assess the accuracy of the CNN, the team also
collected 2,296 images to serve as an independent test
set and applied the CNN to them. Finally, the outputs
for multiple input images of the same lesion were
checked for consistency, and a consistent output was
regarded as the correct answer (Hirasawa et al,
2018).
3 DISCUSSION
Although significant progress has been achieved
in the past, there are still some deficiencies in the
application of AI models to gastric cancer detection:
1) Poor interpretability: In the medical field, the
interpretability of a model is crucial
(Cheng et al, 2020, Zhang et al, 2017). Physicians
need to understand the decision-making process of
the model in order to trust and accept its
suggestions. AI models may sometimes focus on
features different from those that humans consider
relevant, so they are questioned by doctors and
patients. To solve this problem, future AI models
need to incorporate more domain knowledge and
expert advice at training time, while extracting the
key information for the model.
2) Lack of sufficient diversity and
representativeness: AI models may lack
representation of various populations and different
cases due to insufficient training data. This may lead
to decreased performance of the model on specific
patient populations or in rare cases. To address this
issue, it should be ensured that the training data cover
different populations, disease stages and disease
types to improve the robustness and applicability of
the model (Qiu et al, 2022).
3) Lack of real-time and immediate feedback: In
gastric cancer detection, timely results are crucial for
the treatment and decision-making process. Some AI
models may have problems with slow processing and
an inability to provide real-time feedback, which may
affect their practical application in the clinical setting.
To solve this problem, the inference speed of the
model can be optimized, and techniques such as
lightweight model structures or hardware acceleration
can be adopted so that the model responds in real
time.
4) Collaborative difficulties between doctors and
AI models: In practical clinical scenarios, the
collaboration between doctors and AI models may
face communication barriers and operational
difficulties. There may be uncertainty among doctors
about how to understand, interpret, and integrate the
output of the AI model, leading to the model's
recommendations not being fully utilized. To address
this problem, physician understanding of AI models
can be strengthened through regular training and
educational activities, and closer collaborative
mechanisms can be established to ensure that doctors
can fully use the information provided by the model
to make more accurate diagnosis and treatment
decisions.
5) Privacy and ethical issues: When using the AI
model for gastric cancer detection, the personal health
information and medical data of the patients are
involved (Kaissis et al, 2021, Ziller et al, 2021).
Protecting patient privacy is a serious challenge,
especially in the context of data sharing and model
deployment. To address this issue, privacy protection
technologies, such as differential privacy, can be used
to ensure the security of patient data. In addition, a
transparent ethical framework and regulations should
be established to govern the use of AI models in the
medical field, balancing technological innovation
against patients' rights and interests and improving
public trust in AI models.
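Differential privacy, mentioned above as a privacy protection technology, can be illustrated with the Laplace mechanism applied to a counting query. The following is a toy sketch under stated assumptions: the records, the field name `stage`, and the privacy budget `epsilon` are hypothetical, and the standard-library random generator is used for readability rather than for security.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    Adding or removing one patient changes a count by at most 1
    (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def is_early(record):
    """Predicate selecting early-stage cases in the toy records."""
    return record["stage"] == "early"


# Ten hypothetical patient records, four of them early-stage.
records = [{"stage": "early"}] * 4 + [{"stage": "advanced"}] * 6
rng = random.Random(42)
noisy = private_count(records, is_early, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller `epsilon` adds more noise and therefore gives stronger privacy at the cost of a less accurate released statistic.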
4 CONCLUSION
In this paper, AI-based detection of gastric cancer
was reviewed. The review mainly focuses on
machine learning (ML) and deep learning (DL),
including CNN and random forest (RF) methods.
Additionally, this paper discusses in depth the
shortcomings of AI prediction models and puts
forward corresponding suggestions. The application
of AI to gastric cancer identification has had a
significant impact on the field. First, AI technology
can improve the diagnosis rate: automated and
intelligent methods analyze large amounts of
medical imaging data faster and provide doctors with
timely and accurate auxiliary diagnosis. Second, the
application of AI can reduce medical costs, lighten
the burden on medical staff, and improve work
efficiency while reducing the likelihood of repeated
examination and misdiagnosis. However, the paper
also points out the shortcomings of current models.
Models that mainly rely on CNN and RF may have
limitations in specific aspects, such as poor
interpretability and insufficient ability to deal with
uncertainty. To address these problems, this paper
outlines future prospects, including expanding the
research directions of the model, e.g. exploring
dynamic video detection and introducing recurrent
neural networks. These can further improve the
performance and applicability of the model.
REFERENCES
J. F. Le Gall, Random trees and applications, (2005)
C. Cortes, V. Vapnik, Machine Learning 20, 273-297
(1995)
P. Domingos, Commun. ACM 55(10), 78-87 (2012)
Y. Ikenoyama, T. Hirasawa, M. Ishioka, K. Namikawa, S.
Yoshimizu, Y. Horiuchi, T. Tada, Dig. Endosc. 33(1),
141-150 (2021)
H.J. Yoon, S. Kim, J.H. Kim, J.S. Keum, S.I. Oh, J. Jo, S.H.
Noh, J. Clin. Med. 8(9), 1310 (2019)
C. Xu, J. Wang, T. Zheng, Y. Cao, F. Ye, Arch. Med. Sci.
18(5), 1208 (2022)
Morris. Building a Machine Learning Platform
https://mlops.community/building-a-machine-learning-platform/
S. L. Zhu, J. Dong, C. Zhang, Y.B. Huang, W. Pan, PLoS
One 15(12), e0244869 (2020)
D. Adham, N. Abbasgholizadeh, M. Abazari, Asian Pac. J.
Cancer Prev. 18(1), 129 (2017)
T. Chen, C. Zhang, Y. Liu, Y. Zhao, D. Lin, Y. Hu, G. Li,
BMC Genomics 20(1), 1-7 (2019)
X.P. Zhang, Z.L. Wang, L. Tang, Y.S. Sun, K. Cao, Y. Gao,
BMC Cancer 11(1), 1-6 (2011)
T. Hirasawa, K. Aoyama, T. Tanimoto, S. Ishihara, S.
Shichijo, T. Ozawa, T. Tada, Gastric Cancer 21, 653-
660 (2018)
K. Cheng, N. Wang, M. Li, Interpretability of deep
learning: A survey, in The International Conference on
Natural Computation, Fuzzy Systems and Knowledge
Discovery, 475-486. Cham: Springer International
Publishing (2020)
Z. Zhang, Y. Xie, F. Xing, M. McGough, L. Yang, Mdnet:
A semantically and visually interpretable medical
image diagnosis network, in Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition, 6428-6436 (2017)
Y. Qiu, J. Wang, Z. Jin, H. Chen, M. Zhang, L. Guo,
Biomed. Signal Process. Control 72, 103323 (2022)
G. Kaissis, A. Ziller, J. Passerat-Palmbach, T. Ryffel, D.
Usynin, A. Trask, I. Lima Jr, J. Mancuso, F. Jungmann,
M.M. Steinborn, A. Saleh, Nat. Mach. Intell. 3(6), 473-
484 (2021)
A. Ziller, D. Usynin, R. Braren, M. Makowski, D. Rueckert,
G. Kaissis, Sci. Rep. 11(1), 13524 (2021)