accuracy and recall. ResNet, with its deep architecture and residual learning, excels in tasks that require detecting subtle differences while making efficient use of training data. Transformers handle complex CT scans effectively, including those for COVID-19, by capturing long-range dependencies through attention mechanisms and by adapting to different patch sizes and pretraining strategies. This analysis highlights the distinct advantages each model brings to medical imaging.
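For illustration, a minimal PyTorch-style sketch of the residual learning idea discussed above is given below; the channel count and feature-map size are illustrative assumptions and do not reflect the exact configuration used in this study.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the identity shortcut adds the input back to
    the convolutional output, so the block only has to learn a residual
    mapping F(x) rather than the full transformation."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + residual)  # identity shortcut

# Example: one block applied to a batch of intermediate feature maps
# (batch size, channels, and spatial size are illustrative).
block = ResidualBlock(channels=64)
features = torch.randn(2, 64, 56, 56)
print(block(features).shape)  # torch.Size([2, 64, 56, 56])
```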
4 CONCLUSIONS
This research evaluates deep learning models, namely CNN, ResNet, and Transformer, for medical image processing, with the objective of enhancing diagnostic accuracy across imaging modalities. Each model was applied to and analysed on datasets covering different pathologies, including interstitial lung disease (ILD) scans and knee-joint MRI scans.
Extensive experiments were conducted to
evaluate the proposed methods. The experimental
results revealed that CNN excels in automatic feature
extraction, particularly in environments with limited
data and ambiguous visual structures. ResNet
demonstrated superior performance in managing
depth and complexity, significantly enhancing the
model's training and generalization capabilities in
deeper network architectures. Meanwhile,
Transformers displayed their advantage in handling
complex, high-dimensional image data, utilizing their
attention mechanisms to enhance model predictive
capabilities on large and diverse datasets.
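To make the attention-based behaviour concrete, the following sketch shows a ViT-style front end in PyTorch: an image is split into patches, embedded, and passed through multi-head self-attention so that any patch can attend to any other patch. Patch size, embedding width, head count, and input resolution are illustrative assumptions, not the settings used in the experiments.

```python
import torch
import torch.nn as nn

# Hypothetical ViT-style front end: a strided convolution turns an image
# into a sequence of patch embeddings, and multi-head self-attention then
# relates every patch to every other patch (the long-range dependency
# property discussed above).
patch_size, embed_dim, num_heads = 16, 256, 8

patch_embed = nn.Conv2d(1, embed_dim, kernel_size=patch_size, stride=patch_size)
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

scan = torch.randn(2, 1, 224, 224)                      # e.g. a batch of CT slices
tokens = patch_embed(scan).flatten(2).transpose(1, 2)   # (batch, 196 patches, embed_dim)
attended, weights = attention(tokens, tokens, tokens)

print(attended.shape)  # torch.Size([2, 196, 256])
print(weights.shape)   # torch.Size([2, 196, 196]); every patch attends to every other patch
```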
Future research will explore integrating multimodal imaging data, using advanced machine learning frameworks to analyze the combined information from different imaging modalities. The goal is to improve diagnostic precision and robustness, address the limitations of single-modality analysis, and advance AI-driven diagnostic tools in clinical settings, potentially improving patient outcomes and healthcare efficiency.
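As a rough indication of the multimodal direction outlined above, the sketch below shows a simple late-fusion design in which separate encoders process two modalities and their features are concatenated before a shared classification head. The encoders, feature width, and class count are hypothetical choices for illustration only, not part of the present study.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Hypothetical late-fusion sketch: each modality (e.g. CT and MRI) is
    encoded separately, and the concatenated feature vectors feed one
    shared classification head."""

    def __init__(self, feat_dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.ct_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.mri_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, ct: torch.Tensor, mri: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.ct_encoder(ct), self.mri_encoder(mri)], dim=1)
        return self.head(fused)

# Example with dummy single-channel inputs for both modalities.
model = LateFusionClassifier()
logits = model(torch.randn(2, 1, 224, 224), torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```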