the model trained with traditional cross-entropy loss,
the red regions are concentrated mainly in the middle
of the image, and the model fails to effectively differentiate between the two classes of samples. The experimental results and visual inspection demonstrate that, as expected, Ordinal Loss enables the model to better distinguish between adjacent classes, thereby improving performance on the ordinal regression task.
Figure 4: Visual inspection of models trained with two different loss functions using GradCAM (Photo/Picture credit: Original).
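The heatmaps in Figure 4 can be reproduced with a standard Grad-CAM computation. The following is a minimal PyTorch sketch using forward and backward hooks; the resnet18 backbone, the chosen target layer, and the random input are placeholders only (the study uses MedViT, whose final convolutional stage would be hooked instead).

import torch
import torch.nn.functional as F
from torchvision.models import resnet18  # stand-in backbone; the study uses MedViT

def grad_cam(model, image, target_layer, class_idx=None):
    # Returns a (1, 1, H, W) heatmap in [0, 1] for a single (1, C, H, W) image.
    activations, gradients = {}, {}

    def save_activation(module, inputs, output):
        activations["value"] = output

    def save_gradient(module, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h_fwd = target_layer.register_forward_hook(save_activation)
    h_bwd = target_layer.register_full_backward_hook(save_gradient)

    logits = model(image)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()

    h_fwd.remove()
    h_bwd.remove()

    # Channel weights are the global-average-pooled gradients; the heatmap is the
    # ReLU of the weighted sum of activation maps, upsampled to the input size.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Placeholder usage: in practice the two checkpoints (cross-entropy vs. Ordinal
# Loss) would be loaded and compared on the same RetinaMNIST image.
model = resnet18(num_classes=5).eval()
image = torch.randn(1, 3, 224, 224)
heatmap = grad_cam(model, image, model.layer4[-1])

Running the same image through the cross-entropy and Ordinal Loss checkpoints and overlaying the resulting heatmaps yields the kind of comparison shown in Figure 4.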
4 CONCLUSIONS
This study focuses on applying transformer models to image classification on MedMNIST and on improving the ordinal regression subtask with a novel loss function. The MedViT
model, a hybrid architecture combining CNN and
transformer, is employed to classify all 12 2D datasets
in MedMNIST and compared against classical CNN
models. Experimental findings reveal that MedViT,
adept at capturing multi-scale features, offers clear advantages over traditional methods,
yielding superior performance across most of the 12
datasets. Ordinal Loss is developed to address the performance limitations observed for all models on the ordinal regression subdataset,
RetinaMNIST. This loss function combines the traditional cross-entropy loss with a Rank Loss that emphasizes the similarity relationships between ordered categories during training.
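As a rough illustration of this idea, the sketch below combines cross-entropy with a rank-style penalty that grows with the ordinal distance between the predicted grade distribution and the true grade; the specific distance weighting and the balance factor alpha are assumptions for illustration, not the exact formulation used in this study.

import torch
import torch.nn.functional as F

def ordinal_loss(logits, targets, alpha=1.0):
    # Cross-entropy plus an illustrative rank penalty over ordered grades.
    ce = F.cross_entropy(logits, targets)

    probs = F.softmax(logits, dim=1)
    grades = torch.arange(logits.size(1), device=logits.device, dtype=probs.dtype)
    # Expected |predicted grade - true grade| under the softmax distribution:
    # zero for a confident correct prediction, larger when probability mass
    # falls on grades far from the ground truth.
    distance = (grades.unsqueeze(0) - targets.unsqueeze(1).to(probs.dtype)).abs()
    rank = (probs * distance).sum(dim=1).mean()

    return ce + alpha * rank

# Example with 5-grade RetinaMNIST-style labels.
logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
loss = ordinal_loss(logits, targets)
loss.backward()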
Comparative experiments against the unmodified cross-entropy loss
demonstrate that models trained with Ordinal Loss
achieve higher accuracy on RetinaMNIST for ordinal
regression tasks. Visual inspection using GradCAM
further illustrates that Ordinal Loss enables the model
to better discern key features for distinguishing
adjacent categories. In the field of fine-grained recognition, some methods improve performance by learning from pairs of intra-class and inter-class similar samples. Future research could integrate this approach into the ordinal regression task to further strengthen the model's ability to distinguish similar samples.
REFERENCES
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., ... & Houlsby, N. 2020.
An image is worth 16x16 words: Transformers for
image recognition at scale. arXiv:2010.11929.
He, K., Zhang, X., Ren, S., & Sun, J. 2016. Deep residual
learning for image recognition. In Proceedings of the
IEEE conference on computer vision and pattern
recognition. pp: 770-778.
Heo, B., Yun, S., Han, D., Chun, S., Choe, J., & Oh, S. J.
2021. Rethinking spatial dimensions of vision
transformers. In Proceedings of the IEEE/CVF
international conference on computer vision. pp:
11936-11945.
Hu, Q., Chen, C., Kang, S., Sun, Z., Wang, Y., Xiang, M., ...
& Wang, S. 2022. Application of computer-aided
detection (CAD) software to automatically detect
nodules under SDCT and LDCT scans with different
parameters. Computers in Biology and Medicine, vol.
146, p: 105538.
Hu, W., Li, C., Li, X., Rahaman, M. M., Ma, J., Zhang, Y., ...
& Grzegorzek, M. 2022. GasHisSDB: A new gastric
histopathology image dataset for computer aided
diagnosis of gastric cancer. Computers in biology and
medicine, vol. 142, p: 105207.
Lo, C. M., & Hung, P. H. 2022. Computer-aided diagnosis
of ischemic stroke using multi-dimensional image
features in carotid color Doppler. Computers in Biology
and Medicine, vol. 147, p: 105779.
Manzari, O. N., Ahmadabadi, H., Kashiani, H., Shokouhi,
S. B., & Ayatollahi, A. 2023. MedViT: a robust vision
transformer for generalized medical image
classification. Computers in Biology and Medicine, vol.
157, p: 106791.
Simonyan, K., & Zisserman, A. 2014. Very deep
convolutional networks for large-scale image
recognition. arXiv:1409.1556.
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., ... & Ni,
B. 2023. MedMNIST v2 - a large-scale lightweight benchmark for 2D and 3D biomedical image
classification. Scientific Data, vol. 10(1), p: 41.
Yang, X., & Stamp, M. 2021. Computer-aided diagnosis of
low grade endometrial stromal sarcoma (LGESS).
Computers in Biology and Medicine, vol. 138, p:
104874.