
REFERENCES
Bisong, E. et al. (2019). Building machine learning
and deep learning models on Google cloud platform.
Springer.
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov,
A., and Zagoruyko, S. (2020). End-to-end object de-
tection with transformers. In European conference on
computer vision, pages 213–229. Springer.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2018). Bert: Pre-training of deep bidirectional trans-
formers for language understanding. arXiv preprint
arXiv:1810.04805.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer,
M., Heigold, G., Gelly, S., et al. (2020). An image is
worth 16x16 words: Transformers for image recogni-
tion at scale. arXiv preprint arXiv:2010.11929.
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J.,
Blum, M., and Hutter, F. (2015). Efficient and robust
automated machine learning. Advances in neural in-
formation processing systems, 28.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Ex-
plaining and harnessing adversarial examples. arXiv
preprint arXiv:1412.6572.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Heo, B., Yun, S., Han, D., Chun, S., Choe, J., and Oh, S. J.
(2021). Rethinking spatial dimensions of vision trans-
formers. In Proceedings of the IEEE/CVF Interna-
tional Conference on Computer Vision, pages 11936–
11945.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications. arXiv
preprint arXiv:1704.04861.
Hu, Q., Chen, C., Kang, S., Sun, Z., Wang, Y., Xiang, M.,
Guan, H., Xia, L., and Wang, S. (2022a). Applica-
tion of computer-aided detection (cad) software to au-
tomatically detect nodules under sdct and ldct scans
with different parameters. Computers in Biology and
Medicine, 146:105538.
Hu, W., Li, C., Li, X., Rahaman, M. M., Ma, J., Zhang,
Y., Chen, H., Liu, W., Sun, C., Yao, Y., et al.
(2022b). Gashissdb: A new gastric histopathology
image dataset for computer aided diagnosis of gas-
tric cancer. Computers in biology and medicine,
142:105207.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger,
K. Q. (2017). Densely connected convolutional net-
works. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 4700–
4708.
Jin, H., Song, Q., and Hu, X. (2019). Auto-keras: An ef-
ficient neural architecture search system. In Proceed-
ings of the 25th ACM SIGKDD international confer-
ence on knowledge discovery & data mining, pages
1946–1956.
Liu, J., Li, Y., Cao, G., Liu, Y., and Cao, W. (2022a). Fea-
ture pyramid vision transformer for medmnist classifi-
cation decathlon. In 2022 International Joint Confer-
ence on Neural Networks (IJCNN), pages 1–8. IEEE.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin,
S., and Guo, B. (2021). Swin transformer: Hierar-
chical vision transformer using shifted windows. In
Proceedings of the IEEE/CVF international confer-
ence on computer vision, pages 10012–10022.
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T.,
and Xie, S. (2022b). A convnet for the 2020s. In Pro-
ceedings of the IEEE/CVF conference on computer vi-
sion and pattern recognition, pages 11976–11986.
Lo, C.-M. and Hung, P.-H. (2022). Computer-aided diagno-
sis of ischemic stroke using multi-dimensional image
features in carotid color doppler. Computers in Biol-
ogy and Medicine, 147:105779.
Loshchilov, I. and Hutter, F. (2018). Fixing weight
decay regularization in adam. arXiv preprint
arXiv:2011.08042v1.
Manzari, O. N., Ahmadabadi, H., Kashiani, H., Shokouhi,
S. B., and Ayatollahi, A. (2023). Medvit: a ro-
bust vision transformer for generalized medical image
classification. Computers in Biology and Medicine,
157:106791.
Radford, A., Narasimhan, K., Salimans, T., and Sutskever,
I. (2018). Improving language understanding by gen-
erative pre-training.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going deeper with convolutions.
In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 1–9.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D.,
Lu, T., Luo, P., and Shao, L. (2021). Pyramid vi-
sion transformer: A versatile backbone for dense pre-
diction without convolutions. In Proceedings of the
IEEE/CVF international conference on computer vi-
sion, pages 568–578.
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfis-
ter, H., and Ni, B. (2023). Medmnist v2-a large-scale
lightweight benchmark for 2d and 3d biomedical im-
age classification. Scientific Data, 10(1):41.
Yang, X. and Stamp, M. (2021). Computer-aided diagno-
sis of low grade endometrial stromal sarcoma (lgess).
Computers in Biology and Medicine, 138:104874.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov,
R. R., and Le, Q. V. (2019). Xlnet: Generalized au-
toregressive pretraining for language understanding.
Advances in neural information processing systems,
32.
Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Jiang, Z.-H.,
Tay, F. E., Feng, J., and Yan, S. (2021). Tokens-to-
token vit: Training vision transformers from scratch
ICAART 2024 - 16th International Conference on Agents and Artificial Intelligence
838