
machine learning, compliant with the MST 3.0 specification. CoRR, abs/2201.08746.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and
Houlsby, N. (2020). An image is worth 16x16 words:
Transformers for image recognition at scale. CoRR,
abs/2010.11929.
Fonseca, F., Nunes, B., Salgado, M., and Cunha, A. (2022).
Abnormality classification in small datasets of capsule endoscopy images. Procedia Computer Science, 196:469–476. International Conference on ENTERprise Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and Social Care Information Systems and Technologies 2021.
Goel, N., Kaur, S., Gunjan, D., and Mahapatra, S. (2022).
Dilated CNN for abnormality detection in wireless capsule endoscopy images. Soft Computing, pages 1–17.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
Hewett, D. G., Kahi, C. J., and Rex, D. K. (2010). Efficacy and effectiveness of colonoscopy: how do we
bridge the gap? Gastrointestinal Endoscopy Clinics,
20(4):673–684.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR,
abs/1704.04861.
Hu, J., Shen, L., and Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
IARC/WHO (2022a). Cancer Today - Brazil Fact Sheet.
https://gco.iarc.who.int/media/globocan/factsheets/populations/76-brazil-fact-sheet.pdf. Accessed: 13/02/2024.
IARC/WHO (2022b). Cancer Today - World Fact Sheet.
https://gco.iarc.who.int/media/globocan/factsheets/populations/900-world-fact-sheet.pdf. Accessed: 13/02/2024.
Kaufman, S., Rosset, S., Perlich, C., and Stitelman, O.
(2012). Leakage in data mining: Formulation, detection, and avoidance. ACM Trans. Knowl. Discov. Data, 6(4).
Kavic, S. M. and Basson, M. D. (2001). Complications
of endoscopy. The American Journal of Surgery,
181(4):319–332.
Li, K., Wu, Z., Peng, K.-C., Ernst, J., and Fu, Y. (2018).
Tell me where to look: Guided attention inference
network. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
Lin, K., Yang, H.-F., Hsiao, J.-H., and Chen, C.-S. (2015).
Deep learning of binary hash codes for fast image
retrieval. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR)
Workshops.
Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A.,
Ciompi, F., Ghafoorian, M., van der Laak, J. A., van Ginneken, B., and Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88.
Ma, L., Su, X., Ma, L., Gao, X., and Sun, M. (2023).
Deep learning for classification and localization of
early gastric cancer in endoscopic images. Biomedical Signal Processing and Control, 79:104200.
Muruganantham, P. and Balakrishnan, S. M. (2022). Attention aware deep learning model for wireless capsule endoscopy lesion classification and localization. Journal of Medical and Biological Engineering, 42(2):157–168.
Ramsoekh, D., Haringsma, J., Poley, J. W., van Putten,
P., van Dekken, H., Steyerberg, E. W., van Leerdam,
M. E., and Kuipers, E. J. (2010). A back-to-back comparison of white light video endoscopy with autofluorescence endoscopy for adenoma detection in high-risk subjects. Gut, 59(6):785–793.
Recht, B., Roelofs, R., Schmidt, L., and Shankar, V. (2019).
Do ImageNet classifiers generalize to ImageNet? In
Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 5389–5400. PMLR.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Selvaraju, R. R., Das, A., Vedantam, R., Cogswell, M.,
Parikh, D., and Batra, D. (2016). Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. CoRR, abs/1610.02391.
Siegel, R. L., Miller, K. D., Goding Sauer, A., Fedewa,
S. A., Butterly, L. F., Anderson, J. C., Cercek, A.,
Smith, R. A., and Jemal, A. (2020). Colorectal cancer
statistics, 2020. CA: A Cancer Journal for Clinicians,
70(3):145–164.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model
scaling for convolutional neural networks. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6105–6114. PMLR.
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M.,
and Luo, P. (2021). SegFormer: Simple and efficient
design for semantic segmentation with transformers.
In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang,
P., and Vaughan, J. W., editors, Advances in Neural
A Comparative Analysis of EfficientNet Architectures for Identifying Anomalies in Endoscopic Images