
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059. PMLR.
Ganaie, M. A., Hu, M., Malik, A. K., Tanveer, M., and Suganthan, P. N. (2022). Ensemble deep learning: A review. Engineering Applications of Artificial Intelligence, 115:105151.
Goan, E. and Fookes, C. (2020). Bayesian neural networks: An introduction and survey. Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018, pages 45–87.
Hénaff, O., Srinivas, A., Fauw, J., Razavi, A., Doersch, C., Eslami, S., and van den Oord, A. (2020). Data-efficient image recognition with contrastive predictive coding. arXiv preprint arXiv:1905.09272.
Jena, B., Nayak, G. K., and Saxena, S. (2022). High-performance computing and its requirements in deep learning. In High-Performance Medical Image Processing, pages 255–288. Apple Academic Press.
Jia, M., Tang, L., Chen, B.-C., Cardie, C., Belongie, S., Hariharan, B., and Lim, S.-N. (2022). Visual prompt tuning. In European Conference on Computer Vision, pages 709–727. Springer.
Kim, S. and Yun, S.-Y. (2022). Calibration of few-shot classification tasks: Mitigating misconfidence from distribution mismatch. IEEE Access, 10:53894–53908.
Lemley, J., Bazrafkan, S., and Corcoran, P. (2017). Deep learning for consumer devices and services: Pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE Consumer Electronics Magazine, 6(2):48–56.
Maji, S., Kannala, J., Rahtu, E., Blaschko, M., and Vedaldi, A. (2013). Fine-grained visual classification of aircraft. Technical report.
Miao, Y., Lei, Y., Zhou, F., and Deng, Z. (2024). Bayesian exploration of pre-trained models for low-shot image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23849–23859.
Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., and Sutskever, I. (2021). Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12):124003.
Nguyen, A., Yosinski, J., and Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Nilsback, M.-E. and Zisserman, A. (2008). Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722–729. IEEE.
Peng, Y., He, X., and Zhao, J. (2017). Object-part attention model for fine-grained image classification. IEEE Transactions on Image Processing, 27(3):1487–1500.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR.
Seoh, R. (2020). Qualitative analysis of Monte Carlo dropout. arXiv preprint arXiv:2007.01720.
Snell, J., Swersky, K., and Zemel, R. (2017). Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, 30.
Valdenegro-Toro, M. (2021). I find your lack of uncertainty in computer vision disturbing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1263–1272.
Valdenegro-Toro, M. and Mori, D. S. (2022). A deeper look into aleatoric and epistemic uncertainty disentanglement. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1508–1516. IEEE.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.
Wang, Y., Yao, Q., Kwok, J. T., and Ni, L. M. (2020). Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys (CSUR), 53(3):1–34.
Zhou, K., Yang, J., Loy, C. C., and Liu, Z. (2022a). Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16825.
Zhou, K., Yang, J., Loy, C. C., and Liu, Z. (2022b). Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348.