Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning
Eric Brouwer, Eric Brouwer, Jan Erik van Woerden, Gertjan Burghouts, Matias Valdenegro-Toro, Marco Zullich
2025
Abstract
Few-shot, fine-grained classification in computer vision poses significant challenges due to the need to differentiate subtle class distinctions with limited data. This paper presents a novel method that enhances the Contrastive Language-Image Pre-Training (CLIP) model through adaptive prompt tuning, guided by real-time visual inputs. Unlike existing techniques such as Context Optimization (CoOp) and Visual Prompt Tuning (VPT), which are constrained by static prompts or visual token reliance, the proposed approach leverages a cross-attention mechanism to dynamically refine text prompts for the image at hand. This enables an image-specific alignment of textual features with image patches extracted from the Vision Transformer, making the model more effective for datasets with high intra-class variance and low inter-class differences. The method is evaluated on several datasets, including CUBirds, Oxford Flowers, and FGVC Aircraft, showing significant performance gains over static prompt tuning approaches. To ensure these performance gains translate into trustworthy predictions, we integrate Monte-Carlo Dropout in our approach to improve the reliability of the model predictions and uncertainty estimates. This integration provides valuable insights into the model’s predictive confidence, helping to identify when predictions can be trusted and when additional verification is necessary. This dynamic approach offers a robust solution, advancing the state-of-the-art for few-shot fine-grained classification.
DownloadPaper Citation
in Harvard Style
Brouwer E., van Woerden J., Burghouts G., Valdenegro-Toro M. and Zullich M. (2025). Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 114-125. DOI: 10.5220/0013163700003912
in Bibtex Style
@conference{visapp25,
author={Eric Brouwer and Jan van Woerden and Gertjan Burghouts and Matias Valdenegro-Toro and Marco Zullich},
title={Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={114-125},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013163700003912},
isbn={978-989-758-728-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning
SN - 978-989-758-728-3
AU - Brouwer E.
AU - van Woerden J.
AU - Burghouts G.
AU - Valdenegro-Toro M.
AU - Zullich M.
PY - 2025
SP - 114
EP - 125
DO - 10.5220/0013163700003912
PB - SciTePress