Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens
Kento Sago, Kazuhiro Hotta
2024
Abstract
The Vision Transformer (ViT) has a complex structure. To use it effectively in settings that involve critical decision-making, it is necessary to visualize the image regions that affect the model’s predictions so that people can understand them. In this paper, we propose a new visualization method based on Transformer Attribution, which is widely used for visualizing the regions behind ViT’s predictions. The method estimates the influence of each token on the prediction by considering the model’s predictions on images that reflect the relevance among tokens, and produces saliency maps. Compared with conventional methods on ILSVRC2012 validation data, our method improved results by about 1.28% and 1.61% for deletion and insertion, and by about 3.01% and 0.94% for average drop and average increase, respectively.
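The abstract reports average drop and average increase without defining them. The sketch below follows the definitions commonly used in the saliency-map literature, and it is an assumption that the paper uses them in the same form. The function names and the toy confidence values are hypothetical. Here `full_conf` holds the model's confidence for the target class on each original image, and `masked_conf` holds its confidence when only the region highlighted by the saliency map is kept.

```python
def average_drop(full_conf, masked_conf):
    """Mean relative confidence drop, in percent (lower is better).

    Assumed definition: mean over images of max(0, Y - O) / Y * 100,
    where Y is the confidence on the full image and O the confidence
    on the saliency-masked image.
    """
    drops = [max(0.0, y - o) / y for y, o in zip(full_conf, masked_conf)]
    return 100.0 * sum(drops) / len(drops)


def average_increase(full_conf, masked_conf):
    """Share of images whose confidence rises after masking, in percent
    (higher is better). Assumed definition: mean of 1[Y < O] * 100.
    """
    rises = [1 if o > y else 0 for y, o in zip(full_conf, masked_conf)]
    return 100.0 * sum(rises) / len(rises)


# Toy confidences for four images (illustrative values, not from the paper).
full = [0.8, 0.5, 0.4, 0.9]
masked = [0.4, 0.5, 0.6, 0.9]

print(average_drop(full, masked))
print(average_increase(full, masked))
```

In this toy case only the first image loses confidence and only the third gains it, so a saliency method that kept more of the decisive region would lower the first number and raise the second.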
Paper Citation
in Harvard Style
Sago K. and Hotta K. (2024). Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 100-106. DOI: 10.5220/0012419200003654
in Bibtex Style
@conference{icpram24,
author={Kento Sago and Kazuhiro Hotta},
title={Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={100-106},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012419200003654},
isbn={978-989-758-684-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens
SN - 978-989-758-684-2
AU - Sago K.
AU - Hotta K.
PY - 2024
SP - 100
EP - 106
DO - 10.5220/0012419200003654
PB - SciTePress