Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens

Kento Sago; Kazuhiro Hotta

Topics: Deep Learning and Neural Networks; Image and Video Analysis and Understanding

In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods (ICPRAM) - Volume 1, pages 100-106, 2024, Rome, Italy

Authors: Kento Sago and Kazuhiro Hotta

Affiliation: Meijo University, Nagoya, Japan

Keyword(s): Explainable AI, Vision Transformer, Transformer Attribution.

Abstract: The Vision Transformer (ViT) has a complex structure. To use it effectively in critical decision-making settings, it is necessary to visualize the regions that affect the model's predictions so that people can understand them. In this paper, we propose a new visualization method based on Transformer Attribution, which is widely used to visualize the regions responsible for ViT's predictions. The method estimates the influence of each token on the prediction by considering the model's predictions for images that reflect the relevance among tokens, and produces saliency maps. Compared with conventional methods, our method improved accuracy on the ILSVRC2012 validation data by about 1.28% and 1.61% on the deletion and insertion metrics, respectively, and by about 3.01% and 0.94% on average drop and average increase, respectively.
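
The four evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the definitions commonly used in the saliency-map literature (Average Drop and Average Increase as introduced with Grad-CAM++, deletion and insertion curves as introduced with RISE), and it is an assumption that the paper computes them exactly this way. The function names are hypothetical, and model confidences are passed in as plain lists instead of being computed from a ViT.

```python
from collections.abc import Sequence


def average_drop(full: Sequence[float], explained: Sequence[float]) -> float:
    """Average Drop (%): mean relative loss of class confidence when the model
    sees only the saliency-highlighted region instead of the full image.
    Lower is better. (Assumed Grad-CAM++-style definition.)"""
    drops = [max(0.0, y - o) / y for y, o in zip(full, explained)]
    return 100.0 * sum(drops) / len(drops)


def average_increase(full: Sequence[float], explained: Sequence[float]) -> float:
    """Average Increase (%): share of images whose class confidence rises
    when only the highlighted region is kept. Higher is better."""
    hits = sum(1 for y, o in zip(full, explained) if o > y)
    return 100.0 * hits / len(full)


def curve_auc(probs: Sequence[float]) -> float:
    """Area under a deletion or insertion curve (assumed RISE-style definition).

    probs[k] is the class probability after k of n equal steps of removing
    (deletion) or revealing (insertion) pixels in decreasing saliency order.
    The x-axis is normalised to [0, 1] and the trapezoidal rule is applied.
    Deletion: lower AUC is better. Insertion: higher AUC is better."""
    if len(probs) < 2:
        raise ValueError("need at least two points on the curve")
    n = len(probs) - 1
    return sum((probs[k] + probs[k + 1]) / 2.0 for k in range(n)) / n


if __name__ == "__main__":
    full = [0.9, 0.6, 0.8]        # hypothetical confidences on full images
    explained = [0.45, 0.7, 0.8]  # hypothetical confidences on masked images
    print(average_drop(full, explained))
    print(average_increase(full, explained))
    print(curve_auc([0.9, 0.5, 0.2, 0.1, 0.0]))  # hypothetical deletion curve
```

Separating the metric arithmetic from the model call keeps the sketch self-contained; in practice the confidence lists would come from running the classifier on full, masked, or progressively perturbed images.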

CC BY-NC-ND 4.0

Paper citation in several formats:

Sago, K. and Hotta, K. (2024). Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-684-2; ISSN 2184-4313, SciTePress, pages 100-106. DOI: 10.5220/0012419200003654

@conference{icpram24,
author={Kento Sago and Kazuhiro Hotta},
title={Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2024},
pages={100-106},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012419200003654},
isbn={978-989-758-684-2},
issn={2184-4313},
}

TY  - CONF
JO  - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI  - Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens
SN  - 978-989-758-684-2
IS  - 2184-4313
AU  - Sago, K.
AU  - Hotta, K.
PY  - 2024
SP  - 100
EP  - 106
DO  - 10.5220/0012419200003654
PB  - SciTePress
ER  - 