Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions
Takahiro Sannomiya, Kazuhiro Hotta
2025
Abstract
WWW is a method that uses CLIP to compute the similarity between image and text features and assigns a concept to each neuron of a target model whose behavior is to be interpreted. However, because WWW computes this similarity on center-cropped images, the crop may contain features unrelated to the image's original class, so the resulting score may not correctly reflect the similarity between the image and the text. In addition, WWW measures image-text similarity with cosine similarity, which can produce a broad similarity distribution and may therefore not accurately capture the similarity between vectors. To address these issues, we propose a method that uses Grad-CAM to crop the region the model attends to, filtering out features unrelated to the image's original characteristics, and that measures image-text similarity with t-vMF similarity instead of cosine similarity. With these changes, we achieved more accurate discovery of neuron concepts.
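The two changes described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the Grad-CAM activations and gradients have already been extracted from the target model. The cropping rule (bounding box of pixels at or above a hypothetical threshold of 0.5), the helper names (`grad_cam`, `upsample_nearest`, `attention_crop`, `t_vmf_similarity`) and the default kappa = 16 are illustrative choices. The t-vMF form used here is (1 + cos θ) / (1 + κ(1 - cos θ)) - 1, which reduces to cosine similarity at κ = 0.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map for one convolutional layer.

    activations, gradients: shape (K, H, W), the feature maps A^k and the
    gradients of the class score with respect to them.
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a low-resolution map to the image size."""
    rows = np.arange(out_h) * cam.shape[0] // out_h
    cols = np.arange(out_w) * cam.shape[1] // out_w
    return cam[np.ix_(rows, cols)]


def attention_crop(image: np.ndarray, cam: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Crop the bounding box of pixels whose Grad-CAM value is >= threshold.

    HYPOTHETICAL cropping rule: the threshold and the bounding-box choice are
    assumptions for illustration, not the authors' settings.
    Falls back to the full image if no pixel passes the threshold.
    """
    h, w = image.shape[:2]
    heat = upsample_nearest(cam, h, w)
    ys, xs = np.nonzero(heat >= threshold)
    if ys.size == 0:
        return image
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def t_vmf_similarity(u: np.ndarray, v: np.ndarray, kappa: float = 16.0) -> float:
    """t-vMF similarity: (1 + cos) / (1 + kappa * (1 - cos)) - 1.

    Equals cosine similarity at kappa = 0; larger kappa makes the score fall
    off faster as cos drops below 1. kappa = 16 is a HYPOTHETICAL default.
    """
    cos = cosine_similarity(u, v)
    return (1.0 + cos) / (1.0 + kappa * (1.0 - cos)) - 1.0


if __name__ == "__main__":
    # Toy 2-D stand-ins for CLIP image and text embeddings.
    text = np.array([1.0, 0.0])
    for img in (np.array([0.9, 0.1]), np.array([0.7, 0.7]), np.array([0.2, 1.0])):
        print(f"cos={cosine_similarity(img, text):+.3f}  "
              f"t-vMF={t_vmf_similarity(img, text):+.3f}")
```

Running the script prints cosine and t-vMF scores for three toy image vectors against one toy text vector. With κ > 0 the t-vMF score drops steeply as the cosine falls below 1, so only closely aligned pairs keep a high score, which is the sharper distribution the abstract contrasts with cosine similarity.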
Paper Citation
in Harvard Style
Sannomiya T. and Hotta K. (2025). Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 497-502. DOI: 10.5220/0013247500003912
in Bibtex Style
@conference{visapp25,
author={Takahiro Sannomiya and Kazuhiro Hotta},
title={Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={497-502},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013247500003912},
isbn={978-989-758-728-3},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI  - Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions
SN  - 978-989-758-728-3
AU  - Sannomiya T.
AU  - Hotta K.
PY  - 2025
SP  - 497
EP  - 502
DO  - 10.5220/0013247500003912
PB  - SciTePress
ER  - 