Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions

Takahiro Sannomiya, Kazuhiro Hotta

2025

Abstract

WWW is a method that uses CLIP to compute the similarity between image and text features and assigns a concept to each neuron of the target model whose behavior is being analyzed. However, because WWW computes this similarity on center-cropped images, the crop may include features unrelated to the image's original class, so the resulting score may not correctly reflect the similarity between the image and the text. In addition, WWW measures similarity with cosine similarity, which can produce a broad similarity distribution and may therefore fail to capture the similarity between vectors accurately. To address these issues, we propose a method that uses Grad-CAM to crop the model's attention region, filtering out features unrelated to the image's original characteristics. By additionally using t-vMF similarity to measure the similarity between the image and the text, we achieve more accurate discovery of neuron concepts.
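The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Grad-CAM heatmap is already available as a 2D array, and that t-vMF similarity takes the form commonly given in the t-vMF literature, (1 + cos) / (1 + kappa (1 - cos)) - 1. The function names, the threshold of 0.5, and the default kappa of 16 are hypothetical choices made only for illustration.

```python
import numpy as np


def attention_crop_box(heatmap, threshold=0.5):
    """Bounding box (top, bottom, left, right) of the high-attention region.

    The heatmap is min-max normalized to [0, 1]; bottom and right are
    exclusive indices. The threshold value is an illustrative assumption.
    """
    h = np.asarray(heatmap, dtype=float)
    span = h.max() - h.min()
    if span == 0:
        # Flat heatmap: no preferred region, so keep the whole image.
        return 0, h.shape[0], 0, h.shape[1]
    norm = (h - h.min()) / span
    rows, cols = np.where(norm >= threshold)
    return int(rows.min()), int(rows.max()) + 1, int(cols.min()), int(cols.max()) + 1


def t_vmf_similarity(a, b, kappa=16.0):
    """t-vMF similarity between two vectors (assumed formula, see lead-in).

    With kappa = 0 this reduces to plain cosine similarity; a larger kappa
    makes the score fall off faster as the angle between the vectors grows.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return (1.0 + cos) / (1.0 + kappa * (1.0 - cos)) - 1.0


# Usage: crop an image array to the attention region, then compare features.
heatmap = np.zeros((4, 4))
heatmap[1:3, 2:4] = 1.0
top, bottom, left, right = attention_crop_box(heatmap)
image = np.arange(16).reshape(4, 4)
cropped = image[top:bottom, left:right]
```

In a full pipeline the cropped image would be passed to the CLIP image encoder and its feature compared with text features using `t_vmf_similarity` in place of cosine similarity. Those encoder calls are omitted here because the abstract does not specify them.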

Paper Citation


in Harvard Style

Sannomiya T. and Hotta K. (2025). Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 497-502. DOI: 10.5220/0013247500003912


in Bibtex Style

@conference{visapp25,
author={Takahiro Sannomiya and Kazuhiro Hotta},
title={Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={497--502},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013247500003912},
isbn={978-989-758-728-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - Accuracy Improvement of Neuron Concept Discovery Using CLIP with Grad-CAM-Based Attention Regions
SN - 978-989-758-728-3
AU - Sannomiya T.
AU - Hotta K.
PY - 2025
SP - 497
EP - 502
DO - 10.5220/0013247500003912
PB - SciTePress