Clustering for Explainability: Extracting and Visualising Concepts from Activation

Alexandre Lambert 1,2,3 a, Aakash Soni 1 b, Assia Soukane 1, Amar Ramdane Cherif 2 and Arnaud Rabat 3
1 LyRIDS, ECE Research Center Paris, France
2 LISV Laboratory, Université de Versailles, Paris Saclay, Velizy, France
3 Unité d'Ergonomie Cognitive des Situations Opérationnelles, IRBA, Brétigny sur Orge, France
a https://orcid.org/0000-0001-5702-6445
b https://orcid.org/0000-0002-0882-5280
{alambert, aakash.soni}@ece.fr
Keywords: Activations Explainability, Concept Extraction and Visualization, Clustering.
Abstract: Despite significant advances in computer vision with deep learning models (e.g. classification, detection, and segmentation), these models remain complex, making it challenging to assess their reliability, interpretability, and consistency under diverse conditions. There is growing interest in methods for extracting human-understandable concepts from these models, but significant challenges persist. These challenges include difficulties in extracting concepts relevant to both model parameters and inference while ensuring the concepts are meaningful to individuals with varying expertise levels, without requiring a panel of evaluators to validate the extracted concepts. To tackle these challenges, we propose concept extraction by clustering activations. Activations represent a model's internal state based on its training, and can be grouped to represent learned concepts. We propose two clustering methods for concept extraction, a metric for evaluating their importance, and a concept visualization technique for concept interpretation. This approach can help identify biases in models and datasets.
1 INTRODUCTION
Deep neural networks (DNNs) and convolutional
neural networks (CNNs) are crucial for artificial
intelligence thanks to their widespread availability
and impressive performance on standardised bench-
marks, particularly in computer vision applications.
However, these models are often considered "black boxes", leaving users uncertain about their decision-making process and the knowledge they acquire. This lack of transparency makes them less suitable for applications where interpretability is critical, such as medical diagnosis, autonomous driving, and human-centred models (Lambert et al., 2024). Thus, it is crucial to develop simple explanation methods to understand these models. Moreover, explanation methods can provide several advantages. Firstly, they can enhance model comprehension, allowing users to interpret the model's inner workings, understand how it arrives at its predictions, and build trust in the model's decision-making process through better evaluation and refinement. Secondly, they can offer valuable guidance during the training process, ensuring that the model learns the desired information and avoids potential biases, leading to more robust and accurate models. Finally, these methods can help better understand outliers. In essence, these tools can offer a powerful perspective, allowing non-specialists to gain deeper insights into the intricate world of DNNs and CNNs and enabling their use in various applications (Sivanandan and Jayakumari, 2020; Zhang et al., 2022; Atakishiyev et al., 2024). The state-of-the-art explanation methods are divided into two categories:
Interpretable models are neural network models designed to be inherently interpretable. They often incorporate human-interpretable concepts by training on custom loss functions and adding semantic knowledge into the networks (Wickramanayake et al., 2021).
Post-hoc explanation methods can be applied to any model after it has been trained. These methods analyze the model's predictions and identify the most important features for those predictions. This is done using feature maps, gradients, or input perturbations. Post-hoc explanations can provide visual insights into
the model’s decision-making process and identify po-
tential biases in the model (Lapuschkin et al., 2019).
This paper focuses on post-hoc explanations, par-
ticularly through analyzing activations. While the
activation matrix shows the neural network's internal state, it may not reveal the conceptual structures that are meaningful to humans or what the model is learning.
To address this, we introduce a method to identify and
group informative subsets of activations, referred to as
concepts. Our method aims to make these extracted
concepts interpretable and to assess their importance
in relation to the model’s predictions. This paper pro-
poses: 1) A method to extract concepts that highlight
input image regions prioritized by the model for pre-
dictions. 2) A metric to assess the importance of these
concepts. 3) A technique to visualize these concepts
on the input image. Additionally, our code is publicly
available to support the development of use cases.
The paper is organised as follows: Section 2 re-
views related works. Section 3 presents our method-
ologies and two clustering methods for concept ex-
traction. Section 4 presents the concept extraction
results, followed by a discussion. Finally, the paper
concludes in Section 5.
2 RELATED WORKS
Among post-hoc explainability techniques, attribu-
tion methods are widely used to determine the input
variables contributing to a model’s prediction by gen-
erating importance maps. The Saliency method (Si-
monyan et al., 2014) creates heatmaps based on gra-
dients to highlight influential pixels. GradCAM (Sel-
varaju et al., 2016) method incorporates gradients into
class activation mapping. However, gradient-based
methods can be limited because they capture model
behavior in only a small local area around the in-
put, potentially leading to misleading importance es-
timates (Ghalebikesabi et al., 2021). This is partic-
ularly true for large vision models, where gradients
are often noisy and unreliable (Smilkov et al., 2017).
To address this, perturbation-based methods like RISE (Petsiuk et al., 2018) offer a valuable approach to understanding "where" a model focuses its attention, though they may be prone to confirmation bias, potentially leading to misleading explanations. This has led to questions about their usefulness. The HIVE framework (Kim et al., 2022) offers a way to assess explanations in AI-assisted decision-making scenarios, enabling falsifiable hypothesis testing, cross-method comparison, and human-centred evaluation of visual interpretability methods.
Recent approaches like ACE (Ghorbani et al.,
2019) focus on concept extraction by segmenting im-
ages and analyzing neural network activations, clus-
tering them into "concepts". However, ACE can in-
clude irrelevant background segments, necessitating
post-processing to remove outliers. The ICE frame-
work (Zhang et al., 2021) improves upon ACE by
using Non-Negative Matrix Factorization (NMF) for
better interpretability and fidelity, offering both lo-
cal and global concept-level explanations. Simi-
larly, CRAFT (Fel et al., 2023) employs NMF to ex-
tract concepts from model activations, refining them
through recursive decomposition. However, CRAFT
is more suited for groups of images and its methods
for concept localization are complex, potentially chal-
lenging for non-experts.
To enhance interpretability, we propose a method
that avoids the complexity of existing approaches,
which often rely on ”banks of coefficients” and com-
putationally intensive steps that may obscure under-
standing at the single-image level. Our methodology for concept extraction uses simpler algorithms, maintaining efficiency and clarity and making it more accessible to a broader audience.
3 METHODOLOGY
3.1 Overview of the Method
In this work, we investigate a supervised learning scenario involving a pre-trained black-box predictor M : X → Y with a set of n images X = {x_1, ..., x_n} and their corresponding labels Y = {y_1, ..., y_n}. The input images are represented as a Ch × H × W matrix, where Ch represents the number of channels (e.g. RGB, RGBA, LA), and H and W are the image height and width. For each input image x, the predictor outputs M(x). We assume that M is a neural network with fixed settings that can be divided into two parts: g transforms the input image into an intermediate representation g(x), and h takes this intermediate representation to produce the final output M(x) = h(g(x)). The intermediate representation lies in a lower-dimensional space, determined by the number and nature of operations in g (e.g. convolution, pooling, down-sampling and scaling). For a given input x, g(x) produces a set of activations A of shape A_N × A_H × A_W, where A_N is the number of activations, and A_H, A_W are the height and width of each activation A_i.
In most pre-trained models, activations are typically non-negative due to the ReLU activation. The activation values within A_i can be viewed as a spatial distribution of features in a small information matrix.
Combining these values can help identify where specific information useful for classification is located. Essentially, when an image is passed through g(x), A shows what the model has learned during training and where in the image it focuses during the forward pass, as these activation values are decisive for classification when fed into the classifier h.
Figure 1: Method overview for concept extraction from a feature extractor g and a model classifier h. Any CNN architecture can replace g and h.
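To make the split M(x) = h(g(x)) concrete, the sketch below shows one way to obtain the activation set A from a pre-trained CNN. It is a minimal illustration in PyTorch using torchvision's standard resnet50 as a stand-in for the NF-ResNet50 used in Section 4; the split point and helper names are our own assumptions, not the paper's reference code.

```python
import torch
from torchvision import models

# Pre-trained classifier M, split into a feature extractor g and a classifier head h
# so that M(x) = h(g(x)).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()
g = torch.nn.Sequential(*list(model.children())[:-2])   # all convolutional stages

def h(activations: torch.Tensor) -> torch.Tensor:
    """Classifier head: global average pooling followed by the fully connected layer."""
    pooled = model.avgpool(activations)                  # (1, A_N, 1, 1)
    return model.fc(torch.flatten(pooled, 1))            # (1, 1000) class logits

x = torch.rand(1, 3, 256, 256)                           # dummy 256 x 256 RGB input
with torch.no_grad():
    A = g(x)                                             # activation set A: (1, 2048, 8, 8)
    logits = h(A)                                        # identical to model(x)
print(A.shape, logits.shape)
```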
While the activation matrix comprehensively rep-
resents the neural network’s internal state, it may not
directly reveal the underlying conceptual structures
meaningful to humans. This motivates the explo-
ration of methods to identify informative subsets of
activations that can be grouped. We propose that
when a sufficient number of A_i exhibit similar be-
haviour, they can be considered a set of cohesive units
representing a learned concept C. Identifying these
concepts helps gain insights into the model’s inter-
nal knowledge representation and facilitates a more
nuanced understanding of the phenomena the model
processes.
This work demonstrates that multiple activation
patterns A_i can be grouped into different concepts C by satisfying specific criteria regarding a method K,
as summarized in Figure 1. This approach aims to
bridge the gap between raw activations and the high-
level conceptual knowledge encoded by the model.
In the following paragraphs, we propose two con-
cept extraction methods: the first focuses on the in-
ternal patterns within each activation, and the second
uses a relatively straightforward approach based on
the position of high activation values.
3.2 Concept Extraction via Clustering
For a given A, we aim to identify different concepts by regrouping different subsets of A that satisfy a given criterion in a clustering method K. As mentioned earlier, the activation set A is of shape A_N × A_H × A_W. However, to apply classical clustering algorithms without losing information, it is convenient to reshape A as A_N × (A_H × A_W), without any need for normalisation.
The classical clustering algorithms take A as input and produce a set of clusters γ = {C_1, C_2, ..., C_{N_concept}} that exhibit the same clustering criterion. A concept C_l in γ, obtained using a clustering algorithm K, is defined in Equation 1:

C_l = {A_i ∈ A | f_K(A_i, C_l)}     (1)

where f_K is minimised or maximised with respect to other clusters, depending on the algorithm K.
This work explores two possible ways of cluster-
ing to extract concepts, as explained in the following
paragraphs.
3.2.1 Clustering Based on General Activation Patterns (CGAP)
This first approach focuses on obtaining concepts based on general activation patterns observed in A. To achieve that, all the non-zero activations in A are passed to the clustering algorithm K. A non-zero activation is an A_i with at least one non-zero value. Since activations with all zero values do not play any role in classification, they can be ignored. K aims to regroup all the A_i that share similar activation values at similar indices, such that each C_l in γ contains a unique set of A_i from A.
Given the high dimensionality of A, applying clustering directly to A can be computationally intensive and may lead to sub-optimal clustering performance. As a solution, we employ Principal Component Analysis (PCA) as a dimensionality reduction technique before clustering. PCA transforms the original high-dimensional activation data (A_H × A_W) into a lower-dimensional space while preserving as much variance as possible. This transformation helps highlight the most significant features contributing to the activation patterns, thus enhancing the effectiveness of the subsequent clustering process (Ding and He, 2004). The size of the lower-dimensional space depends on the number of desired concepts; in this study, it equals N_concept − 1. By reducing the number of dimensions, PCA enhances computational efficiency and often improves the performance of clustering algorithms by emphasising the most distinctive clusters. After applying PCA, the reduced-dimensional activation data is fed into the clustering algorithm K to identify distinct activation patterns, extracting cohesive and informative concepts from the model's learned representations.
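As an illustration of the CGAP pipeline just described (reshape, discard all-zero activations, PCA, clustering), a minimal scikit-learn sketch could look as follows. Variable names are ours, A is the activation tensor from the sketch in Section 3.1, and k-means stands in for the generic algorithm K.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

N_concept = 5
A_np = A.squeeze(0).numpy().reshape(2048, -1)      # reshape A to A_N x (A_H * A_W) = (2048, 64)

# Keep only non-zero activations (at least one non-zero value; ReLU makes values non-negative)
nonzero_idx = np.where(A_np.sum(axis=1) > 0)[0]
A_nonzero = A_np[nonzero_idx]

# PCA down to N_concept - 1 dimensions, then cluster into N_concept concepts
reduced = PCA(n_components=N_concept - 1).fit_transform(A_nonzero)
labels = KMeans(n_clusters=N_concept, n_init=10).fit_predict(reduced)

# gamma[l] holds the indices (into A) of the activations A_i assigned to concept C_l
gamma = {l: nonzero_idx[labels == l] for l in range(N_concept)}
```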
It is important to note that the uniqueness of each cluster in γ can be evaluated and controlled using metrics and criteria. However, the size of each cluster depends on the activation patterns, leading to some clusters containing more activations than others, particularly in the case of large activation patterns.
Depending on the application, if concepts representing small patterns in the input image are desired, the large clusters composing C_l can be divided into sub-clusters C_l^sub by iteratively applying the clustering algorithm K until the desired number of sub-concepts is extracted. In this work, the maximum number of sub-clusters is arbitrarily limited to 3.
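The sub-clustering step can be sketched in the same spirit: re-running the clustering algorithm on the members of a large concept, here again with k-means and the limit of three sub-clusters; the helper below is illustrative only.

```python
from sklearn.cluster import KMeans

def split_concept(member_idx, A_flat, max_sub=3):
    """Split a large concept C_l into at most max_sub sub-clusters C_l^sub."""
    sub_labels = KMeans(n_clusters=max_sub, n_init=10).fit_predict(A_flat[member_idx])
    return {s: member_idx[sub_labels == s] for s in range(max_sub)}

# e.g. decompose concept C_0 from the CGAP sketch above
sub_concepts = split_concept(gamma[0], A_np)
```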
3.2.2 Clustering Based on Position of High
Activations (CPHA)
The second approach privileges regrouping activations A_i with higher values at similar spatial positions. Our observations suggest that high activation values often carry more weight in classification, as they correspond to the parts of the input image most relevant to the model's decision. Nevertheless, this may not always be the case and warrants further investigation for generalisation. We propose that clustering activations with high values reveals concepts of relatively higher influence in classification and minimises redundancy in concept extraction. For that purpose, first, in each A_i the coordinates of max(A_i), called Coord_i, are identified as defined in Equation 2:

Coord_i = argmax A_i     (2)

Then, the clustering method K is applied on all the Coord_i to obtain γ. By using the set of Coord_i as clustering input, the concept extraction focuses on the spatial position of high activation values. Thus, concepts dispersed along the input image are identified, and the activations most relevant to the model's prediction are distinctly regrouped.
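A corresponding CPHA sketch: for each non-zero activation A_i we take the 2-D coordinates of its maximum value and cluster those coordinates. A_np and nonzero_idx come from the CGAP sketch, k-means again plays the role of K, and the names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

A_maps = A_np[nonzero_idx].reshape(-1, 8, 8)              # non-zero activations as 8 x 8 maps

# Coord_i = argmax(A_i), expressed as (row, col) coordinates of the peak value
flat_argmax = A_maps.reshape(len(A_maps), -1).argmax(axis=1)
coords = np.stack(np.unravel_index(flat_argmax, (8, 8)), axis=1)   # shape (n_nonzero, 2)

labels_cpha = KMeans(n_clusters=N_concept, n_init=10).fit_predict(coords)
gamma_cpha = {l: nonzero_idx[labels_cpha == l] for l in range(N_concept)}
```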
3.3 Concept Importance
To assess the importance of each C_l in classifying a target class (label), we propose a concept importance metric I_l, regardless of the concept extraction method. For a given image x of target class t, first, we feed the model classifier h with A. As output, h predicts the class t with a probability p_t. Then, to assess the importance of a concept C_l in the prediction of t, all the activation values of the A_i in C_l are set to 0. The modified activation set is then fed to h to obtain a new prediction p_{C_l}. Finally, the importance I_l of concept C_l is calculated from the difference between p_t and p_{C_l} as follows in Equation 3:

I_l = ((p_t − p_{C_l}) / p_t) × 100     (3)

Note that, here, the concept importance is computed w.r.t. a concept of interest C_l, and the sum of all concept importances is not equal to 100%.
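A minimal sketch of this computation for one concept, reusing A, h and the concept index sets from the earlier sketches; the zero-masking follows Equation 3, while the helper itself (including the softmax used to turn the logits of h into probabilities) is our own illustration.

```python
import torch

def concept_importance(A, h, concept_idx, target_class):
    """I_l: relative drop in the target-class probability when concept C_l is zeroed out."""
    idx = torch.as_tensor(concept_idx, dtype=torch.long)
    with torch.no_grad():
        p_t = torch.softmax(h(A), dim=1)[0, target_class].item()
        A_masked = A.clone()
        A_masked[:, idx] = 0.0                      # set all activations A_i in C_l to zero
        p_cl = torch.softmax(h(A_masked), dim=1)[0, target_class].item()
    return (p_t - p_cl) / p_t * 100.0

with torch.no_grad():
    target = h(A).argmax(dim=1).item()              # predicted class t
I_0 = concept_importance(A, h, gamma[0], target)    # importance of concept C_0
```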
Computing the importance of individual concepts
provides valuable insights into how each concept con-
tributes to the overall prediction score. A positive in-
fluence means that the given concept is responsible
for a higher certainty of the model’s prediction. In
contrast, a negative influence makes the model’s pre-
diction less confident.
3.4 Concept Visualisation
Each concept C_l is a set of one or more activations A_i of shape A_H × A_W (usually 8 × 8), which is smaller than the input image shape (in our work, 256 × 256). So, to project concepts onto the input image, an intermediate transformation is needed. It is achieved by first applying an element-wise sum over all the A_i in C_l and then interpolating the resulting matrix (of shape 8 × 8) to the input image size (256 × 256) using bilinear interpolation. The resulting matrix (of shape 256 × 256) is finally min-max normalised. In the case of sub-clusters C_l^sub, the normalisation is performed using the minimum and maximum values of the parent concept C_l to ensure that the sub-concepts are visualised proportionally within the context of the overall concept.
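The projection can be sketched as follows: element-wise sum of the activations in a concept, bilinear upsampling of the 8 × 8 map to 256 × 256, and min-max normalisation. cv2.resize is one possible choice for the interpolation; the helper is an assumption, not the paper's exact code.

```python
import cv2
import numpy as np

def concept_heatmap(A, concept_idx, out_size=(256, 256)):
    """Project a concept C_l onto the input image plane as a normalised heatmap."""
    summed = A.squeeze(0).numpy()[concept_idx].sum(axis=0)                     # element-wise sum -> 8 x 8
    upsampled = cv2.resize(summed, out_size, interpolation=cv2.INTER_LINEAR)   # bilinear to 256 x 256
    return (upsampled - upsampled.min()) / (upsampled.max() - upsampled.min() + 1e-8)
```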
4 RESULTS
A ResNet-50-based classification model pre-trained
on the ImageNet-1k dataset is used to evaluate our
concept extraction methods. The following para-
graphs provide a brief description of the evaluation
environment, followed by a discussion of the evaluation metrics and the results.
4.1 Evaluation Environment, Clustering
Algorithms and Metrics
Dataset: ImageNet-1k (Deng et al., 2009) is a
well-known extensive image database containing over
a million images categorised into 1,000 different
classes. We have arbitrarily chosen 11 classes for
this study: rabbit (300 images), tench (387 images),
English springer (395 images), cassette player (357 images), chain saw (386 images), church (409 images), French horn (394 images), garbage truck (389
images), gas pump (419 images), golf ball (399 im-
ages) and parachute (390 images).
Model: ResNet-50 (He et al., 2016) is a CNN ar-
chitecture designed for image classification. It excels at identifying objects within images thanks to its deep architecture that learns complex patterns from the image. For the results presented in this paper, we use a pre-trained ResNet variant, called Norm-Free ResNet50 (Brock et al., 2021b; Brock et al., 2021a), that removes all normalization layers. The model has A_N = 2048 activations in the last layer, each sized 8 × 8, and is initialized with the ImageNet-1k weight configuration. The input image size is 256 × 256.
Clustering Algorithms and Metrics: We test our
concept extraction method using four well-known
clustering algorithms: k-means, Agglomerative,
Birch and Gaussian Mixture Model (GMM). To eval-
uate the cluster quality representing the extracted con-
cept, three metrics are used:
Silhouette Score (SS) measures the separation be-
tween clusters, with values ranging from -1 to 1. A
score of 1 indicates well-separated clusters, 0 sug-
gests overlapping clusters, and negative values indi-
cate potential misassignments.
Calinski-Harabasz Index (CHI) (or Variance
Ratio Criterion) evaluates between-cluster and
within-cluster dispersion. Higher values indicate
denser, more distinct clusters.
Davies-Bouldin Index (DBI) measures the aver-
age cluster similarity by comparing inter-cluster dis-
tance with intra-cluster size. A lower index indicates
better partitioning.
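For reference, all three metrics are available in scikit-learn; a minimal sketch of how they might be computed for one clustering result (here the PCA-reduced activations and labels from the CGAP sketch in Section 3.2.1):

```python
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

# 'reduced' and 'labels' come from the CGAP sketch in Section 3.2.1
print("SS :", silhouette_score(reduced, labels))           # in [-1, 1], higher is better
print("CHI:", calinski_harabasz_score(reduced, labels))    # higher is better
print("DBI:", davies_bouldin_score(reduced, labels))       # lower is better
```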
4.2 Evaluating Concepts Quality Based
on the Clustering Metrics
In this section, we compare the performance of the
clustering algorithms using the two methods (CGAP
and CPHA) proposed in Section 3.2 for concept ex-
traction. For comparison, the four clustering algo-
rithms (Agglomerative, Birch, GMM and k-means)
are used to extract N_concept = 5 concepts from each input image (belonging to the 11 output labels) independently. The uniqueness and clustering consistency are assessed by comparing the clustering metrics for all the algorithms.
Table 1 shows the mean value of clustering met-
rics for different clustering algorithms using CGAP
and CPHA methods. We observe that the k-means
algorithm shows the best performance on all the
metrics: 0.64 SS, 441.05 CHI and 0.97 DBI using CGAP, and 0.43 SS, 948.69 CHI and 0.84 DBI using
CPHA. Both Agglomerative and Birch show similar
or slightly lower performance than k-means. In con-
trast, GMM shows the worst performance. Addition-
ally, the average execution time (in seconds) required
for clustering for each algorithm is also compared in
Table 1, where Agglomerative is observed to be the
fastest and GMM is the slowest.
Table 1: Comparison of clustering methods (mean over all labels).

Method  Cls  SS     CHI     DBI   Time (s)
CGAP    A    0.61   398.70  1.00  0.14
CGAP    B    0.62   398.14  0.97  0.22
CGAP    G    -0.03  111.39  1.84  0.73
CGAP    k    0.64   441.05  0.97  0.20
CPHA    A    0.41   892.39  0.84  0.14
CPHA    B    0.37   724.83  0.90  0.22
CPHA    G    0.32   564.51  1.41  0.73
CPHA    k    0.43   948.69  0.84  0.20

SS: Silhouette Score, CHI: Calinski-Harabasz Index, DBI: Davies-Bouldin Index, Cls: Clustering algorithm, A: Agglomerative, B: Birch, G: GMM, k: k-means
For further comparison, the clustering metrics ob-
tained using CGAP and CPHA for different target la-
bels are shown separately by the boxplots in Figure
2. The clustering metrics for Agglomerative, Birch,
GMM, and k-means are represented by pink, blue,
green, and purple box plots respectively. The y-axis
for each figure in a row is common, where each tick
represents one of the 11 target labels. The x-axis
represents one of the three clustering metrics. The
boxplot edges correspond to the 25th and 75th per-
centiles, the whiskers show the extreme values, and
the dots highlight the outliers. Figure 2 confirms
the same results as Table 1, where Agglomerative and
Birch show similar or slightly lower clustering met-
rics for all the target labels, as compared to k-means.
Meanwhile, the GMM performs worst in all cases.
CGAP and CPHA methods can also be compared
based on the clustering metrics in Figure 2 and Table
1. A common trend is observed where CPHA yields
higher CHI and lower DBI than CGAP, suggest-
ing better cluster compactness and separation with
CPHA. On the contrary, SS is smaller using CPHA
than CGAP, suggesting some loss in overall cluster
distinctness. Nevertheless, in all the cases, k-means
outperforms the other algorithms.
These results suggest that k-means produces more
distinct and consistent clusters. Although Agglom-
erative and Birch produce similar results, the rest of
the evaluation focuses only on k-means for clarity and
space constraints. Full results for all algorithms are
available on our GitHub project page: https://github.com/AlexandreLamb/Clustering-for-Explainability.
Table 2 compares the impact of varying the num-
ber of extracted concepts on the clustering metrics.
For CGAP, increasing N_concept from 3 to 9 resulted in decreased SS and CHI, indicating less distinct and more overlapping clusters. Conversely, for CPHA, it led to increased SS and CHI, suggesting more distinct clusters. On the other hand, DBI does not show any specific pattern; it varies around the same range of values, implying limited usefulness in our study. Based on these observations, to achieve high-quality clusters, a smaller number of clusters is desirable for CGAP, while a larger number is preferable for CPHA. In this study, we arbitrarily chose N_concept = 5.

Figure 2: Clustering metrics for Agglomerative (pink), Birch (blue), GMM (green) and k-means (purple) on different labels. The 4 clustering algorithms are compared using CGAP (top row) and CPHA (bottom row).
Table 2: k-means clustering metrics for CGAP (with PCA) and CPHA at different numbers of concepts (mean over all labels).

Method  N_concept  SS    CHI      DBI
CGAP    3          0.75  966.00   0.72
CGAP    5          0.64  441.05   0.97
CGAP    7          0.57  288.34   1.10
CGAP    9          0.52  218.93   1.17
CPHA    3          0.43  900.18   0.85
CPHA    5          0.43  948.69   0.84
CPHA    7          0.44  979.13   0.83
CPHA    9          0.46  1018.18  0.81

SS: Silhouette Score, CHI: Calinski-Harabasz Index, DBI: Davies-Bouldin Index
4.3 Concept Visualisation and
Interpretation
In this section, a visual representation of the extracted concepts is presented using the visualisation method proposed in Section 3.4. For clear visualisation, the input colour images are transformed to grayscale, and the normalised activation values from concepts are used to weight the original image and are projected using the "HOT" colourmap of OpenCV (Itseez, 2015). As a result, the concepts are projected with a colour scale in shades of blue, where bright blue represents higher activation. For each concept C_l, the number of activations (A_N) within C_l and the concept importance I_l are also presented. The concepts are sorted by decreasing order of I_l.
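As an illustration of this rendering step, a possible OpenCV-based sketch building on the concept_heatmap helper from Section 3.4; the exact blending used to produce the figures may differ.

```python
import cv2
import numpy as np

def render_concept(image_bgr, heatmap):
    """Weight the grayscale image by a concept heatmap and apply OpenCV's HOT colourmap."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    weighted = (gray * heatmap * 255.0).astype(np.uint8)
    return cv2.applyColorMap(weighted, cv2.COLORMAP_HOT)
```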
4.3.1 Concept Visualisation Based on General
Activation Pattern
Figure 3 visualizes 5 concepts extracted using CGAP
for an image labelled ”Garbage Truck”. These con-
cepts highlight key general activation patterns used
by the model to predict the input image as a garbage
truck. The first three concepts (C_1, C_3 and C_2) highlight different garbage truck regions, e.g. the chassis, the driver's compartment and the garbage container, with importances of 46.2, 33.296, and 16.439, respectively.
The remaining activations are clustered into concepts C_0 and C_4. C_0 contains relatively larger patterns, including the garbage truck and its surroundings, with an importance of 13.948. Recall that I_0 represents the average importance of all activations within C_0. However, such large activation patterns can be decomposed into smaller clusters if the importance of the small cluster is of interest, using the sub-clustering proposed in Section 3.2.1. Figure 4 shows the sub-concepts obtained by decomposing C_0. The sub-clusters reveal that the activations representing the garbage truck (C_01) have a higher importance of 12.709, compared to 0.153 and 1.086 for the surroundings (C_00 and C_02). This sub-clustering confirms that the model prioritizes relevant concepts for predicting the garbage truck.
Figure 3: Concept visualisation using CGAP on an image labelled "Garbage Truck" with N_concept = 5.

Figure 4: Sub-clusters of concept C_0 in Figure 3.
The low importance of the surrounding areas, represented by concepts C_4, C_00, and C_02, is noteworthy and may be attributed to potentially similar backgrounds in the training data, which the model associated as a relevant concept (Fel et al., 2023). The impact of these concepts on model predictions varies by application, but the importance metric helps estimate their influence. Figure 5 provides additional examples of such concepts. For the church, concept C_0 initially seems to assign high importance (25.329) to the upper part of the cross. But decomposing C_0 reveals sub-concepts (C_02 and C_01) where the activations highlighting the cross have importances of 15.698 and 9.97, while the background (C_00) has a negative importance of -0.339. As stated earlier, a negative influence means that it makes the model's prediction less certain. Similarly, in the parachute example, the sub-concept (C_00), including the parachute and a statue, has an importance of 40.89, whereas the sub-concepts (C_01 and C_02), including only the statue, have negative importance. Further decomposition of C_00 could separate the parachute's importance, though this might introduce redundant sub-concepts.

Figure 5: Concept decomposition into three sub-concepts for different classes (rabbit, church and parachute).
4.3.2 Comparing CGAP and CPHA
Figure 6 compares concept extraction on a chainsaw image using CGAP (top) and CPHA (bottom). The most evident observation is CPHA's capacity to extract non-redundant concepts. For CGAP, the essential concept is C_2 with I_2 = 38.66, highlighting the wood log and the chainsaw, which aligns well with this class. The concept C_0 with I_0 = 10.022 also highlights the same area but in a more disparate way. The other three concepts (C_4, C_1 and C_3) redundantly focus on the chainsaw engine with a cumulative importance of 65.155. In contrast, CPHA identifies the chainsaw engine as the most important concept, C_1, with I_1 = 68.757, similar to the combined importance of the three CGAP concepts. CPHA also isolates the wood log into separate concepts (C_3 and C_0) with importances of 10.32 and 1.002, and highlights the chain and log interaction (C_2 and C_4) with a cumulative importance of 33.735.
Figure 6: Concept visualisation using CGAP (top row) and CPHA (bottom row) on an image labelled "Chain saw".
5 CONCLUSION
Analyzing and visualizing concepts is key to under-
standing model predictions. By clustering activa-
tions with similar patterns, we gain insights into the
model’s learned knowledge. We use two methods for
concept extraction: CGAP, which focuses on general
activation patterns, and CPHA, which targets high
activation areas. Decomposing concepts into sub-
concepts helps avoid mixing conflicting elements and
compensates for clustering imperfections.
Our approach is limited by its focus on individual
images, neglecting relationships between activations
across images. Future work could explore clustering
within the same class. While our method highlights
relevant image parts for classification, incorrect clas-
sifications still require human interpretation.
ACKNOWLEDGEMENTS
We thank ECE for funding the Lambda Quad Max Deep Learning server, which was used to obtain the results presented in this work.
REFERENCES
Atakishiyev, S., Salameh, M., Yao, H., and Goebel, R.
(2024). Explainable Artificial Intelligence for Au-
tonomous Driving: A Comprehensive Overview and
Field Guide for Future Research Directions.
Brock, A., De, S., and Smith, S. L. (2021a). Characteriz-
ing signal propagation to close the performance gap in
unnormalized ResNets.
Brock, A., De, S., Smith, S. L., and Simonyan, K. (2021b).
High-Performance Large-Scale Image Recognition
Without Normalization.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In 2009 IEEE Conference on Com-
puter Vision and Pattern Recognition, pages 248–255.
Ding, C. and He, X. (2004). K-means clustering via prin-
cipal component analysis. In Twenty-First Interna-
tional Conference on Machine Learning - ICML ’04,
page 29, Banff, Alberta, Canada. ACM Press.
Fel, T., Picard, A., Bethune, L., Boissin, T., Vigouroux, D.,
Colin, J., Cadène, R., and Serre, T. (2023). CRAFT:
Concept Recursive Activation FacTorization for Ex-
plainability.
Ghalebikesabi, S., Ter-Minassian, L., Diaz-Ordaz, K., and
Holmes, C. (2021). On Locality of Local Explanation
Models.
Ghorbani, A., Wexler, J., Zou, J., and Kim, B. (2019). To-
wards Automatic Concept-based Explanations.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Resid-
ual Learning for Image Recognition. In 2016 IEEE
Conference on Computer Vision and Pattern Recog-
nition (CVPR), pages 770–778, Las Vegas, NV, USA.
IEEE.
Itseez (2015). Open source computer vision library. https://github.com/itseez/opencv.
Kim, S. S. Y., Meister, N., Ramaswamy, V. V., Fong, R.,
and Russakovsky, O. (2022). HIVE: Evaluating the
Human Interpretability of Visual Explanations.
Lambert, A., Soni, A., Soukane, A., Cherif, A. R., and Ra-
bat, A. (2024). Artificial intelligence modelling hu-
man mental fatigue: A comprehensive survey. Neuro-
computing, 567:126999.
Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., and Müller, K.-R. (2019). Unmasking
Clever Hans predictors and assessing what machines
really learn. Nature Communications, 10(1):1096.
Petsiuk, V., Das, A., and Saenko, K. (2018). RISE: Ran-
domized Input Sampling for Explanation of Black-
box Models.
Selvaraju, R. R., Das, A., Vedantam, R., Cogswell, M.,
Parikh, D., and Batra, D. (2016). Grad-CAM: Why
did you say that? In NIPS. arXiv.
Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Deep
Inside Convolutional Networks: Visualising Image
Classification Models and Saliency Maps.
Sivanandan, R. and Jayakumari, J. (2020). An Improved Ul-
trasound Tumor Segmentation Using CNN Activation
Map Clustering and Active Contours. In 2020 IEEE
5th International Conference on Computing Commu-
nication and Automation (ICCCA), pages 263–268,
Greater Noida, India. IEEE.
Smilkov, D., Thorat, N., Kim, B., Viégas, F., and Watten-
berg, M. (2017). SmoothGrad: Removing noise by
adding noise.
Wickramanayake, S., Hsu, W., and Lee, M. L. (2021).
Comprehensible Convolutional Neural Networks via
Guided Concept Learning.
Zhang, R., Madumal, P., Miller, T., Ehinger, K. A., and Ru-
binstein, B. I. P. (2021). Invertible Concept-based Ex-
planations for CNN Models with Non-negative Con-
cept Activation Vectors.
Zhang, Y., Weng, Y., and Lund, J. (2022). Applications
of Explainable Artificial Intelligence in Diagnosis and
Surgery. Diagnostics, 12(2):237.