perception of the CapsNet.
In the following, we use the term CapsNet for the model that consists of the capsule network itself together with the decoder. In Sabour et al. (2017), the CapsNet is trained on the MNIST dataset (Lecun et al., 1998).
It was shown that modifying individual elements inside the last capsules changes features such as stroke thickness, width, or scale in the digit of the decoded image. Because these features are comparatively human-understandable, we examine the potential of CapsNets to produce explanatory results.
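The following sketch illustrates the perturbation experiment described above. The names `capsule_vectors` and `decoder` are placeholders for a trained CapsNet's last capsule layer output and its reconstruction decoder; they are assumptions for illustration, not the original implementation.

```python
import torch

# Minimal sketch of the capsule perturbation experiment (Sabour et al., 2017).
# `capsule_vectors` (shape [num_classes, 16]) and `decoder` are hypothetical
# stand-ins for a trained CapsNet's last capsules and reconstruction decoder.

def perturb_and_decode(capsule_vectors, decoder, class_idx, dim, delta):
    """Shift one element of the predicted class capsule and decode the result."""
    perturbed = capsule_vectors.clone()
    perturbed[class_idx, dim] += delta               # tweak a single capsule dimension
    return decoder(perturbed[class_idx].unsqueeze(0))  # reconstructed image

# Sweeping `delta` over a small range (e.g. -0.25 to 0.25) visualizes how one
# dimension controls a property such as stroke thickness, width, or scale.
```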
We train a CapsNet model on the EMNIST letters dataset (Cohen et al., 2017). The focus lies on the ability of the CapsNet to create explanatory image rankings. The term image ranking refers to the ordering of images based on their predicted class probability: the higher an image's position in the ranking, the more strongly it is associated with the considered class.
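As a minimal sketch of this definition, a ranking can be obtained by sorting the images of one label by the probability the model assigns to the considered class. The array `probs` is a hypothetical prediction matrix, not part of our pipeline.

```python
import numpy as np

# Minimal sketch of an image ranking. `probs` is assumed to hold the model's
# class probabilities with shape [num_images, num_classes].

def rank_images(probs, class_idx):
    """Return image indices ordered from most to least associated with class_idx."""
    order = np.argsort(probs[:, class_idx])[::-1]   # descending class probability
    return order, probs[order, class_idx]
```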
First, we show that the vectors produced by the CapsNet are suitable for creating image rankings. Second, we create and explain the image rankings. The explanation is performed by visualizing the areas that contributed to the prediction of the correct class. We extend the explanation by visualizing the features that contributed to the prediction of other classes. Finally, we explore the specific characteristics of the letters displayed in an image.
Overall, the main contribution of our work is the examination of a CapsNet's potential and usability to
• create comprehensible image rankings for images of the same label and
• improve investigation techniques regarding explainability.
2 EXPLANATORY APPROACHES
OF CNNS
As mentioned above, explanatory approaches for CNNs do exist. In this section, we provide a brief overview of the properties of four fundamental explanatory approaches for CNNs: the LIME approach (Ribeiro et al., 2016), occlusion maps (Zeiler and Fergus, 2014), saliency maps (Simonyan et al., 2014), and the Grad-CAM algorithm (Selvaraju et al., 2017).
The LIME (Local Interpretable Model-Agnostic Explanations) approach is a general method to explain single results of an AI model. It is not limited to any specific model architecture. The core idea of LIME is to substitute the multidimensional, non-human-understandable model with a more easily interpretable linear model as a local approximation. It has been extended to non-linear approximations by anchors (Ribeiro et al., 2018). Both approaches result in the examination and isolation of those image areas that strongly impact the class probability. However, the results of both approaches show that the isolated areas differ from the features that humans would use for their perception.
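The following sketch illustrates LIME's core idea rather than the official lime package: superpixels are randomly switched on or off, the model is queried on the perturbed images, and a weighted linear surrogate is fitted. The inputs `segments` (superpixel labels per pixel) and `predict_fn` (probability of the explained class) are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Minimal sketch of LIME's core idea: fit a local linear surrogate over
# on/off superpixels. `segments` (shape [H, W], integer superpixel labels)
# and `predict_fn` (returns the probability of the explained class) are
# assumed to be provided.

def lime_sketch(image, segments, predict_fn, n_samples=200):
    n_segments = int(segments.max()) + 1
    masks = np.random.randint(0, 2, (n_samples, n_segments))   # random on/off patterns
    preds = []
    for m in masks:
        perturbed = image * m[segments][..., None]              # zero out "off" superpixels
        preds.append(predict_fn(perturbed))
    # weight samples by closeness to the original image (fraction of kept superpixels)
    weights = np.exp(-((1 - masks.mean(axis=1)) ** 2) / 0.25)
    surrogate = Ridge(alpha=1.0).fit(masks, preds, sample_weight=weights)
    return surrogate.coef_                                      # per-superpixel importance
```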
Occlusion maps, as first proposed by Zeiler and Fergus (2014), are created by occluding different parts of the input image; hence, this approach is model-agnostic as well. Rectangles filled with gray or random noise are often used as the occluder. By sliding the occluder across the image and recording the predicted class probability, this approach provides insights into which parts of the image are important for a specific class. However, a drawback is that the size of the occluder can influence the quality of the map. Furthermore, when several objects of the same or of different classes are visible and a softmax output is used, occluding the other objects can decrease or increase the class probability, respectively, which might lead to a wrong impression.
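A minimal sketch of the sliding-occluder procedure is given below. The function `predict_fn`, the patch size, and the stride are assumptions; as noted above, the occluder size influences the quality of the resulting map.

```python
import numpy as np

# Minimal sketch of an occlusion map, assuming `predict_fn` returns the
# probability of the considered class for a single image of shape (H, W, C)
# with pixel values in [0, 1].

def occlusion_map(image, predict_fn, patch=16, stride=8, fill=0.5):
    H, W, _ = image.shape
    heatmap = np.zeros(((H - patch) // stride + 1, (W - patch) // stride + 1))
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch, :] = fill   # gray occluder
            heatmap[i, j] = predict_fn(occluded)           # probability with this region hidden
    return heatmap  # low values mark regions that were important for the class
```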
The approach to create saliency maps and the Grad-CAM algorithm are model-specific and are applied directly to an available trained CNN model. Saliency maps visualize prominent pixels for a specified layer of a CNN, either by using guided backpropagation (Springenberg et al., 2015) or by inserting the output of a layer into the inverted model structure (Zeiler and Fergus, 2014). Both methods provide a rough orientation regarding the important features of a class. However, because single outputs are evaluated in isolation, the resulting features are not related to each other and no explanation of the CNN's decision-making is included.
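As an illustration, the sketch below computes a plain gradient saliency map in the spirit of Simonyan et al. (2014); guided backpropagation would additionally alter how gradients flow through the ReLU layers. The names `model` and `image` are assumptions.

```python
import torch

# Minimal sketch of a plain gradient saliency map for a PyTorch CNN.
# `model` is a trained classifier and `image` a tensor of shape [1, C, H, W];
# both are assumed to be given.

def saliency_map(model, image, class_idx):
    model.eval()
    image = image.clone().requires_grad_(True)
    score = model(image)[0, class_idx]        # scalar score of the inspected class
    score.backward()                          # gradient of the score w.r.t. the input pixels
    return image.grad.abs().max(dim=1)[0]     # per-pixel saliency, shape [1, H, W]
```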
The Grad-CAM (Gradient-weighted Class Activation Mapping) algorithm (Selvaraju et al., 2017) computes the gradient of the score of a specific class with respect to the last feature maps. The mean gradient of a feature map is used as its weight, since it describes the map's importance for the class. Only the positive values of the weighted combination of the feature maps are kept, yielding the class activation map. It highlights areas in the original image that increased the predicted class probability. Similar to saliency maps, the results of the Grad-CAM algorithm are useful as a rough orientation for the CNN's decision. However, they primarily show that CNNs rely on different features for classification than humans do.