Table 1: Source SVHN, Target MNIST (all values in %).
Model             Source Accuracy   Target Accuracy   Domain Shift
Dynamic Routing   95.25             69.79             25.46
EM-Routing        94.30             75.13             19.17
Self-Routing      92.91             60.03             32.88
CNN               96.11             74.01             22.10
Table 2: Source SVHN, Target MNIST-M (all values in %).
Model             Source Accuracy   Target Accuracy   Domain Shift
Dynamic Routing   95.25             47.94             47.31
EM-Routing        94.30             51.31             42.99
Self-Routing      92.91             46.92             45.99
CNN               96.11             53.23             42.88
Table 3: Source CIFAR-10, Target STL10 (all values in %).
Model             Source Accuracy   Target Accuracy   Domain Shift
Dynamic Routing   85.15             30.62             54.53
EM-Routing        82.67             39.00             43.67
Self-Routing      79.63             38.55             41.08
CNN               91.88             47.06             44.82
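The domain-shift column in the tables above is simply the drop from source accuracy to target accuracy, in percentage points. A minimal sketch of that computation, using the Table 1 (SVHN to MNIST) numbers:

```python
# Domain shift = source accuracy - target accuracy (percentage points).
# Values taken from Table 1 (SVHN -> MNIST).
results = {
    "Dynamic Routing": (95.25, 69.79),
    "EM-Routing": (94.30, 75.13),
    "Self-Routing": (92.91, 60.03),
    "CNN": (96.11, 74.01),
}

for model, (src_acc, tgt_acc) in results.items():
    shift = round(src_acc - tgt_acc, 2)
    print(f"{model}: domain shift = {shift}")
```

Lower values indicate that the model retained more of its source-domain performance on the unseen target domain.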
we examined how well these models adapt to new domains. These Capsule network models were then compared against a baseline CNN architecture to assess whether the former adapt better to new domains: a lower domain shift would support the Capsule network's claimed viewpoint-invariance and equivariance properties. This analysis can be extended by experimenting with larger and more diverse datasets and with additional routing techniques to better understand domain shift in Capsule networks. Further work could apply Capsule networks directly to domain adaptation and domain generalization.
REFERENCES
Baker, N., Lu, H., Erlikhman, G., and Kellman, P. J. (2018). Deep convolutional networks do not classify based on global object shape. PLoS Computational Biology, 14(12):e1006613.
Coates, A., Ng, A., and Lee, H. (2011). An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 215–223.
Doerig, A., Schmittwilken, L., Sayim, B., Manassi, M., and Herzog, M. H. (2019). Capsule networks but not classic CNNs explain global visual processing. bioRxiv, page 747394.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. S. (2016). Domain-adversarial training of neural networks. J. Mach. Learn. Res., 17:59:1–59:35.
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., and Brendel, W. (2018). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231.
Hahn, T., Pyeon, M., and Kim, G. (2019). Self-routing capsule networks. In Advances in Neural Information Processing Systems, pages 7658–7667.
Hinton, G. E., Sabour, S., and Frosst, N. (2018). Matrix capsules with EM routing. In International Conference on Learning Representations.
Jaiswal, A., AbdAlmageed, W., Wu, Y., and Natarajan, P. (2018). CapsuleGAN: Generative adversarial capsule network. In Proceedings of the European Conference on Computer Vision (ECCV).
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images.
LeCun, Y., Cortes, C., and Burges, C. (2010). MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2.
Liu, W., Barsoum, E., and Owens, J. D. (2018). Object localization with a weakly supervised capsnet. arXiv preprint arXiv:1805.07706.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning.
Sabour, S., Frosst, N., and Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pages 3856–3866.
Sun, B., Feng, J., and Saenko, K. (2017). Correlation Alignment for Unsupervised Domain Adaptation, pages 153–171. Springer International Publishing, Cham.
Verma, S. and Zhang, Z.-L. (2018). Graph capsule convolutional neural networks. arXiv preprint arXiv:1805.08090.
Wang, M. and Deng, W. (2018). Deep visual domain adaptation: A survey. Neurocomputing, 312:135–153.
ICPRAM 2021 - 10th International Conference on Pattern Recognition Applications and Methods