While CNNs have been the go-to solution for
medical image classification, these limitations
highlight the need for more advanced techniques
capable of addressing the challenges in medical
imaging. Capsule Networks (CapsNets) have been
introduced as a potential solution. CapsNets are
designed to overcome the shortcomings of CNNs by
encoding both the presence and spatial orientation of
features, thus preserving important geometric
relationships. This ability to maintain spatial
hierarchies allows CapsNets to perform better on
tasks involving complex image structures, such as
brain tumor classification, and offers the potential
to improve classification accuracy in precisely the
situations where CNNs fall short.
Despite the promising results of CapsNets,
challenges remain in their application to brain tumor
detection. Current CapsNet-based methods have
shown improvements over traditional CNN
approaches, but they still face issues related to
computational complexity and suboptimal
segmentation accuracy. Additionally, training
CapsNets on smaller, limited datasets can hinder their
ability to generalize to unseen variations in tumor
characteristics. These gaps underscore the need for
further research and refinement of CapsNet
architectures, along with the development of more
diverse and augmented medical image datasets, to
fully realize their potential in brain tumor
classification.
In this study, we present a modified Capsule
Network (CapsNet) model tailored for brain tumor
classification. Capsule Networks are designed to
address the limitations of traditional Convolutional
Neural Networks (CNNs) by leveraging capsules,
groups of neurons whose output vectors represent
both the probability that a feature is present and its
pose (spatial properties). A key advantage of CapsNet is its ability to
recognize spatial relationships and part-whole
hierarchies, which enhances generalization across
transformed data.
Our model begins with standard convolutional
layers to extract lower-level features from the input
images. These features are then processed by a
custom Capsule Layer, which applies a learned
weight matrix to the detected features and
encapsulates them as vectors. The Capsule
Layer uses a routing mechanism (such as dynamic
routing, though simplified in this implementation) to
route outputs from lower-level capsules to higher-
level ones, ensuring that the spatial relationships
between detected features are preserved.
The model’s output layer uses softmax activation
to classify images based on the output from the
capsule layer, enabling the network to learn complex
feature hierarchies and improve accuracy. The
network is trained using standard backpropagation,
with the training process monitored using validation
data over multiple epochs.
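As a rough illustration of this pipeline, the following is a minimal NumPy sketch of the capsule classification head just described. It is not the exact implementation: the capsule counts and dimensions, the single-pass (non-iterative) routing, and the softmax over output-capsule lengths are all illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing nonlinearity (Sabour et al., 2017): shrinks short
    vectors toward 0 and long vectors toward unit length."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def capsnet_forward(features, W):
    """Simplified single-pass capsule head (routing is not iterated).
    features: (num_caps, in_dim) primary-capsule vectors, i.e. the
              lower-level features produced by the convolutional layers.
    W:        (num_classes, num_caps, out_dim, in_dim) learned weights.
    Returns class probabilities from a softmax over capsule lengths."""
    # Prediction vectors: u_hat[j, k] = W[j, k] @ features[k]
    u_hat = np.einsum('jkoi,ki->jko', W, features)  # (classes, caps, out_dim)
    v = squash(u_hat.sum(axis=1))                   # one output capsule per class
    lengths = np.linalg.norm(v, axis=-1)            # capsule length ~ presence
    exp = np.exp(lengths - lengths.max())
    return exp / exp.sum()                          # probabilities over classes

rng = np.random.default_rng(0)
features = squash(rng.normal(size=(32, 8)))         # stand-in conv features
W = rng.normal(scale=0.1, size=(4, 32, 16, 8))      # 4 classes (assumed)
probs = capsnet_forward(features, W)
print(probs.shape)                                  # one probability per class
```

In a full implementation the weight matrix W would be trained by backpropagation and the summation over lower capsules would be replaced by iterative routing; this sketch only shows how capsule vectors, rather than scalar activations, flow into the final softmax classification.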
2 LITERATURE REVIEW
Recent studies have extensively compared popular
deep learning architectures such as CNN, VGG, and
ResNet for brain tumor classification, highlighting
both their strengths and limitations. For instance,
Anwar (2024) explored the use of CNNs for brain
tumor detection and segmentation, demonstrating the
model's strong capability for image classification.
However, the study also highlighted issues such as
feature loss during downsampling and the need for
more efficient feature representations (Anwar, 2024).
VGG and ResNet, although effective for image
classification tasks, face challenges in accurately
capturing fine-grained details necessary for precise
tumor segmentation. In particular, VGG, known for
its depth and simplicity, and ResNet, which utilizes
residual connections to avoid the vanishing gradient
problem, often struggle to handle complex spatial
relationships in medical images, such as in the case of
brain tumor segmentation (Ibrahim, 2023). These
findings underscore the need for improved models
that can better preserve the spatial hierarchies of
features in medical image data.
The drawbacks of CNNs and traditional
architectures have led to the development of alternative
models, notably Capsule Networks (CapsNet).
CapsNet, introduced by Hinton (2018), addresses
some of the shortcomings of CNNs, particularly in
terms of capturing spatial hierarchies and rotation
invariance. Capsule Networks preserve spatial
relationships between features by using "capsules,"
which are groups of neurons encoding both the
presence and orientation of objects. This approach
improves the model's robustness in recognizing
complex patterns and spatial features, making it
particularly suited for medical image analysis,
including brain tumor detection (Sabour, 2017).
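The routing-by-agreement procedure described by Sabour et al. (2017) can be sketched as follows. This is a minimal NumPy version; the numbers of capsules and their dimensions are arbitrary illustrative choices.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing nonlinearity: output norm is always strictly below 1."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement (Sabour et al., 2017).
    u_hat: (num_in, num_out, dim) prediction vectors from lower capsules.
    Returns (num_out, dim) higher-level capsule vectors."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits
    for _ in range(iterations):
        # Coupling coefficients: softmax over output capsules, per input
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # candidate output capsules
        b += np.einsum('ijd,jd->ij', u_hat, v)     # reward agreement u_hat . v
    return v

rng = np.random.default_rng(1)
u_hat = rng.normal(size=(32, 4, 16))  # 32 lower capsules -> 4 output capsules
v = dynamic_routing(u_hat)            # each output vector has norm below 1
```

Lower-level capsules whose predictions agree with an output capsule's current vector receive larger coupling coefficients on the next iteration, which is how spatial part-whole relationships are preserved without the information loss of max-pooling.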
Mathematically, the core operation of a Capsule
Network is dynamic routing, an iterative algorithm
in which lower-level capsules send their outputs to
higher-level capsules in proportion to the agreement
between their predictions. This allows capsules to
better maintain the spatial relationships between
features, overcoming the problem of information loss
seen in CNNs during max-pooling operations. Max-