
their superior performance in terms of accuracy and robustness makes them valuable for complex image classification tasks. The trade-off between computational complexity and model performance should be carefully considered based on the specific needs of the application.
6 CONCLUSION
In this paper, we introduced a novel classification approach that integrates Kolmogorov-Arnold Networks (KANs) with Convolutional Neural Networks (CNNs), termed CNN-KAN. This architecture combines the strong feature extraction capabilities of CNNs with KANs' capacity to model complex relationships. Our extensive evaluations across multiple datasets demonstrate that CNN-KAN consistently outperforms traditional CNN architectures in accuracy, precision, and recall.
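The core idea can be illustrated with a minimal sketch: CNN-extracted features feed a KAN-style layer in which each input-output edge applies a learnable univariate function rather than a fixed activation. This is an illustrative simplification, not the paper's implementation: the radial-basis expansion below stands in for the B-splines of Liu et al. (2024), and the feature and class dimensions are arbitrary.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

class KANLayer:
    """Simplified Kolmogorov-Arnold layer: each edge applies a learnable
    univariate function, here a SiLU base term plus a radial-basis expansion
    (a stand-in for the B-spline parameterization of Liu et al., 2024)."""
    def __init__(self, in_dim, out_dim, n_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.centers = np.linspace(-2, 2, n_basis)                   # basis grid
        self.w_base = rng.normal(0, 0.1, (out_dim, in_dim))          # SiLU weights
        self.w_rbf = rng.normal(0, 0.1, (out_dim, in_dim, n_basis))  # basis weights

    def __call__(self, x):
        # x: (batch, in_dim) feature vectors from a CNN backbone
        rbf = np.exp(-(x[..., None] - self.centers) ** 2)  # (batch, in_dim, n_basis)
        edge = np.einsum('oib,nib->no', self.w_rbf, rbf)   # sum of edge functions
        edge += silu(x) @ self.w_base.T                    # residual base term
        return edge

# Toy "CNN-KAN" head: pretend 64-dim CNN features feed a KAN classifier.
feats = np.random.default_rng(1).normal(size=(4, 64))  # 4 images, 64 features
layer = KANLayer(64, 10)                               # 10 classes
logits = layer(feats)
print(logits.shape)  # (4, 10)
```

In a full model, the flattened output of the convolutional backbone would replace the random `feats` array, and the parameters would be trained end to end by backpropagation.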
We also explored the use of pretrained models, showing that the proposed CNN-KAN models not only enhance efficiency but also improve robustness. The experimental results indicate that integrating KAN layers into CNN architectures yields significant performance gains across diverse image classification tasks.
In summary, the ConvKAN (CNN-KAN) approach represents a promising advancement in the field of computer vision, offering a robust and efficient solution for complex image classification challenges. This work paves the way for future research to further optimize and extend the capabilities of CNN-KAN models.
ConvKAN: Towards Robust, High-Performance and Interpretable Image Classification