A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification

Bruno Côme, Maxime Devanne, Jonathan Weber, Germain Forestier

2025

Abstract

Chart image classification is a critical task in automating data extraction and interpretation from visualizations, which are widely used in domains such as business, research, and education. In this paper, we evaluate the performance of Convolutional Neural Networks (CNNs) and Vision-Language Models (VLMs) for this task, given their increasing use in various image classification and comprehension tasks. We constructed a diverse dataset of 25 chart types, each containing 1,000 images, and trained multiple CNN architectures while also assessing the zero-shot generalization capabilities of pre-trained VLMs. Our results demonstrate that CNNs, when trained specifically for chart classification, outperform VLMs, which nonetheless show promising potential without the need for task-specific training. These findings underscore the importance of CNNs in chart classification while highlighting the unexplored potential of VLMs with further fine-tuning, making this task crucial for advancing automated data visualization analysis.

Paper Citation


in Harvard Style

Côme B., Devanne M., Weber J. and Forestier G. (2025). A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 816-827. DOI: 10.5220/0013374500003890


in Bibtex Style

@conference{icaart25,
author={Bruno Côme and Maxime Devanne and Jonathan Weber and Germain Forestier},
title={A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2025},
pages={816-827},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013374500003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification
SN - 978-989-758-737-5
AU - Côme B.
AU - Devanne M.
AU - Weber J.
AU - Forestier G.
PY - 2025
SP - 816
EP - 827
DO - 10.5220/0013374500003890
PB - SciTePress