A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification
Bruno Côme, Maxime Devanne, Jonathan Weber, Germain Forestier
2025
Abstract
Chart image classification is a critical task in automating data extraction and interpretation from visualizations, which are widely used in domains such as business, research, and education. In this paper, we evaluate the performance of Convolutional Neural Networks (CNNs) and Vision-Language Models (VLMs) for this task, given their increasing use in various image classification and comprehension tasks. We constructed a diverse dataset of 25 chart types, each containing 1,000 images, and trained multiple CNN architectures while also assessing the zero-shot generalization capabilities of pre-trained VLMs. Our results demonstrate that CNNs, when trained specifically for chart classification, outperform VLMs, which nonetheless show promising potential without the need for task-specific training. These findings underscore the importance of CNNs in chart classification while highlighting the unexplored potential of VLMs with further fine-tuning, making this task crucial for advancing automated data visualization analysis.
Paper Citation
in Harvard Style
Côme B., Devanne M., Weber J. and Forestier G. (2025). A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 816-827. DOI: 10.5220/0013374500003890
in Bibtex Style
@conference{icaart25,
author={Bruno Côme and Maxime Devanne and Jonathan Weber and Germain Forestier},
title={A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2025},
pages={816-827},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013374500003890},
isbn={978-989-758-737-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - A Comparative Study of CNNs and Vision-Language Models for Chart Image Classification
SN - 978-989-758-737-5
AU - Côme B.
AU - Devanne M.
AU - Weber J.
AU - Forestier G.
PY - 2025
SP - 816
EP - 827
DO - 10.5220/0013374500003890
PB - SciTePress