Vision-Language Models for E-commerce: Detecting Non-Compliant Product Images in Online Catalogs
Maciej Niemir, Dominika Grajewska, Bartłomiej Nitoń
2025
Abstract
This study explores the use of vision-language models (VLMs) for automated validation of product images in e-commerce, aiming to ensure visual consistency and accuracy without the need for extensive data annotation and specialized training. We evaluated two VLMs, LLaVA and Moondream2, to determine their effectiveness in classifying images by their suitability for online display, focusing on aspects such as visibility and representational clarity. Each model was tested with varying textual prompts to assess the impact of query phrasing on predictive accuracy. Moondream2 outperformed LLaVA in both precision and processing speed, making it a more practical solution for large-scale e-commerce applications. Its high specificity and negative predictive value (NPV) highlight its effectiveness in identifying non-compliant images. Our results suggest that VLMs like Moondream2 provide a viable approach to visual validation in e-commerce, offering benefits in scalability and implementation efficiency, particularly where rapid and reliable assessment of product imagery is critical. This research demonstrates the potential of VLMs as effective alternatives to traditional image validation methods, underscoring their role in enhancing the quality of digital catalogs.
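The abstract reports precision, specificity, and negative predictive value (NPV). The Python sketch below shows how these follow from a confusion matrix. It assumes that "suitable for online display" is the positive class, so non-compliant images are negatives (which is why specificity and NPV reflect how well non-compliant images are caught), and that each prompt is phrased so that a "yes" reply means the image is suitable. The `parse_answer` helper and the sample labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # suitable images predicted suitable
    fp: int  # non-compliant images predicted suitable
    tn: int  # non-compliant images predicted non-compliant
    fn: int  # suitable images predicted non-compliant


def parse_answer(text: str) -> bool:
    """Hypothetical mapping of a free-text VLM reply to a label.

    Assumes the prompt is phrased so that "yes" means the image is suitable.
    """
    return text.strip().lower().startswith("yes")


def confusion(y_true: list[bool], y_pred: list[bool]) -> Confusion:
    """Count outcomes, treating True ("suitable for display") as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t and p),
        fp=sum(1 for t, p in pairs if not t and p),
        tn=sum(1 for t, p in pairs if not t and not p),
        fn=sum(1 for t, p in pairs if t and not p),
    )


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a metric is undefined (zero denominator).
    return num / den if den else float("nan")


def metrics(c: Confusion) -> dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "precision": _ratio(c.tp, c.tp + c.fp),
        "specificity": _ratio(c.tn, c.tn + c.fp),  # share of non-compliant images caught
        "npv": _ratio(c.tn, c.tn + c.fn),  # share of "non-compliant" verdicts that are correct
        "accuracy": _ratio(c.tp + c.tn, total),
    }


if __name__ == "__main__":
    # Hypothetical ground truth and VLM replies for five images.
    y_true = [True, True, False, False, False]
    replies = ["Yes", "no", "No.", "No", "yes, it is"]
    y_pred = [parse_answer(r) for r in replies]
    print(metrics(confusion(y_true, y_pred)))
```

Computing these metrics separately for each prompt variant is one straightforward way to compare how query phrasing affects a model's predictions.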
Paper Citation
in Harvard Style
Niemir M., Grajewska D. and Nitoń B. (2025). Vision-Language Models for E-commerce: Detecting Non-Compliant Product Images in Online Catalogs. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 1116-1123. DOI: 10.5220/0013265000003890
in BibTeX Style
@conference{icaart25,
author={Maciej Niemir and Dominika Grajewska and Bartłomiej Nitoń},
title={Vision-Language Models for E-commerce: Detecting Non-Compliant Product Images in Online Catalogs},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={1116-1123},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013265000003890},
isbn={978-989-758-737-5},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI  - Vision-Language Models for E-commerce: Detecting Non-Compliant Product Images in Online Catalogs
SN  - 978-989-758-737-5
AU  - Niemir M.
AU  - Grajewska D.
AU  - Nitoń B.
PY  - 2025
SP  - 1116
EP  - 1123
DO  - 10.5220/0013265000003890
PB  - SciTePress
ER  - 