properties, and dependencies. For the sake of simplicity,
our initial implementation considered only Tetra Packs.
Early results showed that the YOLOv5 object detection
model (Jocher et al., 2020) generalized poorly when
trained on synthetic data. Therefore, future research
will explore automating the creation of annotated
product images using generative adversarial networks
in combination with the developed rule-based approach.
We also plan to evaluate large language models such as
GPT-4 (OpenAI, 2023) as an alternative approach for
extracting information from product images.
CODE AVAILABILITY
All Jupyter Notebooks, scripts, and data can be found
in the following repository:
https://gitlab.rlp.net/ISS/food-product-image-dataset.
ACKNOWLEDGEMENTS
This work was funded by the German Federal Ministry
of Education and Research (FKZ 01IS20085).
REFERENCES
Chen, F., Zhang, H., Li, Z., Dou, J., Mo, S., Chen, H.,
Zhang, Y., Ahmed, U., Zhu, C., and Savvides, M.
(2022). Unitail: Detecting, Reading, and Matching
in Retail Scene. arXiv:2204.00298 [cs].
Collins, J., Goel, S., Deng, K., Luthra, A., Xu, L., Gundogdu,
E., Zhang, X., Yago Vicente, T. F., Dideriksen, T., Arora, H.,
Guillaumin, M., and Malik, J. (2022). ABO: Dataset and
benchmarks for real-world 3D object understanding. CVPR.
darrenl (2022). LabelImg. Available at
https://github.com/tzutalin/labelImg, last accessed 14.03.2023.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei,
L. (2009). ImageNet: A large-scale hierarchical image
database. In 2009 IEEE Conference on Computer Vision
and Pattern Recognition, pages 248–255.
European Parliament and the Council (2011). REGULATION
(EU) No 1169/2011 OF THE EUROPEAN PARLIAMENT AND OF
THE COUNCIL of 25 October 2011.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J.,
and Zisserman, A. (2010). The Pascal Visual Object
Classes (VOC) Challenge. International Journal of
Computer Vision, 88(2):303–338.
Follmann, P., Böttger, T., Härtinger, P., König, R., and
Ulrich, M. (2018). MVTec D2S: Densely Segmented
Supermarket Dataset.
George, M. and Floerkemeier, C. (2014). Recognizing
Products: A Per-exemplar Multi-label Image Classification
Approach. In Fleet, D., Pajdla, T., Schiele, B., and
Tuytelaars, T., editors, Computer Vision – ECCV 2014,
pages 440–455, Cham. Springer International Publishing.
Georgiadis, K., Kordopatis-Zilos, G., Kalaganis, F.,
Migkotzidis, P., Chatzilari, E., Panakidou, V.,
Pantouvakis, K., Tortopidis, S., Papadopoulos, S.,
Nikolopoulos, S., and Kompatsiaris, I. (2021).
Products-6K: A Large-Scale Groceries Product
Recognition Dataset. In The 14th PErvasive
Technologies Related to Assistive Environments
Conference, pages 1–7, Corfu Greece. ACM.
Goldman, E., Herzig, R., Eisenschtat, A., Goldberger, J.,
and Hassner, T. (2019). Precise detection in densely
packed scenes. In Proc. Conf. Comput. Vision Pattern
Recognition (CVPR).
GS1 (2020). Global Product Classification (GPC) | GS1.
Available at https://www.gs1.org/standards/gpc, last
accessed 17.03.2023.
GS1 (2022). GS1 Product Image Specification Standard | GS1.
Available at
https://www.gs1.org/standards/gs1-product-image-specification-standard/current-standard,
last accessed 17.03.2023.
GS1 (2023). GS1 Web Vocabulary Food Beverage Tobacco
Product. Available at
https://www.gs1.org/voc/FoodBeverageTobaccoProduct,
last accessed 11.04.2022.
GS1 Netherlands (2022). Codes for types of packaging -
GS1 Netherlands. Available at
https://gs1.nl/en/knowledge-base/gs1-datapools-overview/codes-for-types-of-packaging/,
last accessed 17.03.2023.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
Residual Learning for Image Recognition. In 2016 IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 770–778, Las Vegas, NV, USA. IEEE.
Heartex (2023). heartexlabs/label-studio. Available at
https://github.com/heartexlabs/label-studio, last accessed
12.04.2022.
Jocher, G., Nishimura, K., Mineeva, T., and Vilariño, R.
(2020). yolov5. Code repository
https://github.com/ultralytics/yolov5.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu,
X., and Pietikäinen, M. (2020). Deep Learning for
Generic Object Detection: A Survey. International
Journal of Computer Vision, 128(2):261–318.
Merler, M., Galleguillos, C., and Belongie, S. (2007).
Recognizing groceries in situ using in vitro training data.
In 2007 IEEE Conference on Computer Vision and
Pattern Recognition, pages 1–8.
OpenAI (2023). GPT-4 Technical Report.
arXiv:2303.08774 [cs].
Wei, X.-S., Cui, Q., Yang, L., Wang, P., and Liu, L. (2019).
RPC: A Large-Scale Retail Product Checkout Dataset.
arXiv:1901.07249 [cs].
Wei, Y., Tran, S., Xu, S., Kang, B., and Springer, M. (2020).
Deep Learning for Retail Product Recognition:
Challenges and Techniques. Computational Intelligence
and Neuroscience, 2020:1–23.