TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets

Michael Sildatke, Jan Delember, Bodo Kraft, Albert Zündorf

2023

Abstract

Product Information Sheets (PIS) are human-readable documents containing relevant product specifications. In these documents, tables often present the most important information. Hence, table detection is a crucial task for automating the process of Information Extraction (IE) from PIS. Modern table detection algorithms are Machine Learning (ML)-based, and popular object detection networks like Faster R-CNN or Cascade Mask R-CNN form their foundation. State-of-the-art models like TableBank or CDeC-Net are trained on publicly available table detection datasets. However, the documents in these datasets do not cover particular characteristics of PIS, e.g., background design elements like provider logos or watermarks. Consequently, these state-of-the-art models do not perform well enough on PIS. Transfer Learning (TL) and Ensembling are two methods of reusing existing models to improve their performance on a specific problem. We use these techniques to build an optimized model for detecting tables in PIS, named TabProIS. This paper presents three main contributions: First, we provide a new table detection dataset containing 5,600 document images generated from PIS of the German energy industry. Second, we offer three TL-based models with different underlying network architectures, namely TableBank, CDeC-Net, and You Only Look Once (YOLO). Third, we present a pipeline to automatically optimize available models based on different Ensembling and post-processing strategies. A selection of our models and the dataset will be publicly released to enable the reproducibility of the results.
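The abstract does not say which Ensembling strategies the pipeline evaluates. As a rough illustration of the general idea only, the sketch below merges table boxes predicted by several detectors using a simple majority vote over overlapping boxes. The function names (`iou`, `ensemble_detections`), the IoU threshold of 0.5, and the vote threshold of 2 are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]            # (box, confidence score > 0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_detections(
    per_model: List[List[Detection]],
    iou_thr: float = 0.5,   # assumed threshold, not from the paper
    min_votes: int = 2,     # assumed threshold, not from the paper
) -> List[Detection]:
    """Hypothetical majority-vote ensembling of table detections.

    Boxes from different models are grouped when they overlap the first
    box of an existing group by at least `iou_thr`. A group is kept only
    if at least `min_votes` distinct models contributed to it.
    """
    clusters: List[List[Tuple[int, Box, float]]] = []
    for model_idx, detections in enumerate(per_model):
        for box, score in detections:
            for cluster in clusters:
                if iou(cluster[0][1], box) >= iou_thr:
                    cluster.append((model_idx, box, score))
                    break
            else:
                # No existing group overlaps enough: start a new one.
                clusters.append([(model_idx, box, score)])

    merged: List[Detection] = []
    for cluster in clusters:
        voters = {model_idx for model_idx, _, _ in cluster}
        if len(voters) < min_votes:
            continue  # too few models agree on this region
        total = sum(score for _, _, score in cluster)
        # Confidence-weighted average of the box coordinates.
        avg_box = tuple(
            sum(box[i] * score for _, box, score in cluster) / total
            for i in range(4)
        )
        merged.append((avg_box, total / len(cluster)))
    return merged
```

With three detectors that all find roughly the same table, and one detector that additionally reports an isolated low-confidence box, only the agreed-upon region survives. Averaging the coordinates is one of several possible merge rules; taking the union or the highest-scoring box would be equally valid choices in such a sketch.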


Paper Citation


in Harvard Style

Sildatke M., Delember J., Kraft B. and Zündorf A. (2023). TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets. In Proceedings of the 3rd International Conference on Image Processing and Vision Engineering - Volume 1: IMPROVE, ISBN 978-989-758-642-2, SciTePress, pages 27-36. DOI: 10.5220/0011840700003497


in Bibtex Style

@conference{improve23,
author={Michael Sildatke and Jan Delember and Bodo Kraft and Albert Zündorf},
title={TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets},
booktitle={Proceedings of the 3rd International Conference on Image Processing and Vision Engineering - Volume 1: IMPROVE},
year={2023},
pages={27-36},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011840700003497},
isbn={978-989-758-642-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Image Processing and Vision Engineering - Volume 1: IMPROVE
TI - TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets
SN - 978-989-758-642-2
AU - Sildatke M.
AU - Delember J.
AU - Kraft B.
AU - Zündorf A.
PY - 2023
SP - 27
EP - 36
DO - 10.5220/0011840700003497
PB - SciTePress