Paper

Authors: Michael Sildatke 1 ; Jan Delember 2 ; Bodo Kraft 1 and Albert Zündorf 3

Affiliations: 1 FH Aachen, University of Applied Sciences, Germany ; 2 Maastricht University, The Netherlands ; 3 University of Kassel, Germany

Keyword(s): Table Detection, Transfer Learning, Document Images, Machine Learning, Model Optimization, Deep Learning, Neural Networks.

Abstract: Product Information Sheets (PIS) are human-readable documents containing relevant product specifications. In these documents, tables often present the most important information. Hence, table detection is a crucial task for automating the process of Information Extraction (IE) from PIS. Modern table detection algorithms are Machine Learning (ML)-based, and popular object detection networks like Faster R-CNN or Cascade Mask R-CNN form their foundation. State-of-the-art models like TableBank or CDeC-Net are trained on publicly available table detection datasets. However, the documents in these datasets do not cover particular characteristics of PIS, e.g., background design elements like provider logos or watermarks. Consequently, these state-of-the-art models do not perform well enough on PIS. Transfer Learning (TL) and Ensembling describe two methods of reusing existing models to improve their performance on a specific problem. We use these techniques to build an optimized model for detecting tables in PIS, named TabProIS. This paper presents three main contributions: First, we provide a new table detection dataset containing 5,600 document images generated from PIS of the German energy industry. Second, we offer three TL-based models with different underlying network architectures, namely TableBank, CDeC-Net, and You Only Look Once (YOLO). Third, we present a pipeline to automatically optimize available models based on different Ensembling and post-processing strategies. A selection of our models and the dataset will be publicly released to enable the reproducibility of the results.

CC BY-NC-ND 4.0


Paper citation in several formats:
Sildatke, M.; Delember, J.; Kraft, B. and Zündorf, A. (2023). TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets. In Proceedings of the 3rd International Conference on Image Processing and Vision Engineering - IMPROVE; ISBN 978-989-758-642-2; ISSN 2795-4943, SciTePress, pages 27-36. DOI: 10.5220/0011840700003497

@conference{improve23,
author={Michael Sildatke and Jan Delember and Bodo Kraft and Albert Zündorf},
title={TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets},
booktitle={Proceedings of the 3rd International Conference on Image Processing and Vision Engineering - IMPROVE},
year={2023},
pages={27-36},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011840700003497},
isbn={978-989-758-642-2},
issn={2795-4943},
}

TY - CONF

JO - Proceedings of the 3rd International Conference on Image Processing and Vision Engineering - IMPROVE
TI - TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets
SN - 978-989-758-642-2
IS - 2795-4943
AU - Sildatke, M.
AU - Delember, J.
AU - Kraft, B.
AU - Zündorf, A.
PY - 2023
SP - 27
EP - 36
DO - 10.5220/0011840700003497
PB - SciTePress