New Paths in Document Data Augmentation Using Templates and Language Models

Lucas Wojcik, Luiz Coelho, Roger Granada, David Menotti

2025

Abstract

The state of the art (SOTA) in Document Recognition is mostly composed of multi-modal transformers. These models are usually trained in an unsupervised pre-training phase followed by a supervised fine-tuning phase in which real-world tasks are solved, meaning both the models and the training procedures are borrowed from NLP research. However, richly annotated data is scarce for some of these downstream tasks, in contrast to the copious amounts of pre-training data available. One way to address this shortage is data augmentation. We present two novel data augmentation methods for documents, each used in a different scope. The first is based on simple structured graph objects that encode a document’s layout, called templates, and is used to augment the EPHOIE and NBID datasets. The second uses a Large Language Model (LLM) to provide alternative versions of the document’s texts and is used to augment the FUNSD dataset. Both methods create instances by augmenting layout and text together, without images, so for validation we use LiLT, a model that deals only with text and layout. We show that our augmentation procedure significantly improves on the model’s baseline, opening up many possibilities for future research.
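To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a template can be simplified to a flat list of labelled bounding boxes (the paper describes templates as graph objects), and that an LLM can be represented by any text-to-text callable. The names `Field`, `Token`, `fill_template`, and `rewrite_texts`, as well as the sample values, are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page coordinates


@dataclass(frozen=True)
class Field:
    """One node of a hypothetical layout template."""
    label: str  # entity label, e.g. "name"
    box: Box    # where the text sits on the page


@dataclass(frozen=True)
class Token:
    """One imageless training item: text, layout box, and label."""
    text: str
    box: Box
    label: str


def fill_template(template: List[Field],
                  values: Dict[str, List[str]],
                  rng: random.Random) -> List[Token]:
    """Create a synthetic document by sampling a value for every field."""
    return [Token(rng.choice(values[f.label]), f.box, f.label) for f in template]


def rewrite_texts(doc: List[Token], rewrite: Callable[[str], str]) -> List[Token]:
    """Replace each text with an alternative version, keeping box and label."""
    return [Token(rewrite(t.text), t.box, t.label) for t in doc]


# Hypothetical template and value pools, for illustration only.
template = [Field("name", (50, 40, 300, 60)), Field("date", (50, 80, 200, 100))]
values = {"name": ["Ana Souza", "Li Wei"], "date": ["2001-05-17", "1998-11-02"]}

doc = fill_template(template, values, random.Random(0))
variant = rewrite_texts(doc, str.upper)  # str.upper stands in for an LLM call
```

In both functions the layout boxes and labels are carried over unchanged and only the text varies, which is what allows the augmented instances to be used by a text-and-layout model without generating images.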


Paper Citation


in Harvard Style

Wojcik L., Coelho L., Granada R. and Menotti D. (2025). New Paths in Document Data Augmentation Using Templates and Language Models. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 356-366. DOI: 10.5220/0013145900003912


in Bibtex Style

@conference{visapp25,
author={Lucas Wojcik and Luiz Coelho and Roger Granada and David Menotti},
title={New Paths in Document Data Augmentation Using Templates and Language Models},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={356-366},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013145900003912},
isbn={978-989-758-728-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - New Paths in Document Data Augmentation Using Templates and Language Models
SN - 978-989-758-728-3
AU - Wojcik L.
AU - Coelho L.
AU - Granada R.
AU - Menotti D.
PY - 2025
SP - 356
EP - 366
DO - 10.5220/0013145900003912
PB - SciTePress