TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition

Charith Gunasekara, Zachary Hamel, Feng Du, Connor Baillie

2025

Abstract

Optical Character Recognition (OCR) plays a pivotal role in digitizing and analyzing text from physical documents. Despite advancements in OCR technologies, challenges persist in handling diverse text layouts, poor-quality images, and complex fonts. In this paper, we present TokenOCR, an attention-based foundational model designed to overcome these limitations by integrating convolutional neural networks and transformer-based architectures. Unlike traditional OCR models that predict individual characters, TokenOCR predicts tokens, significantly enhancing recognition accuracy and efficiency. The model employs a ResNet50 feature extractor, an encoder with adaptive 2D positional embeddings, and a decoder utilizing multi-headed attention mechanisms for robust text recognition. To train TokenOCR, we used a dataset of six million images incorporating various real-world applications. Our training strategy integrates curriculum learning and adaptive learning rate scheduling to ensure efficient model convergence and generalization. Comprehensive evaluations using Word Error Rate (WER) and Character Error Rate (CER) demonstrate that TokenOCR consistently outperforms state-of-the-art models, including Tesseract and TrOCR, in both clean and degraded image conditions. These findings underscore TokenOCR’s potential to set new standards in OCR technology, offering a scalable, efficient, and highly accurate solution for diverse text recognition applications.
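For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Word Error Rate and Character Error Rate are conventionally computed: the Levenshtein edit distance between a reference transcription and a model output, divided by the reference length. The function names (`edit_distance`, `cer`, `wer`) and the whitespace-based word splitting are assumptions made for illustration. The abstract does not describe the authors' evaluation code or any text normalization they applied, so this should not be read as their implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / reference words.

    Words are split on whitespace here, which is an illustrative assumption.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    reference = "the quick brown fox"
    hypothesis = "the quick brwn fox"  # hypothetical OCR output with one dropped letter
    print(f"WER = {wer(reference, hypothesis):.4f}")
    print(f"CER = {cer(reference, hypothesis):.4f}")
```

In this example a single dropped character counts as one wrong word out of four but only one wrong character out of nineteen, which is why WER is typically the harsher of the two metrics on the same output.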

Paper Citation


in Harvard Style

Gunasekara C., Hamel Z., Du F. and Baillie C. (2025). TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-730-6, SciTePress, pages 151-158. DOI: 10.5220/0013340100003905


in Bibtex Style

@conference{icpram25,
author={Charith Gunasekara and Zachary Hamel and Feng Du and Connor Baillie},
title={TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2025},
pages={151-158},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013340100003905},
isbn={978-989-758-730-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition
SN - 978-989-758-730-6
AU - Gunasekara C.
AU - Hamel Z.
AU - Du F.
AU - Baillie C.
PY - 2025
SP - 151
EP - 158
DO - 10.5220/0013340100003905
PB - SciTePress