TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition
Charith Gunasekara, Zachary Hamel, Feng Du, Connor Baillie
2025
Abstract
Optical Character Recognition (OCR) plays a pivotal role in digitizing and analyzing text from physical documents. Despite advancements in OCR technologies, challenges persist in handling diverse text layouts, poor-quality images, and complex fonts. In this paper, we present TokenOCR, an attention-based foundational model designed to overcome these limitations by integrating convolutional neural networks and transformer-based architectures. Unlike traditional OCR models that predict individual characters, TokenOCR predicts tokens, significantly enhancing recognition accuracy and efficiency. The model employs a ResNet50 feature extractor, an encoder with adaptive 2D positional embeddings, and a decoder utilizing multi-headed attention mechanisms for robust text recognition. To train TokenOCR, we used a dataset of six million images incorporating various real-world applications. Our training strategy integrates curriculum learning and adaptive learning rate scheduling to ensure efficient model convergence and generalization. Comprehensive evaluations using Word Error Rate (WER) and Character Error Rate (CER) demonstrate that TokenOCR consistently outperforms state-of-the-art models, including Tesseract and TrOCR, in both clean and degraded image conditions. These findings underscore TokenOCR’s potential to set new standards in OCR technology, offering a scalable, efficient, and highly accurate solution for diverse text recognition applications.
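The two metrics named in the abstract are conventionally defined as the Levenshtein edit distance between the recognized text and the reference, divided by the reference length, counted in words for WER and in characters for CER. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function names `edit_distance`, `cer`, and `wer` are illustrative, and whitespace tokenization with no text normalization is an assumption, since the abstract does not specify either.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / reference words.

    Assumption: words are split on whitespace, with no case folding
    or punctuation stripping.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Both rates are lower-is-better and can exceed 1.0 when the hypothesis contains many insertions, which is why they are usually reported alongside each other rather than as accuracies.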
Paper Citation
in Harvard Style
Gunasekara C., Hamel Z., Du F. and Baillie C. (2025). TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-730-6, SciTePress, pages 151-158. DOI: 10.5220/0013340100003905
in Bibtex Style
@conference{icpram25,
author={Charith Gunasekara and Zachary Hamel and Feng Du and Connor Baillie},
title={TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2025},
pages={151-158},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013340100003905},
isbn={978-989-758-730-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - TokenOCR: An Attention Based Foundational Model for Intelligent Optical Character Recognition
SN - 978-989-758-730-6
AU - Gunasekara C.
AU - Hamel Z.
AU - Du F.
AU - Baillie C.
PY - 2025
SP - 151
EP - 158
DO - 10.5220/0013340100003905
PB - SciTePress