
Ha, H. and Horák, A. (2022). Information extraction from
scanned invoice images using text analysis and layout
features. In Signal Processing: Image Communica-
tion, volume 102, page 116601.
Harley, A. W., Ufkes, A., and Derpanis, K. G. (2015). Eval-
uation of deep convolutional nets for document image
classification and retrieval. In 2015 13th International
Conference on Document Analysis and Recognition
(ICDAR), pages 991–995.
Huang, Y., Lv, T., Cui, L., Lu, Y., and Wei, F. (2022). Lay-
outlmv3: Pre-training for document ai with unified
text and image masking.
Kang, L., Kumar, J., Ye, P., Li, Y., and Doermann, D. S.
(2014). Convolutional neural networks for document
image classification. In 2014 22nd International Con-
ference on Pattern Recognition, pages 3168–3172.
Katti, A. R., Reisswig, C., Guder, C., Brarda, S., Bickel,
S., Höhne, J., and Faddoul, J. B. (2018). Chargrid:
Towards understanding 2d documents. arXiv preprint.
Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998).
Gradient-based learning applied to document recogni-
tion. In Proceedings of the IEEE, volume 86, pages
2278–2324.
Lehtonen, R., Nevalainen, P., and Murtojärvi, M. (2020).
Automated classification of receipts and invoices
along with document extraction. In University of
Turku.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,
Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,
V. (2019). Roberta: A robustly optimized bert pre-
training approach.
Majumder, B. P., Potti, N., Tata, S., Wendt, J. B., Zhao,
Q., and Najork, M. (2020). Representation learning
for information extraction from form-like documents.
In Proceedings of the 58th Annual Meeting of the As-
sociation for Computational Linguistics, pages 6495–
6504.
Mikołajczyk-Bareła, A. and Grochowski, M. (2018). Data
augmentation for improving deep learning in image
classification problem. In IIPHDW, pages 117–122.
Oral, B., Emekligil, E., Arslan, S., and Eryiğit, G. (2020).
Information extraction from text intensive and visu-
ally rich banking documents. In Information Process-
ing and Management, volume 57, page 102361.
Perez, L. and Wang, J. (2017). The effectiveness of data
augmentation in image classification using deep learn-
ing.
Rusiñol, M., Frinken, V., Karatzas, D., Bagdanov, A. D.,
and Lladós, J. (2014). Multimodal page classification
in administrative document image streams. In Interna-
tional Journal on Document Analysis and Recognition
(IJDAR), volume 17, pages 331–341.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2019). Mobilenetv2: Inverted residuals
and linear bottlenecks.
Shorten, C. and Khoshgoftaar, T. (2019). A survey on image
data augmentation for deep learning. In Journal of Big
Data.
Simonyan, K. and Zisserman, A. (2015). Very deep convo-
lutional networks for large-scale image recognition.
Wang, Y., Du, J., Ma, J., Hu, P., Zhang, Z., and Zhang,
J. (2023). Ustc-iflytek at docile: A multi-modal ap-
proach using domain-specific graphdoc. In Confer-
ence and Labs of the Evaluation Forum.
Yang, S., Xiao, W., Zhang, M., Guo, S., Zhao, J., and Shen,
F. (2023). Image data augmentation for deep learning:
A survey.
Ömer Arslan and Uymaz, S. A. (2022). Classification of
invoice images by using convolutional neural networks.
In Journal of Advanced Research in Natural and Applied
Sciences, volume 8, pages 8–25. Çanakkale Onsekiz
Mart University.
Štěpán Šimsa, Šulc, M., Uřičář, M., Patel, Y., Hamdi, A.,
Kocián, M., Matas, J., Doucet, A., Coustaty, M., and
Karatzas, D. (2023). Docile benchmark for document
information localization and extraction.
APPENDIX
Figure 5: F1-scores of different models tested on different
datasets for the information extraction task.
Deep Learning for Effective Classification and Information Extraction of Financial Documents