Baek, Y., Lee, B., Han, D., Yun, S., and Lee, H. (2019b).
Character Region Awareness for Text Detection. In
IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2019, Long Beach, CA, USA,
June 16-20, 2019, pages 9365–9374. Computer Vision
Foundation / IEEE.
Bulatov, K. B., Arlazarov, V. V., Chernov, T. S., Slavin, O.,
and Nikolaev, D. P. (2017). Smart IDReader: Document
Recognition in Video Stream. In 7th International
Workshop on Camera-Based Document Analysis and
Recognition, 14th IAPR International Conference on
Document Analysis and Recognition, CBDAR@ICDAR 2017,
Kyoto, Japan, November 9-15,
2017, pages 39–44. IEEE.
Chen, X., Jin, L., Zhu, Y., Luo, C., and Wang, T. (2020).
Text Recognition in the Wild: A Survey. CoRR,
abs/2005.03492.
Chen, Y. and Shao, Y. (2019). Scene Text Recognition
Based on Deep Learning: A Brief Survey. In 2019
IEEE 11th International Conference on Communication
Software and Networks (ICCSN), pages 688–693.
Chernov, T., Ilin, D., Bezmaternykh, P., Faradzhev, I.,
and Karpenko, S. (2016). Research of Segmentation
Methods for Images of Document Textual Blocks
Based on the Structural Analysis and Machine Learning.
Vestnik RFFI, pages 55–71.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019).
BERT: Pre-training of Deep Bidirectional Transformers
for Language Understanding. In Burstein, J.,
Doran, C., and Solorio, T., editors, Proceedings of
the 2019 Conference of the North American Chapter
of the Association for Computational Linguistics:
Human Language Technologies, NAACL-HLT 2019,
Minneapolis, MN, USA, June 2-7, 2019, Volume 1
(Long and Short Papers), pages 4171–4186. Association
for Computational Linguistics.
Graves, A., Fernández, S., Gomez, F. J., and Schmidhuber,
J. (2006). Connectionist temporal classification:
labelling unsegmented sequence data with recurrent
neural networks. In Cohen, W. W. and Moore,
A. W., editors, Machine Learning, Proceedings of the
Twenty-Third International Conference (ICML 2006),
Pittsburgh, Pennsylvania, USA, June 25-29, 2006,
volume 148 of ACM International Conference Proceeding
Series, pages 369–376. ACM.
Gupta, A., Vedaldi, A., and Zisserman, A. (2016). Synthetic
Data for Text Localisation in Natural Images. In
2016 IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2016, Las Vegas, NV, USA,
June 27-30, 2016, pages 2315–2324. IEEE Computer
Society.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. B. (2020).
Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell.,
42(2):386–397.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Residual
Learning for Image Recognition. In 2016 IEEE
Conference on Computer Vision and Pattern Recognition,
CVPR 2016, Las Vegas, NV, USA, June 27-30,
2016, pages 770–778. IEEE Computer Society.
Jaderberg, M., Simonyan, K., Vedaldi, A., and Zisserman,
A. (2014). Synthetic Data and Artificial Neural Networks
for Natural Scene Text Recognition. CoRR,
abs/1406.2227.
Kingma, D. P. and Ba, J. (2015). Adam: A Method for
Stochastic Optimization. In Bengio, Y. and LeCun,
Y., editors, 3rd International Conference on Learning
Representations, ICLR 2015, San Diego, CA, USA,
May 7-9, 2015, Conference Track Proceedings.
Lee, C. and Osindero, S. (2016). Recursive Recurrent Nets
with Attention Modeling for OCR in the Wild. In
2016 IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2016, Las Vegas, NV, USA,
June 27-30, 2016, pages 2231–2239. IEEE Computer
Society.
Lee, J., Park, S., Baek, J., Oh, S. J., Kim, S., and Lee, H.
(2019). On Recognizing Texts of Arbitrary Shapes
with 2D Self-Attention. CoRR, abs/1910.04396.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,
Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,
V. (2019). RoBERTa: A Robustly Optimized BERT
Pretraining Approach. CoRR, abs/1907.11692.
Lladós, J., Lumbreras, F., Chapaprieta, V., and Queralt, J.
(2001). ICAR: Identity Card Automatic Reader. In 6th
International Conference on Document Analysis and
Recognition (ICDAR 2001), 10-13 September 2001,
Seattle, WA, USA, pages 470–475. IEEE Computer
Society.
Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer,
N., Ku, A., and Tran, D. (2018). Image Transformer.
In Dy, J. G. and Krause, A., editors, Proceedings of the
35th International Conference on Machine Learning,
ICML 2018, Stockholmsmässan, Stockholm, Sweden,
July 10-15, 2018, volume 80 of Proceedings of Machine
Learning Research, pages 4052–4061. PMLR.
Ren, S., He, K., Girshick, R. B., and Sun, J. (2017). Faster
R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks. IEEE Trans. Pattern Anal.
Mach. Intell., 39(6):1137–1149.
Shi, B., Bai, X., and Yao, C. (2017). An End-to-End
Trainable Neural Network for Image-Based Sequence
Recognition and Its Application to Scene Text Recognition.
IEEE Trans. Pattern Anal. Mach. Intell.,
39(11):2298–2304.
Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., and Bai, X.
(2019). ASTER: An Attentional Scene Text Recognizer
with Flexible Rectification. IEEE Trans. Pattern
Anal. Mach. Intell., 41(9):2035–2048.
Simonyan, K. and Zisserman, A. (2015). Very Deep Convolutional
Networks for Large-Scale Image Recognition.
In Bengio, Y. and LeCun, Y., editors, 3rd International
Conference on Learning Representations,
ICLR 2015, San Diego, CA, USA, May 7-9, 2015,
Conference Track Proceedings.
Smith, L. N. (2017). Cyclical Learning Rates for Training
Neural Networks. In 2017 IEEE Winter Conference on
Applications of Computer Vision, WACV 2017, Santa
Rosa, CA, USA, March 24-31, 2017, pages 464–472.
IEEE Computer Society.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2017). Attention is All you Need. In Guyon, I., von
Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R.,
A Two Step Fine-tuning Approach for Text Recognition on Identity Documents