SCAN: Sequence-character Aware Network for Text Recognition