PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR
System. arXiv:2109.03144 [cs].
Du, Y., Li, C., Guo, R., Yin, X., Liu, W., Zhou, J., Bai,
Y., Yu, Z., Yang, Y., Dang, Q., and Wang, H. (2020).
PP-OCR: A Practical Ultra Lightweight OCR System.
arXiv:2009.09941 [cs].
Flores, M., Valiente, D., Peidró, A., Reinoso, O., and Payá,
L. (2024). Generating a full spherical view by mod-
eling the relation between two fisheye images. The
Visual Computer.
Graves, A., Fernández, S., Gomez, F., and Schmidhuber,
J. (2006). Connectionist temporal classification:
labelling unsegmented sequence data with recurrent
neural networks. In Proceedings of the 23rd inter-
national conference on Machine learning - ICML ’06,
pages 369–376, Pittsburgh, Pennsylvania. ACM Press.
Gupta, N. and Jalal, A. S. (2022). Traditional to trans-
fer learning progression on scene text detection and
recognition: a survey. Artificial Intelligence Review,
55(4):3457–3502.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask R-CNN. In 2017 IEEE International Confer-
ence on Computer Vision (ICCV), pages 2980–2988.
ISSN: 2380-7504.
Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J. M.,
Mattern, F., Mitchell, J. C., Naor, M., Nierstrasz, O.,
Pandu Rangan, C., Steffen, B., Sudan, M., Terzopou-
los, D., Tygar, D., Vardi, M. Y., Weikum, G., Wang,
K., and Belongie, S. (2010). Word Spotting in the
Wild. In Daniilidis, K., Maragos, P., and Paragios, N.,
editors, Computer Vision – ECCV 2010, volume 6311,
pages 591–604. Springer Berlin Heidelberg, Berlin,
Heidelberg.
JaidedAI (2020). EasyOCR.
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S.,
Bagdanov, A., Iwamura, M., Matas, J., Neumann, L.,
Chandrasekhar, V. R., Lu, S., Shafait, F., Uchida, S.,
and Valveny, E. (2015). ICDAR 2015 competition
on Robust Reading. In 2015 13th International Con-
ference on Document Analysis and Recognition (IC-
DAR), pages 1156–1160, Tunis, Tunisia. IEEE.
Kuang, Z., Sun, H., Li, Z., Yue, X., Lin, T. H., Chen, J., Wei,
H., Zhu, Y., Gao, T., Zhang, W., Chen, K., Zhang,
W., and Lin, D. (2021). MMOCR: A Comprehensive
Toolbox for Text Detection, Recognition and Under-
standing. In Proceedings of the 29th ACM Interna-
tional Conference on Multimedia, pages 3791–3794,
Virtual Event China. ACM.
Li, C., Liu, W., Guo, R., Yin, X., Jiang, K., Du, Y., Du, Y.,
Zhu, L., Lai, B., Hu, X., Yu, D., and Ma, Y. (2022).
PP-OCRv3: More Attempts for the Improvement of
Ultra Lightweight OCR System. arXiv:2206.03001
[cs].
Li, H., Wang, P., Shen, C., and Zhang, G. (2019). Show,
Attend and Read: A Simple and Strong Baseline for
Irregular Text Recognition. Proceedings of the AAAI
Conference on Artificial Intelligence, 33(01):8610–
8617.
Lin, H., Yang, P., and Zhang, F. (2020). Review of Scene
Text Detection and Recognition. Archives of Compu-
tational Methods in Engineering, 27(2):433–454.
Long, S., He, X., and Yao, C. (2021). Scene Text Detection
and Recognition: The Deep Learning Era. Interna-
tional Journal of Computer Vision, 129(1):161–184.
Long, S., Ruan, J., Zhang, W., He, X., Wu, W., and Yao,
C. (2018). TextSnake: A Flexible Representation for
Detecting Text of Arbitrary Shapes. pages 20–36.
Naosekpam, V. and Sahu, N. (2022). Text detection, recog-
nition, and script identification in natural scene im-
ages: a Review. International Journal of Multimedia
Information Retrieval, 11(3):291–314.
Phan, T. Q., Shivakumara, P., Tian, S., and Tan, C. L.
(2013). Recognizing Text with Perspective Distortion
in Natural Scenes. In 2013 IEEE International Con-
ference on Computer Vision, pages 569–576. ISSN:
2380-7504.
Shi, B., Bai, X., and Yao, C. (2017). An End-to-End
Trainable Neural Network for Image-Based Sequence
Recognition and Its Application to Scene Text Recog-
nition. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 39(11):2298–2304.
Veit, A., Matera, T., Neumann, L., Matas, J., and Belongie,
S. (2016). COCO-Text: Dataset and Benchmark for
Text Detection and Recognition in Natural Images.
arXiv:1601.07140 [cs].
Yamanaka, Y., Kayukawa, S., Takagi, H., Nagaoka, Y.,
Hiratsuka, Y., and Kurihara, S. (2022). One-shot
wayfinding method for blind people via OCR and arrow
analysis with a 360-degree smartphone camera.
Lecture Notes of the Institute for Computer
Sciences, Social-Informatics and Telecommunications
Engineering, LNICST, 419:150–168.
Yang, L., Li, L., Xin, X., Sun, Y., Song, Q., and Wang, W.
(2023). Large-Scale Person Detection and Localiza-
tion using Overhead Fisheye Cameras.
Yue, X., Kuang, Z., Lin, C., Sun, H., and Zhang, W. (2020).
RobustScanner: Dynamically Enhancing Positional
Clues for Robust Text Recognition. In Computer Vi-
sion – ECCV 2020: 16th European Conference, Glas-
gow, UK, August 23–28, 2020, Proceedings, Part XIX,
pages 135–151, Berlin, Heidelberg. Springer-Verlag.
Yuliang, L., Lianwen, J., Shuaitao, Z., and Sheng, Z.
(2017). Detecting Curve Text in the Wild: New
Dataset and New Solution. arXiv:1712.02170 [cs].
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., and
Zhang, W. (2021). Fourier Contour Embedding for
Arbitrary-Shaped Text Detection. In 2021 IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion (CVPR), pages 3122–3130, Nashville, TN, USA.
IEEE.
ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics