for multi-oriented scene text detection. In ICPR 2018, pages 3604–3609.
Deng, D., Liu, H., Li, X., and Cai, D. (2018). PixelLink: Detecting scene text via instance segmentation. In AAAI 2018, pages 6773–6780.
Epshtein, B., Ofek, E., and Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. In CVPR 2010, pages 2963–2970.
Harizi, R., Walha, R., and Drira, F. (2022a). Deep-learning based end-to-end system for text reading in the wild. Multim. Tools Appl., 81(17):24691–24719.
Harizi, R., Walha, R., Drira, F., and Zaied, M. (2022b). Convolutional neural network with joint stepwise character/word modeling based system for scene text recognition. Multim. Tools Appl., 81(3):3091–3106.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. B. (2017). Mask R-CNN. In ICCV 2017, pages 2980–2988.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In CVPR 2016, pages 770–778.
He, T., Tian, Z., Huang, W., Shen, C., Qiao, Y., and Sun, C. (2018). An end-to-end TextSpotter with explicit alignment and attention. In CVPR 2018, pages 5020–5029.
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L. G., Mestre, S. R., Mas, J., Mota, D., Almazán, J., and de las Heras, L. (2013). ICDAR 2013 robust reading competition. In ICDAR 2013, pages 1484–1493.
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., and Bai, X. (2021). Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Trans. Pattern Anal. Mach. Intell., 43(2):532–548.
Liao, M., Shi, B., and Bai, X. (2018a). TextBoxes++: A single-shot oriented scene text detector. IEEE Trans. Image Process., 27:3676–3690.
Liao, M., Shi, B., Bai, X., Wang, X., and Liu, W. (2017). TextBoxes: A fast text detector with a single deep neural network. In AAAI 2017, pages 4161–4167.
Liao, M., Song, B., Long, S., He, M., Yao, C., and Bai, X. (2020). SynthText3D: Synthesizing scene text images from 3D virtual worlds. Sci. China Inf. Sci., 63(2):120105.
Liao, M., Zhu, Z., Shi, B., Xia, G., and Bai, X. (2018b). Rotation-sensitive regression for oriented scene text detection. In CVPR 2018, pages 5909–5918.
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., and Yan, J. (2018a). FOTS: Fast oriented text spotting with a unified network. In CVPR 2018, pages 5676–5685.
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., and Chen, H. (2022). ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Trans. Pattern Anal. Mach. Intell., 44(11):8048–8064.
Liu, Z., Shen, Q., and Wang, C. (2018b). Text detection in natural scene images with text line construction. In ICICSP 2018, pages 59–63.
Long, S., Ruan, J., Zhang, W., He, X., Wu, W., and Yao, C. (2018). TextSnake: A flexible representation for detecting text of arbitrary shapes. In ECCV 2018, Part II, pages 19–35.
Long, S. and Yao, C. (2020). UnrealText: Synthesizing realistic scene text images from the unreal world. CoRR, abs/2003.10608.
Mallek, A., Drira, F., Walha, R., Alimi, A. M., and Lebourgeois, F. (2017). Deep learning with sparse prior - application to text detection in the wild. In VISIGRAPP - Volume 5: VISAPP 2017, pages 243–250.
Metzenthin, E., Bartz, C., and Meinel, C. (2022). Weakly supervised scene text detection using deep reinforcement learning. CoRR, abs/2201.04866.
Naiemi, F., Ghods, V., and Khalesi, H. (2021). A novel pipeline framework for multi oriented scene text image detection and recognition. Expert Syst. Appl., 170:114549.
Piriyothinkul, B., Pasupa, K., and Sugimoto, M. (2019). Detecting text in manga using stroke width transform. In KST 2019, pages 142–147.
Redmon, J. and Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In CVPR 2017, pages 6517–6525.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, volume 28.
Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., and Bai, X. (2019). ASTER: An attentional scene text recognizer with flexible rectification. IEEE Trans. Pattern Anal. Mach. Intell., 41:2035–2048.
Tian, Z., Huang, W., He, T., He, P., and Qiao, Y. (2016). Detecting text in natural image with connectionist text proposal network. In ECCV 2016, Part VIII, pages 56–72.
Walha, R., Drira, F., Lebourgeois, F., Garcia, C., and Alimi, A. M. (2013). Single textual image super-resolution using multiple learned dictionaries based sparse coding. In ICIAP 2013, Part II, volume 8157, pages 439–448.
Walha, R., Drira, F., Lebourgeois, F., Garcia, C., and Alimi, A. M. (2015). Joint denoising and magnification of noisy low-resolution textual images. In ICDAR 2015, pages 871–875.
Wang, K. and Belongie, S. (2010). Word spotting in the wild. In ECCV 2010, pages 591–604.
Wang, W., Xie, E., Li, X., Hou, W., Lu, T., Yu, G., and Shao, S. (2019a). Shape robust text detection with progressive scale expansion network. In CVPR 2019, pages 9336–9345.
Wang, W., Xie, E., Song, X., Zang, Y., Wang, W., Lu, T., Yu, G., and Shen, C. (2019b). Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In ICCV 2019, pages 8439–8448.
Xing, L., Tian, Z., Huang, W., and Scott, M. (2019). Convolutional character networks. In ICCV 2019, pages 9125–9135.
Yu, W., Liu, Y., Hua, W., Jiang, D., Ren, B., and Bai, X. (2023). Turning a CLIP model into a scene text detector. In CVPR 2023, pages 6978–6988.
Zhang, Z., Shen, W., Yao, C., and Bai, X. (2015). Symmetry-based text line detection in natural scenes. In CVPR 2015, pages 2558–2567.
Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., and Liang, J. (2017). EAST: An efficient and accurate scene text detector. In CVPR 2017, pages 2642–2651.
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., and Zhang, W. (2021). Fourier contour embedding for arbitrary-shaped text detection. In CVPR 2021, pages 3123–3131.
SIFT-ResNet Synergy for Accurate Scene Word Detection in Complex Scenarios