Cai, J., Zeng, H., Yong, H., Cao, Z., and Zhang, L. (2019).
Toward real-world single image super-resolution: A
new benchmark and a new model. In ICCV, pages
3086–3095.
Dalal, N. and Triggs, B. (2005). Histograms of oriented
gradients for human detection. In CVPR, pages 886–
893.
Dong, C., Loy, C. C., He, K., and Tang, X. (2016). Image
super-resolution using deep convolutional networks.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 38:295–307.
Dugar, P., Bhat, R. S., Tarsode, A. S., Dutta, U., Banerjee,
K., Chatterjee, A., and Agneeswaran, V. S. (2021).
From pixels to words: A scalable journey of text in-
formation from product images to retail catalog. In
CIKM, pages 3787–3795.
Feng, W., He, W., Yin, F., Zhang, X.-Y., and Liu, C.-L.
(2019). Textdragon: An end-to-end framework for ar-
bitrary shaped text spotting. In 2019 IEEE/CVF In-
ternational Conference on Computer Vision (ICCV),
pages 9075–9084.
Horé, A. and Ziou, D. (2010). Image quality metrics: Psnr
vs. ssim. In 2010 20th International Conference on
Pattern Recognition, pages 2366–2369.
Hui, Z., Gao, X., Yang, Y., and Wang, X. (2019).
Lightweight image super-resolution with information
multi-distillation network. In MM, pages 2024–2032.
Jaderberg, M., Simonyan, K., Vedaldi, A., and Zisserman,
A. (2015a). Reading text in the wild with convolu-
tional neural networks. International Journal of Com-
puter Vision, 116:1–20.
Jaderberg, M., Simonyan, K., Zisserman, A., and
Kavukcuoglu, K. (2015b). Spatial transformer net-
works. In NIPS, pages 2017–2025.
Jaderberg, M., Vedaldi, A., and Zisserman, A. (2014). Deep
features for text spotting. In ECCV, pages 512–528.
Johnson, J., Alahi, A., and Fei-Fei, L. (2016). Perceptual
losses for real-time style transfer and super-resolution.
In ECCV, pages 694–711.
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S.,
Bagdanov, A., Iwamura, M., Matas, J., Neumann, L.,
Chandrasekhar, V. R., Lu, S., et al. (2015). Icdar 2015
competition on robust reading. In 2015 13th Interna-
tional Conference on Document Analysis and Recog-
nition (ICDAR), pages 1156–1160. IEEE.
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda,
L. G. i., Mestre, S. R., Mas, J., Mota, D. F., Almazán,
J. A., and de las Heras, L. P. (2013). Icdar 2013 robust
reading competition. In ICDAR, pages 1484–1493.
Kim, J., Lee, J. K., and Lee, K. M. (2016). Accurate image
super-resolution using very deep convolutional net-
works. In CVPR, pages 1646–1654.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A.,
Acosta, A., Aitken, A., Tejani, A., Totz, J.,
Wang, Z., and Shi, W. (2017). Photo-realistic single
image super-resolution using a generative adversarial
network. In CVPR, pages 105–114.
Liu, Z., Li, Y., Ren, F., Goh, W., and Yu, H. (2018).
Squeezedtext: A real-time scene text recognition by
binary convolutional encoder-decoder network. In
AAAI, pages 7194–7201.
Luo, C., Jin, L., and Sun, Z. (2019). Moran: A multi-object
rectified attention network for scene text recognition.
Pattern Recognition, 90:109–118.
Marzal, A. and Vidal, E. (1993). Computation of normal-
ized edit distance and applications. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
15(9):926–932.
Shi, B., Bai, X., and Yao, C. (2017). An end-to-end train-
able neural network for image-based sequence recog-
nition and its application to scene text recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 39(11):2298–
2304.
Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., and Bai, X.
(2019). Aster: An attentional scene text recognizer
with flexible rectification. IEEE Trans. Pattern Anal.
Mach. Intell., 41(9):2035–2048.
Staff, W. (2019a). Walmart health. Accessed: 2021-09-17.
Staff, W. (2019b). Walmart’s new intelligent retail lab
shows a glimpse into the future of retail, irl. Accessed:
2021-09-17.
Staff, W. (2020). Walmart and gatik go driverless in
arkansas and expand self-driving car pilot to a second
location. Accessed: 2021-09-17.
Staff, W. (2021a). Walmart invests in cruise, the all-electric
self-driving company. Accessed: 2021-09-17.
Staff, W. (2021b). Walmart unveils all-in-one associate
app, me@walmart, and gives 740,000 associates a
new samsung smartphone. Accessed: 2021-09-17.
Wang, K., Babenko, B., and Belongie, S. (2011). End-to-
end scene text recognition. In 2011 International Con-
ference on Computer Vision, pages 1457–1464.
Wang, W., Xie, E., Liu, X., Wang, W., Liang, D., Shen, C.,
and Bai, X. (2020). Scene text image super-resolution
in the wild. In ECCV, pages 650–666.
Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao,
Y., and Loy, C. C. (2018). Esrgan: Enhanced super-
resolution generative adversarial networks. In The
European Conference on Computer Vision Workshops
(ECCVW).
Liu, W., Chen, C., Wong, K.-Y. K., Su, Z., and Han, J.
(2016). Star-net: A spatial attention residue network
for scene text recognition. In BMVC, pages 43.1–
43.13.
Ye, J., Chen, Z., Liu, J., and Du, B. (2020). Textfusenet:
Scene text detection with richer fused features. In IJ-
CAI, pages 516–522.
Zhang, K., Zuo, W., Chen, Y., Meng, D., and Zhang, L.
(2017). Beyond a gaussian denoiser: Residual learn-
ing of deep cnn for image denoising. IEEE Transac-
tions on Image Processing, 26(7):3142–3155.
Zhang, X., Chen, Q., Ng, R., and Koltun, V. (2019). Zoom
to learn, learn to zoom. In CVPR, pages 3762–3770.
Don’t Miss the Fine Print! An Enhanced Framework to Extract Text from Low Resolution Images