4 CONCLUSIONS AND PERSPECTIVES
This paper presents a new two-stage system for scene
text detection. The first stage applies the RT-LoG
operator, which supports the full scene text detection
pipeline. A dedicated algorithm groups the keypoints
into RoIs using the RT-LoG features. A CNN then
classifies the RoIs into text and non-text regions.
Before this verification, the RoIs are normalized with
the RT-LoG features, which relieves the verification
stage of invariance problems. The proposed system ranks
in the top 5 for the F-measure score and achieves the
strongest recall in the literature. It also reaches one
of the highest FPS rates while using about two orders of
magnitude fewer processing resources.
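For reference, the F-measure mentioned above is the harmonic mean of precision and recall. The following minimal sketch shows how the three scores relate; the function name and the detection counts are hypothetical and chosen only for illustration, they are not results of the proposed system.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from detection counts.

    tp: correctly detected text regions
    fp: detections that are not text (false alarms)
    fn: text regions that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts (not taken from the paper): a detector with
# few missed regions but a noticeable number of false alarms.
p, r, f = precision_recall_f(tp=90, fp=30, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

With such counts the recall is high while the F-measure is pulled down by the lower precision, which is the kind of imbalance that a stronger verification stage would be expected to reduce.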
As a main perspective, the precision rate of the
system can be improved. This can be achieved by
making the RT-LoG operator contrast-invariant, to
deal with the missed detection cases. Deeper CNN
models could be applied to make the text verification
stage more robust. Finally, the RT-LoG operator could
be accelerated by taking advantage of the Gaussian
kernel distribution and its decomposition with box
filtering.
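The idea behind the box-filter acceleration can be illustrated with a generic sketch. It is not the RT-LoG implementation: it only shows, with NumPy, that a few passes of a box filter approximate a Gaussian kernel, and that a separable kernel can be applied as a row pass followed by a column pass. The function names and the chosen width and number of passes are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Sampled, normalized 1D Gaussian on [-radius, radius]."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def iterated_box_kernel(width, passes):
    """Kernel equivalent to applying a box filter of odd `width` `passes` times."""
    box = np.ones(width) / width
    kernel = box
    for _ in range(passes - 1):
        kernel = np.convolve(kernel, box)
    return kernel


def box_sigma(width, passes):
    """Standard deviation of the iterated box kernel.

    A discrete box of odd width w has variance (w^2 - 1) / 12,
    and variances add under convolution.
    """
    return np.sqrt(passes * (width ** 2 - 1) / 12.0)


def separable_filter(image, kernel):
    """Apply a 1D kernel along the rows, then along the columns."""
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, image)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, rows)


# Illustrative parameters (assumptions, not values from the paper).
width, passes = 5, 3
approx = iterated_box_kernel(width, passes)
radius = (len(approx) - 1) // 2
sigma = box_sigma(width, passes)
exact = gaussian_kernel_1d(sigma, radius)
max_abs_error = np.abs(approx - exact).max()
print(f"sigma={sigma:.3f} max |box - gaussian| = {max_abs_error:.4f}")

# Separable application on an impulse image: the response is the
# outer product of the 1D kernel with itself.
image = np.zeros((32, 32))
image[16, 16] = 1.0
response = separable_filter(image, approx)
```

In a practical implementation each box pass would typically be computed with running sums or an integral image, so that its cost does not depend on the filter width; the explicit convolutions above are kept only for readability.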
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications