to detect text-lines on a new domain of receipts. We blur characters in some bboxes because they contain sensitive information. We believe that fine-tuning with a small dataset can help the model work better on new domains with different languages, backgrounds, and styles.
5 CONCLUSIONS
In this paper, we propose new construction and training strategies for a text-detection model based on the Faster R-CNN architecture. We focus on three important factors that influence the accuracy of text-detection models. Firstly, we propose a method for determining anchor boxes by clustering, based on the IoU between candidate anchor boxes and ground-truth bboxes.
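As a minimal sketch of this idea, the snippet below runs a k-means-style clustering over ground-truth box sizes with 1 - IoU as the distance, which is a common way to derive anchors; the function names and the choice of plain k-means are illustrative assumptions, not necessarily the exact procedure used in the paper.

import numpy as np

def iou_wh(wh, centroids):
    # IoU between one (w, h) box and each centroid, aligned at a common origin.
    inter = np.minimum(wh[0], centroids[:, 0]) * np.minimum(wh[1], centroids[:, 1])
    union = wh[0] * wh[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes_wh, k, iters=100, seed=0):
    # Cluster ground-truth (w, h) pairs, using 1 - IoU as the distance metric.
    rng = np.random.default_rng(seed)
    centroids = boxes_wh[rng.choice(len(boxes_wh), k, replace=False)]
    for _ in range(iters):
        # Assign each bbox to the centroid it overlaps most.
        assign = np.array([np.argmax(iou_wh(wh, centroids)) for wh in boxes_wh])
        new = np.array([boxes_wh[assign == i].mean(axis=0)
                        if np.any(assign == i) else centroids[i]
                        for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids  # k anchor (width, height) pairs

For example, anchors = kmeans_anchors(np.asarray(all_gt_wh, dtype=float), k=9) would yield nine anchor shapes adapted to the text-line sizes of the training set (all_gt_wh is a hypothetical list of ground-truth width-height pairs).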
Secondly, we employ Squeeze-and-Excitation blocks (SE blocks) and ResNeXt blocks to build a very deep feature-extraction network; the model using this network outperforms the model using the ResNet-152 network, even though the latter has more trainable parameters.
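A minimal PyTorch sketch of the Squeeze-and-Excitation idea, following Hu et al. (2018): global average pooling squeezes each channel to a scalar, and a small two-layer bottleneck produces per-channel gates that rescale the feature map. The reduction ratio of 16 is the common default, not necessarily the setting used in our network.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation: channel-wise feature recalibration.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # squeeze: global average pool -> (n, c)
        g = self.fc(s).view(n, c, 1, 1)  # excitation: per-channel gates in (0, 1)
        return x * g                     # rescale each channel of the feature map

In an SE-ResNeXt-style block, the gate is typically applied to the output of the grouped-convolution residual branch before it is added back to the shortcut.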
Finally, we train the text-detection network with artificially skewed text-lines so that it can predict the angles of skewed and upward/downward curved text-lines. We use the predicted angles to revise bboxes before applying the Non-Maximum Suppression (NMS) algorithm, so the model can detect skewed and upward/downward curved text-lines.
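The exact revision rule is only summarized above, so the sketch below assumes one plausible geometry: the detector's axis-aligned bbox is the tight envelope of a text-line rotated by the predicted angle, from which the underlying rotated rectangle can be recovered before a standard NMS pass. All names and the envelope assumption are illustrative, not the paper's specification.

import numpy as np

def revise_box(x1, y1, x2, y2, theta):
    # Recover (cx, cy, length, height, theta) from an axis-aligned envelope and
    # a predicted skew angle theta in radians, assuming |theta| < pi/4 and that
    # W = L*cos(t) + h*sin(t), H = L*sin(t) + h*cos(t) for the true line (L, h).
    W, H = x2 - x1, y2 - y1
    t = abs(theta)
    c2 = np.cos(2 * t)
    L = (W * np.cos(t) - H * np.sin(t)) / c2  # recovered text-line length
    h = (H * np.cos(t) - W * np.sin(t)) / c2  # recovered text-line height
    return ((x1 + x2) / 2, (y1 + y2) / 2, L, h, theta)

def nms(boxes, scores, thr=0.5):
    # Greedy NMS on axis-aligned (x1, y1, x2, y2) boxes, highest score first.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
               (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + rest - inter)
        order = order[1:][iou < thr]
    return keep

In this reading, each detection is first revised with revise_box, and the tighter recovered rectangles (or their envelopes) are what the NMS pass compares, so overlapping detections of the same skewed line suppress one another correctly.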
The model achieves high text-line detection accuracy, so we can integrate it with our text-line recognition model to build an automatic text-image recognition system for receipt, invoice, and form images. Our approach is also flexible enough to apply to other datasets with complex backgrounds, different styles, and languages.
REFERENCES
Baek, Jeonghun, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, and Hwalsuk Lee. "What is wrong with scene text recognition model comparisons? Dataset and model analysis." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4715-4723. 2019.
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
Bodla, Navaneeth, Bharat Singh, Rama Chellappa, and Larry S. Davis. "Soft-NMS: Improving object detection with one line of code." In Proceedings of the IEEE International Conference on Computer Vision, pp. 5561-5569. 2017.
Dinh, Viet Cuong, Seong Soo Chun, Seungwook Cha, Hanjin Ryu, and Sanghoon Sull. "An efficient method for text detection in video based on stroke width similarity." In Asian Conference on Computer Vision, pp. 200-209. Springer, Berlin, Heidelberg, 2007.
Epshtein, Boris, Eyal Ofek, and Yonatan Wexler. "Detecting text in natural scenes with stroke width transform." In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963-2970. IEEE, 2010.
Huang, Weilin, Yu Qiao, and Xiaoou Tang. "Robust scene text detection with convolution neural network induced MSER trees." In European Conference on Computer Vision, pp. 497-511. Springer, Cham, 2014.
Zhou, Xinyu, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang. "EAST: An efficient and accurate scene text detector." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551-5560. 2017.
Zhong, Zhuoyao, Lei Sun, and Qiang Huo. "An anchor-free region proposal network for Faster R-CNN-based text detection approaches." International Journal on Document Analysis and Recognition (IJDAR) 22, no. 3 (2019): 315-327.
He, Wenhao, Xu-Yao Zhang, Fei Yin, and Cheng-Lin Liu. "Deep direct regression for multi-oriented scene text detection." In Proceedings of the IEEE International Conference on Computer Vision, pp. 745-753. 2017.
Huang, Zheng, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, and C. V. Jawahar. "ICDAR 2019 competition on scanned receipt OCR and information extraction." In 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1516-1520. IEEE, 2019.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.
Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. "Aggregated residual transformations for deep neural networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492-1500. 2017.
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132-7141. 2018.
Szegedy, Christian, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. "Inception-v4, Inception-ResNet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016).
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).