Text Recognition in Natural Images using Multiclass Hough Forests

Gökhan Yildirim, Radhakrishna Achanta, Sabine Süsstrunk

Abstract

Text detection and recognition in natural images are popular yet unsolved problems in computer vision. In this paper, we propose a technique that attempts to detect and recognize text in a unified manner by searching for words directly, without first reducing the image to text regions or individual characters. We present three contributions. First, we modify an object detection framework called Hough Forests (Gall et al., 2011) by introducing “Cross-Scale Binary Features” that compare the information between the same image patch at different scales. We use this modified technique to produce likelihood maps for every text character. Second, our word-formation cost function and the computed likelihood maps are used to detect and recognize the text in natural images. We test our technique with the Street View House Numbers (SVHN) (Netzer et al., 2011) and the ICDAR 2003 (Lucas et al., 2003) datasets. For the SVHN dataset, our algorithm outperforms recent methods and has comparable performance using fewer training samples. We also exceed the state-of-the-art word recognition performance for the ICDAR 2003 dataset by 4%. Our final contribution is a realistic dataset generation code for text characters.
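
The abstract does not define Cross-Scale Binary Features beyond saying that they compare the same image patch at different scales. As a rough illustration only, the sketch below shows one way such a binary test might look. The 2x2 block averaging, the function names `downscale_by_two` and `cross_scale_binary_feature`, the pixel positions, and the `threshold` parameter are all assumptions made for this example and are not taken from the paper.

```python
def downscale_by_two(patch):
    """Average non-overlapping 2x2 blocks (an assumed way to form the coarser scale)."""
    rows, cols = len(patch), len(patch[0])
    return [
        [
            (patch[r][c] + patch[r][c + 1] + patch[r + 1][c] + patch[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def cross_scale_binary_feature(patch, fine_pos, coarse_pos, threshold=0.0):
    """Hypothetical binary test: compare one pixel of the full-resolution patch
    with one pixel of the same patch at half resolution."""
    coarse = downscale_by_two(patch)
    fine_row, fine_col = fine_pos
    coarse_row, coarse_col = coarse_pos
    return 1 if patch[fine_row][fine_col] < coarse[coarse_row][coarse_col] + threshold else 0


# Toy 4x4 patch: dark left half, bright right half.
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

# Dark fine-scale pixel against the bright coarse-scale cell.
dark_vs_bright = cross_scale_binary_feature(patch, (0, 0), (0, 1))
# Bright fine-scale pixel against the dark coarse-scale cell.
bright_vs_dark = cross_scale_binary_feature(patch, (0, 3), (0, 0))
```

In a decision forest, a test of this kind would typically be evaluated at each split node to route a patch left or right; how the paper selects positions, scales, and thresholds is not stated in the abstract.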

References

  1. Ballard, D. H. (1981). Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111-122.
  2. Chen, X. and Yuille, A. (2004). Detecting and reading text in natural scenes. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 366-373.
  3. de Campos, T. E., Babu, B. R., and Varma, M. (2009). Character recognition in natural images. In Proc. of the International Conference on Computer Vision Theory and Applications, pages 273-280.
  4. Ezaki, N., Bulacu, M., and Schomaker, L. (2004). Text detection from natural scene images: towards a system for visually impaired persons. In Proc. of the International Conference on Pattern Recognition, volume 2, pages 683-686.
  5. Felzenszwalb, P. F. and Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55-79.
  6. Gall, J., Yao, A., Razavi, N., Gool, L. V., and Lempitsky, V. (2011). Hough forests for object detection, tracking, and action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11):2188-2202.
  7. Kim, K. I., Jung, K., and Kim, J. H. (2003). Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1631-1639.
  8. Lucas, S., Panaretos, A., Sosa, L., Tang, A., Wong, S., and Young, R. (2003). ICDAR 2003 robust reading competitions. In Proc. of the International Conference on Document Analysis and Recognition, pages 682-687.
  9. Mishra, A., Alahari, K., and Jawahar, C. V. (2012). Top-down and bottom-up cues for scene text recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2687-2694.
  10. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
  11. Neumann, L. and Matas, J. (2011). A method for text localization and recognition in real-world images. In Proc. of the Asian Conference on Computer Vision, volume 3, pages 770-783.
  12. Newell, A. J. and Griffin, L. D. (2011). Multiscale histogram of oriented gradient descriptors for robust character recognition. In Proc. of the International Conference on Document Analysis and Recognition, pages 1085-1089.
  13. Razavi, N., Gall, J., and Gool, L. J. V. (2011). Scalable multi-class object detection. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1505-1512.
  14. Wang, K., Babenko, B., and Belongie, S. (2011). End-to-end scene text recognition. In Proc. of the International Conference on Computer Vision, pages 1457-1464.


Paper Citation


in Harvard Style

Yildirim G., Achanta R. and Süsstrunk S. (2013). Text Recognition in Natural Images using Multiclass Hough Forests. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 737-741. DOI: 10.5220/0004197407370741


in Bibtex Style

@conference{visapp13,
author={Gökhan Yildirim and Radhakrishna Achanta and Sabine Süsstrunk},
title={Text Recognition in Natural Images using Multiclass Hough Forests},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={737-741},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004197407370741},
isbn={978-989-8565-47-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Text Recognition in Natural Images using Multiclass Hough Forests
SN - 978-989-8565-47-1
AU - Yildirim G.
AU - Achanta R.
AU - Süsstrunk S.
PY - 2013
SP - 737
EP - 741
DO - 10.5220/0004197407370741