Object Attention Patches for Text Detection and Recognition in Scene Images using SIFT

Bowornrat Sriman, Lambert Schomaker

Abstract

Natural urban scene images pose many problems for character recognition, such as luminance noise, varying font styles, and cluttered backgrounds, which makes detecting and recognizing text in a natural scene difficult. Several techniques have been proposed to overcome these problems. However, they are usually based on a bottom-up scheme, which produces many false positives and false negatives and requires intensive computation. An alternative, efficient, character-based, expectancy-driven method is therefore needed. This paper presents a modeling approach, based on the well-known SIFT algorithm, that is usable for expectancy-driven techniques. The resulting models (Object Attention Patches) are evaluated in terms of their individual provisional character recognition performance. Subsequently, the trained patch models are used in preliminary experiments on text detection in scene images. The results show that the proposed model-based approach can be applied in a coherent SIFT-based text detection and recognition process.


Paper Citation


in Harvard Style

Sriman B. and Schomaker L. (2015). Object Attention Patches for Text Detection and Recognition in Scene Images using SIFT. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 304-311. DOI: 10.5220/0005218603040311


in Bibtex Style

@conference{icpram15,
author={Bowornrat Sriman and Lambert Schomaker},
title={Object Attention Patches for Text Detection and Recognition in Scene Images using SIFT},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={304-311},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005218603040311},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Object Attention Patches for Text Detection and Recognition in Scene Images using SIFT
SN - 978-989-758-076-5
AU - Sriman B.
AU - Schomaker L.
PY - 2015
SP - 304
EP - 311
DO - 10.5220/0005218603040311