Natural Scene Character Recognition Without Dependency on Specific Features

Muhammad Ali, Hassan Foroosh

Abstract

Current methods in scene character recognition rely heavily on the discriminative power of local features such as HoG, SIFT, Shape Contexts (SC), and Geometric Blur (GB). One problem with this approach is that the local features are rasterized in an ad hoc manner into a single vector, thus perturbing spatial correlations that carry crucial information. To eliminate this feature dependency and its associated problems, we propose a holistic solution. For each character to be recognized, we stack a set of training images to form a 3-mode tensor. Each training tensor is then decomposed into a linear superposition of k rank-1 matrices, whereby the rank-1 matrices form a basis spanning the solution subspace of the character class. To classify a test image, we obtain its projections onto the pre-computed rank-1 bases of each class and assign it to the class for which the inner product of mixing vectors is maximized. We evaluate on the challenging natural scene character datasets Chars74K, ICDAR2003, and SVT-CHAR. We achieve results better than several baseline methods based on local features (e.g. HoG), and show that leave-random-one-out cross-validation yields even better recognition performance, thus justifying our intuition about the importance of feature independence and the preservation of spatial correlations in recognition.
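
The pipeline in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The rank-k decomposition is approximated by greedy deflation: one rank-1 term a ∘ b ∘ c is fitted at a time with alternating (power-iteration) updates and then subtracted. This is a simple stand-in for a full CP/PARAFAC fit. Each stacked training image t is thus approximated by the sum over r of c_r[t] a_r b_r^T, and the matrices a_r b_r^T serve as the class basis. The remaining choices are also assumptions. An image's mixing vector is taken to be its inner products a^T Y b with those basis matrices, and training images are re-projected the same way rather than reusing c. A class is scored by the largest unnormalized inner product between the test mixing vector and any training mixing vector. The function names (`rank1_bases`, `project`, `train`, `classify`), the initialization, the sweep count, and the tolerance are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]                # h x w grayscale image
Basis = Tuple[List[float], List[float]]  # (a, b) representing the rank-1 matrix a b^T


def _unit(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def rank1_bases(images: Sequence[Image], k: int, sweeps: int = 50,
                tol: float = 1e-9) -> List[Basis]:
    """Extract up to k rank-1 terms from the h x w x n tensor of stacked images.

    Greedy deflation (assumed stand-in for a CP fit): fit a o b o c with
    alternating updates, subtract it from the residual, repeat.
    """
    n, h, w = len(images), len(images[0]), len(images[0][0])
    R = [[list(map(float, row)) for row in img] for img in images]  # residual R[t][i][j]

    def fro(T) -> float:
        return math.sqrt(sum(x * x for img in T for row in img for x in row))

    start = fro(R)
    bases: List[Basis] = []
    for _ in range(k):
        if fro(R) <= tol * start:
            break  # residual is numerically zero; nothing left to explain
        # Start from the largest-magnitude residual entry so no update is all-zero.
        t0, _, j0 = max(((t, i, j) for t in range(n) for i in range(h) for j in range(w)),
                        key=lambda idx: abs(R[idx[0]][idx[1]][idx[2]]))
        b = [1.0 if j == j0 else 0.0 for j in range(w)]
        c = [1.0 if t == t0 else 0.0 for t in range(n)]
        for _ in range(sweeps):
            a = _unit([sum(R[t][i][j] * b[j] * c[t] for t in range(n) for j in range(w))
                       for i in range(h)])
            b = _unit([sum(R[t][i][j] * a[i] * c[t] for t in range(n) for i in range(h))
                       for j in range(w)])
            c = [sum(R[t][i][j] * a[i] * b[j] for i in range(h) for j in range(w))
                 for t in range(n)]
        for t in range(n):  # deflate: R <- R - a o b o c
            for i in range(h):
                for j in range(w):
                    R[t][i][j] -= a[i] * b[j] * c[t]
        bases.append((a, b))
    return bases


def project(image: Image, bases: List[Basis]) -> List[float]:
    """Mixing vector: the inner product <image, a b^T> = a^T image b for each basis."""
    return [sum(a[i] * image[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))
            for a, b in bases]


def train(classes: Dict[str, List[Image]], k: int) -> Dict[str, Tuple[List[Basis], List[List[float]]]]:
    """Per class: the rank-1 bases plus the mixing vector of every training image."""
    model = {}
    for label, imgs in classes.items():
        bases = rank1_bases(imgs, k)
        model[label] = (bases, [project(img, bases) for img in imgs])
    return model


def classify(image: Image, model) -> str:
    """Return the class whose training mixing vectors give the largest inner product."""
    def score(label: str) -> float:
        bases, train_mix = model[label]
        alpha = project(image, bases)
        return max(sum(x * y for x, y in zip(alpha, beta)) for beta in train_mix)
    return max(model, key=score)


# Toy usage: horizontal-bar class "H" versus vertical-bar class "V".
H = [[[1, 1, 1], [0, 0, 0], [0, 0, 0]], [[2, 2, 2], [0, 0, 0], [0, 0, 0]]]
V = [[[1, 0, 0], [1, 0, 0], [1, 0, 0]], [[3, 0, 0], [3, 0, 0], [3, 0, 0]]]
model = train({"H": H, "V": V}, k=2)
print(classify([[1, 1, 1], [0, 0, 0], [0, 0, 0]], model))  # H
```

Both toy tensors are exactly rank 1, so each class keeps a single basis even with k=2, because the tolerance check stops the deflation once the residual vanishes. Normalizing the mixing vectors (cosine similarity) is an equally plausible reading of the matching rule and would remove the bias toward brighter images.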


Paper Citation


in Harvard Style

Ali M. and Foroosh H. (2015). Natural Scene Character Recognition Without Dependency on Specific Features. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-090-1, pages 368-376. DOI: 10.5220/0005305603680376


in Bibtex Style

@conference{visapp15,
author={Muhammad Ali and Hassan Foroosh},
title={Natural Scene Character Recognition Without Dependency on Specific Features},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={368-376},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005305603680376},
isbn={978-989-758-090-1},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)
TI  - Natural Scene Character Recognition Without Dependency on Specific Features
SN  - 978-989-758-090-1
AU  - Ali M.
AU  - Foroosh H.
PY  - 2015
SP  - 368
EP  - 376
DO  - 10.5220/0005305603680376
ER  - 