6 CONCLUSIONS
We proposed a holistic approach to natural scene
character recognition that avoids dependence on
specific features. Our method is based on multi-
image tensor decomposition similar to (Shashua and
Levin, 2001), with a modification to the way we obtain
rank-1 matrices for natural scene images, which contain
considerable variation and noise. Our results show
the potential of image tensor decomposition to better
capture shape and font variations in scene character
images. We obtained better results than several
baseline methods and achieved improved recognition
performance on the datasets using leave-random-one-out
cross-validation, which supports our intuition that
feature independence and the preservation of spatial
correlations are important in recognition.
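To make the notion of a rank-1 matrix basis concrete, the following is a minimal sketch of a multi-image decomposition in the spirit of (Shashua and Levin, 2001). It uses plain alternating least squares (ALS), not the modified procedure proposed in this paper. The function names `fit_rank1_basis` and `project_onto_basis`, the parameters `rank`, `n_iter` and `seed`, and the choice of NumPy are all assumptions made for illustration.

```python
import numpy as np

def fit_rank1_basis(images, rank, n_iter=50, seed=0):
    """Fit images[k] ~= sum_r C[k, r] * outer(U[:, r], V[:, r]) by plain ALS.

    Hypothetical helper: a generic CP decomposition of the image stack,
    not the modified rank-1 extraction described in the paper.
    """
    T = np.asarray(images, dtype=float)          # image tensor, shape (K, H, W)
    K, H, W = T.shape
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((K, rank))           # per-image coefficients
    U = rng.standard_normal((H, rank))           # column factors
    V = rng.standard_normal((W, rank))           # row factors
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor
        # with the other two held fixed.
        C = np.einsum('khw,hr,wr->kr', T, U, V) @ np.linalg.pinv((U.T @ U) * (V.T @ V))
        U = np.einsum('khw,kr,wr->hr', T, C, V) @ np.linalg.pinv((C.T @ C) * (V.T @ V))
        V = np.einsum('khw,kr,hr->wr', T, C, U) @ np.linalg.pinv((C.T @ C) * (U.T @ U))
    return C, U, V

def project_onto_basis(image, U, V):
    """Least-squares coefficients of one H x W image in the basis {outer(U[:, r], V[:, r])}."""
    image = np.asarray(image, dtype=float)
    H, W = image.shape
    # One column per flattened rank-1 basis matrix.
    basis = np.einsum('hr,wr->hwr', U, V).reshape(H * W, -1)
    coeffs, *_ = np.linalg.lstsq(basis, image.reshape(-1), rcond=None)
    return coeffs
```

Under these assumptions, each training or test image is summarised by its coefficient vector from `project_onto_basis`. Because every basis element is a full H x W matrix, the spatial layout of the image is kept and is not flattened away before learning.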
In future work, we hope to reach state-of-the-art
performance by using better image segmentation
methods. We also plan to apply recent advances in
tensor decomposition to other subproblems of scene
text recognition.
Figure 10: Accuracy vs. Number of training samples of ‘A’
from Chars74K dataset.
REFERENCES
Chen, X., and Yuille, A., 2004. Detecting and reading text
in natural scenes. In Computer Vision and Pattern
Recognition, 2004. CVPR 2004. IEEE 2004. Vol. 2, pp.
II–366.
Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh,
B., Wang, T., Wu, D., and Ng, A., 2011. Text detection
and character recognition in scene images with
unsupervised feature learning. In International
Conference on Document Analysis and Recognition
(ICDAR), 2011. IEEE 2011, pp. 440–445.
Dalal, N., and Triggs, B., 2005. Histograms of oriented
gradients for human detection. In International
Conference on Computer Vision and Pattern
Recognition (CVPR) 2005. IEEE 2005, pp. 886–893.
de Campos, T.E., Babu, B. R., and Varma, M., 2009.
Character recognition in natural images. In VISAPP (2),
2009, pp. 273–280.
Donoser, M., Bischof, H., and Wagner, S., 2008. Using web
search engines to improve text recognition. In 19th
International Conference on Pattern Recognition,
ICPR 2008. Vol. no. 14, pp. 8-11.
Field, J., and Learned-Miller, E., 2013. Improving Open-
Vocabulary Scene Text Recognition. In International
Conference on Document Analysis and Recognition
(ICDAR) 2013. IEEE 2013, pp. 604–608.
Hazan, T., Polak, S., and Shashua, A., 2005. Sparse Image
Coding using a 3D Non-negative Tensor Factorization.
In International Conference on Computer Vision
(ICCV), 2005. IEEE 2005. Vol. 1, pp. 50–57.
Kita, K., and Wakahara, T., 2010. Binarization of color
characters in scene images using k-means clustering
and support vector machines. In International
Conference on Pattern Recognition (ICPR), 2010.
IEEE 2010, pp. 3183–3186.
Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S.,
and Young, R., 2003. ICDAR 2003 robust reading
competitions. In Proceedings of the Seventh
International Conference on Document Analysis and
Recognition 2003. IEEE 2003. Vol. 2, pp. 682–687.
Mishra, A., Alahari, K., and Jawahar, C., 2012. Top-down
and bottom-up cues for scene text recognition. In
Computer Vision and Pattern Recognition (CVPR),
2012. IEEE 2012, pp. 2687–2694.
Mishra, A., Alahari, K., and Jawahar, C., 2011. An MRF
model for binarization of natural scene text. In
International Conference on Document Analysis and
Recognition (ICDAR), 2011. IEEE 2011, pp. 11–16.
Nagy, R., Dicker, A., and Meyer-Wegener, K., 2011.
NEOCR: A Configurable Dataset for Natural Image
Text Recognition. In CBDAR Workshop, ICDAR 2011,
pp. 53–58.
Neumann, L., and Matas, J., 2011. A method for text
localization and recognition in real-world images. In
Computer Vision–ACCV 2010, pp. 770–783.
Niblack, W., 1985. An introduction to digital image
processing. Strandberg Publishing Company.
Otsu, N., 1979. A Threshold Selection Method from Gray-
Level Histograms. In Trans. Systems, Man and
Cybernetics. IEEE 1979. Vol. 9, pp. 62–69.
Shashua, A., and Levin, A., 2001. Linear Image Coding for
Regression and Classification using the Tensor-rank
Principle. In International Conference on Computer
Vision and Pattern Recognition (CVPR), 2001. IEEE,
2001. Vol. 1, pp. I-42–I-49.
Sun, C., Junejo, I. N., and Foroosh, H., 2011. Action
Recognition using Rank-1 Approximation of Joint Self-
Similarity Volume. In International Conference on
Computer Vision (ICCV) 2011, pp. 1007–1012.