total word formation cost, which is lower for the combined word than for the individual words. In Figure 6(f), the word "Oxfam" was missed due to the resemblance of the letter "f" to the letter "t".
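To make the comparison concrete, the following minimal Python sketch chooses between one combined word and two separate words by comparing total word-formation costs. The cost function, its weights, and the example strings are hypothetical stand-ins rather than the actual word-formation cost used by the method; the sketch only illustrates why a single combined word can end up cheaper than two individual words when every formed word incurs a fixed base cost.

# Minimal sketch (toy cost, hypothetical weights), not the actual cost function.

def word_cost(chars, lexicon):
    """Toy cost: a fixed per-word cost, a per-character cost, and a penalty
    for strings that are not in the lexicon (all weights are hypothetical)."""
    word = "".join(chars)
    lexicon_penalty = 0.0 if word in lexicon else 5.0
    return 2.0 + 1.0 * len(chars) + lexicon_penalty

def prefer_combined(chars, split, lexicon):
    """Return True when forming one combined word is cheaper than forming
    the two words obtained by splitting the character sequence at `split`."""
    combined = word_cost(chars, lexicon)
    separate = word_cost(chars[:split], lexicon) + word_cost(chars[split:], lexicon)
    return combined < separate

if __name__ == "__main__":
    lexicon = {"carparks", "car", "parks"}   # hypothetical example strings
    chars = list("carparks")
    # One fixed per-word cost instead of two makes the merged reading cheaper,
    # even though both readings are in the lexicon.
    print(prefer_combined(chars, split=3, lexicon=lexicon))  # True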
Computing the Hough votes and searching for words can require significant computational power. However, because Hough forests cast votes from local image patches independently and the word search operates locally in the image, the whole pipeline is highly parallelizable in both the training and testing stages.
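As a rough illustration of this parallelism, the following Python sketch distributes patch-level voting across worker processes and sums the resulting partial vote maps. The forest object, its vote() interface, and all parameters are hypothetical placeholders under the assumption that each patch casts offset votes independently; it is not the actual implementation.

# Minimal sketch, assuming a forest with a vote(patch) method returning
# (dy, dx, weight, class) tuples; names and the toy forest are hypothetical.
from concurrent.futures import ProcessPoolExecutor

import numpy as np


class ToyForest:
    """Stand-in for a trained Hough forest: every patch casts one unit vote
    for class 0 at its own position (purely illustrative)."""
    def vote(self, patch):
        return [(0, 0, 1.0, 0)]


def votes_for_patch(args):
    """Accumulate the votes cast by a single patch into a local vote map."""
    patch, (py, px), forest, shape, num_classes = args
    local = np.zeros((num_classes,) + shape, dtype=np.float32)
    for dy, dx, weight, cls in forest.vote(patch):
        y, x = py + dy, px + dx
        if 0 <= y < shape[0] and 0 <= x < shape[1]:
            local[cls, y, x] += weight
    return local


def hough_votes_parallel(patches, positions, forest, shape, num_classes, workers=4):
    """Distribute patch-level voting over worker processes and sum the results."""
    args = [(p, pos, forest, shape, num_classes) for p, pos in zip(patches, positions)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_maps = list(pool.map(votes_for_patch, args))
    return np.sum(partial_maps, axis=0)


if __name__ == "__main__":
    patches = [np.zeros((16, 16)) for _ in range(8)]
    positions = [(8 * i, 8 * i) for i in range(8)]
    votes = hough_votes_parallel(patches, positions, ToyForest(), (64, 64), num_classes=1)
    print(votes.sum())  # 8 unit votes in total

Because each patch reads only its own pixels and writes into its own local vote map, the same decomposition applies at training time, where the individual trees of the forest can likewise be grown independently.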
6 CONCLUSIONS
We present a new method for text detection and recognition in natural images. We introduce cross-scale binary features and show that using them improves recognition performance. We train Hough forests on images produced by our realistic character generator, and we recognize words in natural images using these features together with our word-formation cost function. We evaluate the algorithm on two publicly available datasets. In individual character recognition, our algorithm achieves better recognition performance and reaches the same performance with fewer training samples. In cropped word recognition, we exceed the recognition performance of the most recent algorithm by 4%.
Figure 6: Some results of our algorithm on ICDAR 2003 dataset images (correctly and incorrectly recognized words are written in small and capital letters, respectively). (a) australia; (b) sports, centre, wivenhoe, partk, conference, center, car, parks; (c) closeout, final, reductions, closeout; (d) yamaha; (e) famous, COLOUR, fist, chips; (f) OXFAM, bookshop.
REFERENCES
Ballard, D. H. (1981). Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111–122.
Chen, X. and Yuille, A. (2004). Detecting and reading text in natural scenes. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 366–373.
de Campos, T. E., Babu, B. R., and Varma, M. (2009). Character recognition in natural images. In Proc. of the International Conference on Computer Vision Theory and Applications, pages 273–280.
Ezaki, N., Bulacu, M., and Schomaker, L. (2004). Text detection from natural scene images: towards a system for visually impaired persons. In Proc. of the International Conference on Pattern Recognition, volume 2, pages 683–686.
Felzenszwalb, P. F. and Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55–79.
Gall, J., Yao, A., Razavi, N., Gool, L. V., and Lempitsky, V. (2011). Hough forests for object detection, tracking, and action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11):2188–2202.
Kim, K. I., Jung, K., and Kim, J. H. (2003). Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1631–1639.
Lucas, S., Panaretos, A., Sosa, L., Tang, A., Wong, S., and Young, R. (2003). ICDAR 2003 robust reading competitions. In Proc. of the International Conference on Document Analysis and Recognition, pages 682–687.
Mishra, A., Alahari, K., and Jawahar, C. V. (2012). Top-down and bottom-up cues for scene text recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2687–2694.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Neumann, L. and Matas, J. (2011). A method for text localization and recognition in real-world images. In Proc. of the Asian Conference on Computer Vision, volume 3, pages 770–783.
Newell, A. J. and Griffin, L. D. (2011). Multiscale histogram of oriented gradient descriptors for robust character recognition. In Proc. of the International Conference on Document Analysis and Recognition, pages 1085–1089.
Razavi, N., Gall, J., and Gool, L. J. V. (2011). Scalable multi-class object detection. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1505–1512.
Wang, K., Babenko, B., and Belongie, S. (2011). End-to-end scene text recognition. In Proc. of the International Conference on Computer Vision, pages 1457–1464.