memory consumption. The challenge is to construct
reliable small codebooks from a large, representative
data set of SIFT KPs. Third, many parameters assigned
during model creation, e.g., the number of clusters,
the matched distance ratio threshold, and the number
of POIs, have not yet been determined conclusively.
Finding their optimal values requires further
experiments. Fourth, the current modeling ignores the
scale and orientation of KPs, so useful information
may be lost. Future work will address this issue.
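
To illustrate how one of these parameters, the matched distance ratio threshold, affects matching, the following minimal sketch applies a nearest-neighbour ratio test against a small codebook. It is an assumption-laden toy, not the implementation used in this paper: the function names (`ratio_test_match`, `match_rate`), the 2-D vectors standing in for 128-dimensional SIFT descriptors, and the default ratio of 0.8 (the value suggested in Lowe's original SIFT paper) are all illustrative choices.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_match(descriptor, codebook, ratio=0.8):
    """Return the index of the nearest codebook entry, or None.

    A match is accepted only if the nearest distance is smaller than
    `ratio` times the second-nearest distance (hypothetical helper).
    """
    if len(codebook) < 2:
        raise ValueError("the ratio test needs at least two codebook entries")
    dists = sorted((euclidean(descriptor, c), i) for i, c in enumerate(codebook))
    (d1, i1), (d2, _) = dists[0], dists[1]
    return i1 if d1 < ratio * d2 else None


def match_rate(descriptors, codebook, ratio):
    """Fraction of descriptors that pass the ratio test."""
    accepted = [ratio_test_match(d, codebook, ratio) for d in descriptors]
    return sum(m is not None for m in accepted) / len(descriptors)


# Toy data: three codebook centres and three query descriptors.
codebook = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
queries = [[1.0, 0.0], [5.0, 0.0], [0.0, 9.0]]

# Sweep the threshold to see how many queries are accepted at each setting.
for r in (0.1, 0.5, 0.8):
    print(r, match_rate(queries, codebook, r))
```

A stricter (smaller) ratio rejects more ambiguous matches at the cost of fewer accepted KPs, which is why the threshold has to be tuned experimentally together with the number of clusters.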
Finally, accuracy rates of approximately 70-80% are
satisfactory for character detection in scene images,
but they are relatively low compared with the 98.29%
obtained on machine-printed paper text images. This
may be because the PKPs are created from different
images. If image quality were improved in
pre-processing, performance would be expected to
increase. Eventually, this efficient model could make
the detection and recognition of text in scene images
more precise.
7 CONCLUSION
This paper has presented a SIFT-based modeling of
character objects for scene-text detection and
recognition. The construction of models (attentional
patches) from natural scenes has been described. The
evaluation of character recognition and a preliminary
test of text detection show that the proposed model is
usable for scene-text detection and recognition. For
future work, an algorithm to increase text detection
performance is necessary. The current framework offers
potential both for improving the feature scheme used
for recognition and for developing, e.g., NN
classifiers that use the framework as a textuality
detector.