A significant advantage of the proposed line extraction method is its
flexibility: the scheme is independent of the font, size, style and
orientation of the text lines. As mentioned earlier, we assume that the
distance between two lines of a document is greater than the
inter-character distance within words. When words belonging to two
different text lines lie very close to each other, this assumption is
violated and our method generates errors in some of these cases. Another
drawback is that the method does not work if characters are broken and
the broken parts cannot be joined through preprocessing. In that case
the neighborhood components are not selected properly, so the direction
obtained from the water reservoir concept does not give the correct
candidate region and errors occur. The proposed method may also fail if
a string contains many joining characters.
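The distance assumption and its failure case can be illustrated with a toy grouping of component centers. This is only a sketch under stated assumptions, not the water-reservoir-based method of the paper: `group_components` and `max_gap` are hypothetical names, and plain single-linkage grouping by Euclidean distance is swapped in purely to show why two lines merge when their words come closer than the chosen gap.

```python
from math import hypot


def group_components(centers, max_gap):
    """Single-linkage grouping of component centers (illustrative only).

    Two components end up in the same group when a chain of neighbours,
    each closer than max_gap, connects them.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(centers):
        for j in range(i + 1, len(centers)):
            xj, yj = centers[j]
            if hypot(xi - xj, yi - yj) < max_gap:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two well-separated lines: the assumption holds and the lines stay apart.
separated = [(0, 0), (10, 0), (20, 0), (0, 50), (10, 50)]
# A word of the second line comes closer than max_gap to the first line.
touching = [(0, 0), (10, 0), (20, 0), (20, 12), (30, 12)]
```

With `max_gap=15`, the first layout yields two groups, while the second collapses into a single group, which corresponds to the kind of error described above.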
For recognition, the dataset was evaluated using the cross-validation
technique. For this purpose, we divided the dataset into 5 parts,
trained our system on 4 parts and tested it on the remaining part.
We obtained 96.54% (95.78%) recognition accuracy using the circular
(convex hull) based feature of dimension 256. The recognition accuracies
obtained from the circular and convex hull features at different
dimensions are given in Table 2. The experiment showed that better
accuracy can be achieved by combining the circular and convex hull
features. Combining the circular and convex hull features of 256
dimensions each gives a 512-dimensional feature. Using this combined
feature, our SVM classifier achieved 96.73% accuracy on this dataset.
We also noticed that better results were obtained for characters of
larger font size.
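The evaluation protocol described above can be sketched as follows. This is a minimal illustration under assumptions: the helper names are hypothetical, `train_fn` and `predict_fn` are placeholders standing in for the SVM classifier, the combined feature is assumed to be a simple concatenation (256 + 256 = 512), the parts are formed by interleaving sample indices, and correct predictions are pooled over all parts. None of these details are stated in the text.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; each part serves as test set once."""
    # Assumption: parts are formed by interleaving sample indices.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def combine_features(circular, convex_hull):
    """Assumed combination: concatenate the two feature vectors."""
    return list(circular) + list(convex_hull)


def cross_validate(features, labels, train_fn, predict_fn, k=5):
    """Return the accuracy (in %) pooled over all k test parts."""
    correct = 0
    for train, test in k_fold_indices(len(features), k):
        model = train_fn([features[i] for i in train],
                         [labels[i] for i in train])
        correct += sum(predict_fn(model, features[i]) == labels[i]
                       for i in test)
    return 100.0 * correct / len(features)
```

Any classifier can be plugged in through `train_fn(X, y)` and `predict_fn(model, x)`; the sketch fixes only the split-and-score logic, not the learner.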
Table 2: Character recognition result (accuracy in %).

                  Feature Dimension
Feature Type      32       128      256
circular ring     90.54    96.01    96.54
convex hull       82.77    93.76    95.78
Fig. 8(b) shows the detected text lines and the recognition results for
the corresponding text characters of Fig. 8(a). All the text lines have
been extracted correctly, even though some words lie on curvilinear
text lines, e.g. “ATLANTIC OCEAN”. The recognition result is very
encouraging. Occasionally, because of “joining characters” and
overlapping lines, a few characters are not recognized correctly. For
example, in the word “Tagus”, the joining character “gu” is
mis-recognized as ‘a’. Fig. 8(b) also shows that some small graphical
borders were not eliminated by our CC analysis, which led to erroneous
results. Most of the errors occurred between characters of similar
shape: the highest error came from the character pairs ‘K’ and ‘k’,
‘f’ and ‘t’, and ‘t’ and ‘l’. Other errors arose mainly from noisy data
in which the residue of the convex hull was not extracted properly;
such wrong residue detection sometimes causes recognition errors. For
comparison, Adam et al. (Adam et al., 2000) reported 95.74% accuracy on
real English characters, whereas our system achieves 96.73%.
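An error analysis of the kind reported above can be obtained by tallying which character pairs are confused most often. The following is a minimal sketch, assuming the ground-truth and predicted labels are available as two parallel sequences; the function name `top_confusions` is hypothetical, and pairs are treated as unordered so that ‘K’ read as ‘k’ and ‘k’ read as ‘K’ count towards the same pair.

```python
from collections import Counter


def top_confusions(true_labels, predicted_labels, n=3):
    """Return the n most frequent unordered (label, label) confusion pairs."""
    pairs = Counter(
        tuple(sorted((t, p)))          # unordered pair, e.g. ('K', 'k')
        for t, p in zip(true_labels, predicted_labels)
        if t != p                      # count misclassifications only
    )
    return pairs.most_common(n)
```

For instance, calling `top_confusions(list("KkftKa"), list("kKtlka"), n=1)` counts three confusions for the pair `('K', 'k')` and one each for the other two pairs.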
6 CONCLUSIONS
In this paper we proposed a complete system for the detection and
recognition of text in graphical documents. We separated text from
graphical components and extracted the corresponding text lines. The
multi-oriented text characters are recognized using convex hull
information. The experiments gave encouraging results.
ACKNOWLEDGEMENTS
This work has been partially supported by the Spanish projects
TIN2006-15694-C02-02 and CONSOLIDER-INGENIO 2010 (CSD2007-00018).
REFERENCES
Adam, S., Ogier, J. M., Carlon, C., Mullot, R., Labiche, J.,
and Gardes, J. (2000). Symbol and character recognition:
application to engineering drawing. In International Journal
on Document Analysis and Recognition (IJDAR).
Ahmed, M. and Ward, R. (2002). A rotation invariant
rule-based thinning algorithm for character recognition. In
IEEE Transactions on Pattern Analysis and Machine
Intelligence.
Cao, R. and Tan, C. (2001). Text/graphics separation in
maps. In Proc. of International Workshop on Graphics
Recognition (GREC).
Fletcher, L. A. and Kasturi, R. (1988). A robust algorithm
for text string separation from mixed text/graphics images.
In IEEE Transactions on Pattern Analysis and
Machine Intelligence.
Goto, H. and Aso, H. (1999). Extracting curved lines using
local linearity of the text line. In International Journal
on Document Analysis and Recognition (IJDAR).
Hase, H., Shinokawa, T., Yoneda, M., and Suen, C. Y.
(2003). Recognition of rotated characters by eigenspace.
In International Conference on Document
Analysis and Recognition (ICDAR).
A COMPLETE SYSTEM FOR DETECTION AND RECOGNITION OF TEXT IN GRAPHICAL DOCUMENTS USING
BACKGROUND INFORMATION