extension of their previous work (Zhang, 2001) and
used a new resolution-free restoration system with a
simpler connected component analysis. The
technique did not depend on resolution
parameters, although it made the significant
assumption that skew is negligible. Estimation of
robust text lines was done in (Kakamanu, 2006)
using the cue that text lines lie on the surface of the
page and are straight. Their approach assumed a
simple background that can be separated easily
and did not consider shadowing at the spine. Cao et al.
(Cao, 2003) assumed a cylindrical model that can be
fitted over the warped document. They estimated the
bending extent of the surface by extracting the
horizontal baselines and then fitted a curve over the
warped text. This assumption is not valid for all
warped documents, especially those with skew.
Ezaki et al. (Ezaki, 2005) proposed a similar
technique without assuming a cylindrical model:
they estimated the warp of each text line by
fitting an elastic curve to it. Another approach
was based on segmentation (Gatos, 2005): all
words are detected using image smoothing, and the
lines are then identified using connected
component labelling.
In general, most of the techniques described
above neglect skew, which is often present in
warped documents and may also occur independently
of any warping. Most of the earlier approaches
also ignore shadowing. Thus, many of the
earlier techniques will not
work properly in the presence of skew. On the
other hand, a number of approaches treat
skew as an isolated
problem without taking
warping into account. Dhandra and his colleagues in their work
(Dhandra, 2006) removed the skew using image
dilation and region labelling techniques. They
computed the tilt angle of each labelled region
and averaged these angles to obtain the
resultant skew angle. Yu et al. (Yu, 1995)
calculated the skew angle using the least squares
method and a saw-tooth algorithm. Another method,
proposed by Ballard et al. (Ballard, 1982), uses the
2D Fourier transform to estimate the skew angle. It
is computationally simple and appropriate for
uniformly distributed text.
However, in the presence of warping, the
orientation changes continuously along
a warped line, which causes the Fourier method
to fail. This non-uniformity across the text
led us to seek a technique that works on the
local distribution of the text. We therefore opted for
affine transformations as our approach for the
removal of warping.
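
The region-averaging idea described above can be illustrated with a minimal sketch. It assumes that each labelled region is given as a list of (x, y) pixel coordinates and that the tilt of a region is measured from its second-order central moments. The moment-based measure and the names `region_angle` and `mean_skew_angle` are our assumptions for illustration; (Dhandra, 2006) may compute the per-region angle differently.

```python
import math

def region_angle(pixels):
    """Tilt (radians) of a region's principal axis from central moments.

    Assumption: image coordinates, x to the right and y downwards.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

def mean_skew_angle(regions):
    """Average the tilt of all regions into one global skew angle."""
    angles = [region_angle(r) for r in regions if len(r) > 1]
    if not angles:
        return 0.0
    return sum(angles) / len(angles)
```

A single averaged angle of this kind describes a uniformly skewed page well. On a warped line the per-region angles differ from one another, so the average hides exactly the local variation that dewarping has to correct.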
We have formulated our approach using as few
assumptions as possible. Moreover, our technique
handles possible shadowing, skewing and warping
of the text image. We use vertical projection
to remove skew after dilating
the shadow-free document. Deskewing is
followed by dewarping, which applies a local affine
transformation to each segmented word after estimating
the angle at which that word is
warped. In the following section we describe our
approach. In the third section we present the results
of the experiments that were performed in order to
test our approach. Some directions for future
improvements are presented in the concluding
section.
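
As a rough illustration of projection-based deskewing, the sketch below searches a range of candidate angles and keeps the one whose projection profile is most sharply peaked. The foreground is assumed to be a list of (x, y) coordinates of black pixels. The search range, the step size, the sum-of-squares score and the names `profile_score` and `estimate_skew` are our assumptions for illustration and are not taken from the method described in this paper.

```python
import math
from collections import Counter

def profile_score(pixels, angle_deg):
    """Sharpness of the projection profile after rotating by -angle_deg.

    Each black pixel is assigned to the row it would occupy after the
    rotation; text lines that become horizontal concentrate in few rows.
    """
    a = math.radians(angle_deg)
    s, c = math.sin(a), math.cos(a)
    rows = Counter(round(y * c - x * s) for x, y in pixels)
    return sum(count * count for count in rows.values())

def estimate_skew(pixels, max_deg=10.0, step_deg=0.5):
    """Return the candidate angle (degrees) with the sharpest profile.

    Assumption: image coordinates with y downwards, so a positive angle
    means that text lines descend towards the right.
    """
    steps = int(round(2 * max_deg / step_deg))
    candidates = [-max_deg + i * step_deg for i in range(steps + 1)]
    return max(candidates, key=lambda d: profile_score(pixels, d))
```

Rotating the document by the negative of the estimated angle would then remove the skew. A finer step or a coarse-to-fine search could be substituted without changing the idea.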
2 SEQUENTIAL RESTORATION OF DISTORTED DOCUMENT
A scanned page from a thick volume exhibits
warping along with skew and shadowing at the
spine. The restoration steps are:
a) Deshadowing, to separate the text region, which
we call the foreground, from the background.
b) Deskewing of the binary document after dilating
the image uniformly.
c) Removal of warping from the deskewed
document using local affine transformation.
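
Step c) can be sketched for a single word as follows, assuming that the word's pixels and its warp angle are already known. The affine map used here is a pure rotation about the word's centroid; the names `rotation_about`, `apply_affine` and `straighten_word` are hypothetical, and the estimation of the word angle is not shown.

```python
import math

def rotation_about(cx, cy, angle_rad):
    """2x3 affine matrix that rotates by angle_rad about the point (cx, cy)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy]]

def apply_affine(m, pixels):
    """Apply a 2x3 affine matrix to a list of (x, y) coordinates."""
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in pixels]

def straighten_word(pixels, word_angle_rad):
    """Rotate one word by the negative of its warp angle about its centroid."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    return apply_affine(rotation_about(cx, cy, -word_angle_rad), pixels)
```

Because each word is rotated about its own centroid, words at different positions on a warped line can be corrected by different angles, which a single global rotation cannot do.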
A significant aspect of our work is that it
operates on a binary image, which contains less
information than a grey-scale or colour image.
Without loss of generality, we will take white as
background and black as foreground. In the
following we describe each part of the process.
2.1 Deshadowing
Shadowing is a photometric effect that is
independent of the other two distortions. This paper
proposes a filtering technique for deshadowing
that eliminates the noise spread present
mainly near the spine, as shown in Figure 2a. An
examination of Figure 2a shows that pixel density
can be used as a feature for distinguishing
foreground from background. In all cases that we
examined, the pixel density of the text area is
significantly larger than that of all other regions, including
the region under the shadow.
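
One way to use pixel density as a feature is sketched below. The binary image is assumed to be a list of rows with 1 for black (foreground) and 0 for white (background), and a black pixel is kept only if the fraction of black pixels in a small window around it reaches a threshold. The window radius, the threshold value and the names `local_density` and `density_filter` are illustrative assumptions, not the parameters of the filter proposed in this paper.

```python
def local_density(img, row, col, radius):
    """Fraction of black pixels in the window centred on (row, col).

    The window is clipped at the image border.
    """
    h, w = len(img), len(img[0])
    r0, r1 = max(0, row - radius), min(h, row + radius + 1)
    c0, c1 = max(0, col - radius), min(w, col + radius + 1)
    black = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return black / ((r1 - r0) * (c1 - c0))

def density_filter(img, radius=2, threshold=0.4):
    """Keep a black pixel only where the local density reaches the threshold."""
    return [[1 if img[r][c] and local_density(img, r, c, radius) >= threshold
             else 0
             for c in range(len(img[0]))]
            for r in range(len(img))]
```

Sparse noise pixels have few black neighbours and fall below the threshold, whereas pixels inside dense text regions are retained. In practice both parameters would need to be tuned to the scan resolution and the stroke width of the text.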
VISAPP 2009 - International Conference on Computer Vision Theory and Applications