1.1 Related Work
A number of methods have previously been
proposed for identifying document image skew
angles. A survey is given by Hull (1998), and an
extended list of references is provided by Okun et
al. (1999).
The main methods proposed in the literature may
be categorized into the following groups: methods
based on projection profile analysis, methods based
on nearest-neighbor clustering, methods based on
the Hough transform, methods based on
cross-correlation and methods based on
morphological transforms.
The projection profile is a histogram, computed
along a given direction, of the text pixels or of
representative (fiducial) points of the characters,
such as the centers of the connected components'
bounding boxes. The points are projected in multiple
directions and the variation in the obtained
projection profile is calculated for each direction.
The angle corresponding to the maximum variation
is taken as the skew.
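The projection profile search described above can be sketched as
follows. This is a minimal illustration, not the implementation of any
of the cited methods: it assumes that "variation" is measured as the
variance of the histogram, that the fiducial points are given as (x, y)
pairs, and that the candidate angles are supplied by the caller. The
function name and the bin_size parameter are hypothetical.

```python
import numpy as np

def projection_profile_skew(points, angles_deg, bin_size=1.0):
    """Return the candidate angle (degrees) whose projection profile
    has the largest variance. All names here are illustrative."""
    pts = np.asarray(points, dtype=float)
    best_angle, best_score = None, -np.inf
    for angle in angles_deg:
        theta = np.deg2rad(angle)
        # Coordinate of each point perpendicular to a line at angle theta;
        # points on the same text line share (almost) the same offset
        # when theta matches the skew.
        offsets = -pts[:, 0] * np.sin(theta) + pts[:, 1] * np.cos(theta)
        bins = np.floor(offsets / bin_size).astype(int)
        # Histogram of the offsets: this is the projection profile.
        profile = np.bincount(bins - bins.min())
        score = profile.var()
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

When the candidate angle matches the skew, the points of each text line
fall into very few bins, so the profile shows sharp peaks and its
variance is high; at other angles the points spread over many bins.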
The Hough transform is another popular
technique for skew detection. This transform is often
applied to a number of representative points of
characters such as the lowermost pixels or centres of
gravity. Each representative point (x,y) is mapped
from the Cartesian space to a set of points (ρ,θ) in
the Hough space, one for each line passing through
(x,y) with slope θ and distance ρ from the origin.
The skew corresponds to the angle associated with a
peak in the Hough space.
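A minimal sketch of this voting scheme is given below. It assumes the
common normal-form parameterization rho = x*cos(theta) + y*sin(theta),
in which theta is the angle of the line's normal rather than its slope,
so the skew is recovered as theta - 90 degrees. That choice, the
function name and the rho_step parameter are assumptions made for
illustration and are not taken from the cited work.

```python
import numpy as np

def hough_skew(points, thetas_deg, rho_step=1.0):
    """Vote in (rho, theta) space and return the skew (degrees)
    associated with the strongest accumulator cell."""
    pts = np.asarray(points, dtype=float)
    thetas_deg = np.asarray(thetas_deg, dtype=float)
    thetas = np.deg2rad(thetas_deg)
    # rho for every (point, theta) pair; shape (n_points, n_thetas).
    rhos = pts[:, [0]] * np.cos(thetas) + pts[:, [1]] * np.sin(thetas)
    rho_idx = np.round(rhos / rho_step).astype(int)
    rho_idx -= rho_idx.min()  # shift so that indices start at zero
    accumulator = np.zeros((rho_idx.max() + 1, len(thetas)), dtype=int)
    cols = np.broadcast_to(np.arange(len(thetas)), rho_idx.shape)
    # Each point casts one vote per candidate theta.
    np.add.at(accumulator, (rho_idx, cols), 1)
    _, peak_col = np.unravel_index(accumulator.argmax(), accumulator.shape)
    # theta is the normal's angle; the line itself is 90 degrees away.
    return float(thetas_deg[peak_col] - 90.0)
```

Restricting thetas_deg to a window around 90 degrees limits the search
to near-horizontal text, which keeps the accumulator small.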
Most of the above methods have inherent
weaknesses, because most of them are tailor-made
algorithms that are applicable to a particular
document layout. As a result, some of them may fail
to estimate the skew angles of documents containing
complicated layouts with multiple font styles and
sizes, arbitrary text orientation and script, or a
high proportion of non-text regions such as graphics
and tables. Moreover, they estimate a single or the
dominant skew angle of the document and fail to
recognize multiple skew angles.
Messelodi and Modena (1999) showed that
projection profiles in combination with a clustering
procedure based on simple heuristics may overcome
the problem of the limited angle range. Although
this method can handle multiple skew angles and
small interline spacing, it was only tested on small
(512x512 pixels) images of book covers containing a
few text lines. Gatos et al. (1997) use an interline
cross-correlation for two or more vertical lines
located at a fixed distance d for skew estimation.
The cross-correlation function is computed for the
entire image to obtain the document's skew angle.
This can, however, be time consuming, and the
presence of graphics degrades the accuracy. Lu and
Tan (2003) followed a nearest-neighbor chain based
approach, developing a language-independent skew
estimation method with high accuracy. Their approach
detects only a dominant skew for the document.
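To illustrate the idea behind the cross-correlation approach, the sketch
below compares only two vertical lines (image columns) located at a
distance d and looks for the vertical shift that aligns them best. It is
a simplified illustration and not the algorithm of Gatos et al. (1997):
the function name, the max_shift parameter and the use of a plain dot
product as the correlation score are all assumptions.

```python
import numpy as np

def cross_correlation_skew(image, x, d, max_shift):
    """Estimate the skew (degrees) from the vertical shift that best
    aligns column x with column x + d of a binary image (text = 1)."""
    img = np.asarray(image, dtype=float)
    left, right = img[:, x], img[:, x + d]
    height = img.shape[0]
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        # Correlate left[i] with right[i + s] over all valid rows i.
        if s >= 0:
            score = np.dot(left[:height - s], right[s:])
        else:
            score = np.dot(left[-s:], right[:height + s])
        if score > best_score:
            best_shift, best_score = s, score
    # Rows grow downwards, so a positive angle means the text
    # descends from left to right in image coordinates.
    return float(np.degrees(np.arctan2(best_shift, d)))
```

Using a single pair of columns makes the estimate sensitive to graphics
and noise, which is consistent with the accuracy concerns noted above.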
2 PROPOSED TECHNIQUE
The proposed technique aims at correcting the skew
in documents that contain several areas with text
skewed at different slopes, and at being robust
enough to handle a great variety of printed
documents, including book and magazine covers and
spreadsheets, along with documents with regular
layouts, in any language. It is assumed that
correctly oriented text is aligned either vertically
or horizontally. It is also assumed that the
document can contain anywhere from several text area
slopes down to a single one.
To achieve this goal, a bottom-up approach is
applied, which is better suited to this specific
problem. First, the document is preprocessed to
identify the text colour index. Then, the connected
components of the document are retrieved using a
simple serial labelling algorithm and their bounding
rectangles are constructed. A filtering procedure is
applied to the connected components to discard
non-text connected elements according to their
geometrical characteristics and to indicate the
candidate characters. These candidate characters are
grouped using a nearest neighbour approach to form
words. The words are grouped, based on a rough
slope calculation, to form lines of text. Using linear
regression on the edge pixels of the connected
components' bounding rectangles, a set of straight
lines is estimated for each text line, representing
its top and bottom boundaries. Neighbouring text
lines with similar skew angles are grown to form
text areas, and the slope of each area is defined
according to the slopes of the boundary lines of its
text lines. Connected components that have been
filtered out or have failed to form words, but are
included in a text area, are considered part of that
text area. Finally, each text area is rotated to the
horizontal or vertical orientation, taking measures
to avoid overlapping.
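The boundary-fitting step of this pipeline can be sketched as follows.
The sketch assumes that each connected component of a text line is given
as an axis-aligned bounding rectangle (x0, y0, x1, y1) with y growing
downwards, and that the midpoints of the top and bottom edges serve as
the edge points for the regression. Both the choice of edge points and
the function names are assumptions made for illustration; the exact
points used by the proposed technique are not specified in this section.

```python
import numpy as np

def line_boundaries(boxes):
    """Fit the top and bottom boundary lines y = m*x + c of a text line.

    boxes: iterable of (x0, y0, x1, y1) bounding rectangles.
    Returns ((m_top, c_top), (m_bottom, c_bottom)).
    """
    b = np.asarray(boxes, dtype=float)
    mid_x = (b[:, 0] + b[:, 2]) / 2.0  # midpoint of each horizontal edge
    # Least-squares straight lines through the top and bottom edge points.
    m_top, c_top = np.polyfit(mid_x, b[:, 1], 1)
    m_bot, c_bot = np.polyfit(mid_x, b[:, 3], 1)
    return (m_top, c_top), (m_bot, c_bot)

def slope_to_degrees(slope):
    """Convert a fitted slope to an angle in degrees."""
    return float(np.degrees(np.arctan(slope)))
```

In practice, ascenders and descenders pull individual edge points away
from the true boundaries, so a robust variant of the regression may be
preferable to the plain least-squares fit used here.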
The result of the technique is a single binary
image ready to be processed by the layout analysis
module of an OCR system. The main stages of the
proposed technique are analysed next.
VISAPP 2007 - International Conference on Computer Vision Theory and Applications