been proposed, for example, in (Shapiro and Gluhchev,
2004; Ko and Kim, 2003; Lee et al., 2004) or (Rahman
et al., 2003). A different approach, also used
in the proposed method, is to perform both operations
at once, thus treating the text not as a sequence of
individual characters, but rather as one line of text
that is processed as a whole. A method for printed
text recognition using this approach is described in
(LeCun et al., 1998; Savchynskyy and Kamotskyy,
2006).
In our approach, the classifier is defined by a text
structure (i.e. a grammar and a layout) and by a vector
of parameters, representing the optimal appearance
of individual characters. The method is based on a
linear classifier. The classifier parameters are opti-
mized according to a training set of examples using
the structured support vector machine (SVM) learn-
ing method.
The next section describes how the text structure
is modelled, and Sections 3 and 4 focus on the
learning and classification tasks. Finally, we present
experiments that were performed on car license plate
and ADR/RID images.
2 TEXT STRUCTURE MODELLING
A digitized greyscale image I is an H × W matrix whose
elements are the intensity values of the corresponding
pixels. A segment of width ω ∈ N of the image I is
a submatrix of I formed by ω successive columns. The
left border λ of the segment is the index of its leftmost
column (the lowest index). Each segment is fully
described by a pair (λ, ω), and conversely each pair
(λ, ω) ∈ N², λ + ω ≤ W + 1, defines a segment in I.
We will denote the segment of I with left border λ
and width ω as I[λ, ω]. The element in the i-th row
and j-th column will be denoted I_ij for the image I
and I_ij[λ, ω] for the segment I[λ, ω].
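As a minimal sketch of this notation, assume the image is stored as a list of rows and that N starts at 1. The helper names `is_valid_segment` and `segment` are hypothetical and are used only for illustration; they show how the 1-based pair (λ, ω) maps to 0-based column slices:

```python
def is_valid_segment(lam, omega, W):
    """Check that (lam, omega) defines a segment in an image of width W.

    Assumes natural numbers start at 1. The segment must not run past
    the last column, i.e. lam + omega <= W + 1.
    """
    return lam >= 1 and omega >= 1 and lam + omega <= W + 1


def segment(image, lam, omega):
    """Return I[lam, omega]: omega successive columns starting at column lam.

    image is a list of H rows, each a list of W intensity values.
    Columns are 1-based in the text, so column lam is Python index lam - 1.
    """
    W = len(image[0])
    if not is_valid_segment(lam, omega, W):
        raise ValueError("(lam, omega) does not define a segment")
    return [row[lam - 1:lam - 1 + omega] for row in image]


# A 2 x 5 toy image; the segment I[2, 3] covers columns 2, 3 and 4.
I = [[10, 20, 30, 40, 50],
     [11, 21, 31, 41, 51]]
S = segment(I, 2, 3)
```

Under this reading, the element I_ij[λ, ω] corresponds to `S[i - 1][j - 1]`.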
It is assumed that the input image depicts the text
in a horizontal position and that the top and bottom
edges of the image coincide with the top and bottom
of the text. Neither the left-to-right position nor the
width of the text is known; both are treated as
unknown parameters, which makes it possible to cope
with imprecise detection of the left and right text
borders.
The text structure is described by a geometric
model. A model µ is given by the sequence of segments
that the text contains. Each segment is described
by its left border λ (i.e. its position), its width ω and a
type identifier. The type identifier is a subset of the
alphabet A containing all characters that may appear
in the given segment. The model thus has the form of
a 3 × N_µ table, where N_µ is the number of segments.
Figure 2 shows a typical model of a license plate text.
The spaces between characters are modelled by a
sequence of special space segments of width one.
The rest of the image, which is not covered by the
model, is omitted.
Figure 2: Typical structured text model. N stands for a type
identifier denoting numbers, L means letters and empty nar-
row segments contain only space characters.
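One possible in-memory form of such a 3 × N_µ table is sketched below. The class `ModelSegment`, the helper `build_model`, the contiguous left-to-right placement of segments and all widths are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet

DIGITS = frozenset("0123456789")
LETTERS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
SPACE = frozenset(" ")


@dataclass(frozen=True)
class ModelSegment:
    left: int                # left border (lambda), 1-based column in the model
    width: int               # width (omega) in columns
    allowed: FrozenSet[str]  # type identifier: a subset of the alphabet


def build_model(layout):
    """Build a model from (width, allowed) pairs placed side by side.

    A gap of g columns is represented, as in the text, by g space
    segments of width one. Contiguous placement is an assumption.
    """
    segments, left = [], 1
    for width, allowed in layout:
        if allowed == SPACE:
            for _ in range(width):
                segments.append(ModelSegment(left, 1, SPACE))
                left += 1
        else:
            segments.append(ModelSegment(left, width, allowed))
            left += width
    return segments


# Hypothetical layout: letter, digit, 2-column gap, digit (widths made up).
model = build_model([(8, LETTERS), (8, DIGITS), (2, SPACE), (8, DIGITS)])
model_width = model[-1].left + model[-1].width - 1
```

Each `ModelSegment` is one column of the table, and `len(model)` plays the role of N_µ.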
Because the width of the text in the image is unknown
and the width of the model is fixed, it is necessary
to find the ratio between the two. This ratio is
called the scale. The second unknown parameter is the
left-to-right position of the text, described by the
index of its leftmost column and called the offset.
The combination of a model µ, scale s and offset
o determines the geometry of the text, and the model
also defines all possible strings. The string will be
denoted Σ, so the complete description of a given
image consists of four parameters: µ, s, o and Σ. We
will also denote by I_s the image I that has been
resized by scale s to dimensions H × ⌈W · s⌉.¹
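The effect of the scale on the image dimensions can be sketched as follows. The function names are hypothetical, and the mapping from a model column to a column of I_s (offset taken as the 1-based column of I_s where the model starts) is an assumed convention that the text does not spell out.

```python
import math


def resized_shape(H, W, s):
    """Dimensions of I_s: the height is kept, the width becomes ceil(W * s)."""
    return H, math.ceil(W * s)


def segment_in_resized_image(model_left, offset):
    """Column of I_s at which a model segment starts (assumed convention).

    Assumes the offset o is the 1-based column of I_s at which the
    first column of the model is placed.
    """
    return offset + model_left - 1


H_s, W_s = resized_shape(20, 130, 0.5)
first_col = segment_in_resized_image(model_left=9, offset=4)
```

With these toy values the resized image is 20 × 65, and a model segment with left border 9 starts at column 12 of I_s.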
3 STRUCTURED SVM
Classification is a process that assigns a state from a
given set of all possible states to an object, based on
some observation made on the object. A classifier (or
classification strategy) is a function f : X → Y that
assigns to each observation x ∈ X a state y ∈ Y. Next,
let us define a loss function ∆ : Y × Y → R that assigns
to each pair (y, f(x)) a real number expressing
the penalty for classifying x as f(x) when the true
state is y. We assume that ∆(y, y′) = 0 if y = y′ and
∆(y, y′) > 0 if y ≠ y′.
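To make these definitions concrete, the sketch below uses a Hamming-style loss on strings, which is only one choice that satisfies the two conditions on ∆ and is an assumption here, not necessarily the loss used in the paper. The toy classifier and the examples are likewise made up; `mean_loss` averages the loss of a classifier over a set of labelled pairs.

```python
def hamming_loss(y_true, y_pred):
    """One admissible loss on strings: the number of differing positions.

    A difference in length is added as well, so the loss is 0 exactly
    when the two strings are equal and positive otherwise.
    """
    mismatches = sum(a != b for a, b in zip(y_true, y_pred))
    return mismatches + abs(len(y_true) - len(y_pred))


def mean_loss(classifier, examples, loss=hamming_loss):
    """Average loss of a classifier f over labelled pairs (x, y)."""
    return sum(loss(y, classifier(x)) for x, y in examples) / len(examples)


# Toy classifier: observations are already strings, f upper-cases them.
f = str.upper
examples = [("ab12", "AB12"), ("cd34", "CD35"), ("ef", "EF5")]
risk = mean_loss(f, examples)
```

In this toy run the first example is classified correctly and the other two each incur a loss of 1, so the average is 2/3.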
The structured support vector machine learning
method is based on finding an optimal classification
strategy that minimizes the empirical risk defined as
$$R_{\mathrm{emp}}(f) = \frac{1}{m} \sum_{k=1}^{m} \Delta\bigl(y_k, f(x_k)\bigr), \tag{1}$$
¹ ⌈·⌉ denotes the ceiling, i.e. the nearest integer towards infinity.