RECOGNITION OF TEXT WITH KNOWN GEOMETRIC AND GRAMMATICAL STRUCTURE

Jan Rathouský

Department of Control Engineering, Faculty of Elec. Eng., Czech Technical University in Prague, Czech Republic

Martin Urban

Eyedea Recognition, Prague, Czech Republic

Center for Applied Cybernetics, Faculty of Elec. Eng., Czech Technical University in Prague, Czech Republic

Vojtěch Franc

Fraunhofer Institut FIRST IDA, Berlin, Germany

Keywords:

Text Recognition, Structured Support Vector Machines, License Plate Recognition.

Abstract:

The optical character recognition (OCR) module is a fundamental part of every automated text processing system. The OCR module translates an input image containing a text line into a string of symbols. In many applications (e.g. license plate recognition), the text has an a priori known geometric and grammatical structure. This article proposes an OCR method that exploits this knowledge by restricting the set of possible strings to a limited set of feasible combinations. The recognition task is formulated as the maximization of a similarity function that uses character templates as reference. These templates are estimated from a set of examples by a support vector machine method. In contrast to the common approach, the proposed method performs character segmentation and recognition simultaneously. The method was successfully evaluated in a car license plate recognition system.

1 INTRODUCTION

Recognition of text in images is an important part of the pattern recognition field. Systems for text recognition are generally referred to as OCR (Optical Character Recognition) systems.

This article presents an OCR method that makes use of the fact that many texts of interest have a given structure that can be described by a common model. In other words, the text conforms to a grammar and a layout that determine the number of symbols, their relative width and position, and also the kinds of symbols that can appear at each position. The advantage of this approach is that a priori knowledge about the text structure reduces the number of possible configurations, thus improving the success rate of the method, especially when the input image is of poor quality. The method is typically suited to the recognition of short structured texts (see Figure 1) captured at low resolution and possibly under poor lighting conditions.

The text recognition itself consists of two subtasks: text segmentation, in which the areas (segments) of the image containing single characters are found, and character recognition, in which the characters in the individual segments are determined. The classical approach is to perform these subtasks separately, which leads to recognition errors whenever the segmentation is done incorrectly. Systems using this approach have been proposed, for example, in (Shapiro and Gluhchev, 2004; Ko and Kim, 2003; Lee et al., 2004) or (Rahman et al., 2003). A different approach, which is also used in the proposed method, is to perform both operations at once, treating the text not as a sequence of individual characters but rather as one line of text that is processed as a whole. Methods for printed text recognition using this approach are described in (LeCun et al., 1998; Savchynskyy and Kamotskyy, 2006).

Figure 1: Examples of images with short structured texts with a priori known geometric and grammatical structure: (a) license plate, (b) license plate, (c) ADR/RID plate.

Rathouský J., Urban M. and Franc V. (2008).
RECOGNITION OF TEXT WITH KNOWN GEOMETRIC AND GRAMMATICAL STRUCTURE.
In Proceedings of the Third International Conference on Computer Vision Theory and Applications, pages 194-199
DOI: 10.5220/0001086501940199
SciTePress

In our approach, the classifier is defined by a text structure (i.e. a grammar and a layout) and by a vector of parameters representing the optimal appearance of individual characters. The method is based on a linear classifier whose parameters are optimized on a training set of examples using the structured support vector machine (SVM) learning method.

The next section describes how the text structure is modelled, and Sections 3 and 4 focus on the learning and classification tasks. Finally, we present experiments performed on car license plate and ADR/RID images.

2 TEXT STRUCTURE MODELLING

A digitized greyscale image I is an H × W matrix whose elements are the intensity values of the corresponding pixels. A segment of width ω ∈ ℕ of the image I is a submatrix of I formed by ω successive columns. The left border λ of the segment is the index of its leftmost column (the lowest index). Each segment can be fully described by a pair (λ, ω), and conversely each pair (λ, ω) ∈ ℕ² with λ + ω ≤ W + 1 defines a segment in I. We will denote the segment of I with left border λ and width ω as I[λ, ω]. The element in the i-th row and j-th column of I will be denoted $I_{ij}$, and $I_{ij}[\lambda, \omega]$ for the segment I[λ, ω].
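To make the indexing convention concrete, here is a minimal Python sketch of the segment I[λ, ω] and of the element notation. It assumes the image is stored as a NumPy array and keeps the 1-based column indices of the text; the function names `segment` and `segment_element` are hypothetical.

```python
import numpy as np


def segment(image: np.ndarray, lam: int, omega: int) -> np.ndarray:
    """Return I[lam, omega]: the omega successive columns of `image` starting at column lam.

    Indices are 1-based as in the text, so lam = 1 is the leftmost column.
    """
    _, width = image.shape
    # (lam, omega) defines a segment only if both are natural numbers and lam + omega <= W + 1
    if lam < 1 or omega < 1 or lam + omega > width + 1:
        raise ValueError("need lam >= 1, omega >= 1 and lam + omega <= W + 1")
    return image[:, lam - 1 : lam - 1 + omega]


def segment_element(image: np.ndarray, lam: int, omega: int, i: int, j: int) -> float:
    """I_ij[lam, omega]: row i, column j (both 1-based) of the segment, i.e. I_{i, lam + j - 1}."""
    return float(segment(image, lam, omega)[i - 1, j - 1])
```

The bound check mirrors the condition λ + ω ≤ W + 1: a segment may end exactly at the last column W but not beyond it.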

It is assumed that the input image depicts the text in a horizontal position and that the top and bottom edges of the image coincide with the top and bottom of the text. Neither the horizontal position nor the width of the text is known; both are treated as unknown parameters, which makes it possible to cope with imprecise detection of the left and right text borders.

The text structure is described by a geometric model. A model µ is given by the sequence of segments the text contains. Each segment is described by its left border λ (i.e. its position), its width ω and a type identifier. The type identifier is the subset of the alphabet A containing all characters that can appear in the given segment. Thus the model has the form of a 3 × $N_\mu$ table, where $N_\mu$ is the number of segments. Figure 2 shows a typical model of a license plate text. The spaces between characters are modelled by a sequence of special space segments, each of width one. The rest of the image, which is not covered by the model, is ignored.

Figure 2: Typical structured text model. N stands for a type identifier denoting digits, L denotes letters, and the empty narrow segments contain only space characters.
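One possible in-memory form of a model µ is sketched below. The class `Segment`, the helper `build_model`, the 26-letter Latin alphabet, the character width and the number of width-one space segments between characters are all hypothetical choices for illustration, not the models used in the experiments. Left borders here are 0-based and measured from the start of the model.

```python
from dataclasses import dataclass

DIGITS = frozenset("0123456789")
LETTERS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ")  # assumed Latin alphabet
SPACE = frozenset(" ")


@dataclass(frozen=True)
class Segment:
    lam: int            # left border lambda, relative to the start of the model
    omega: int          # width in columns
    allowed: frozenset  # type identifier: the characters that may appear here


def build_model(pattern: str, char_width: int = 10, gap: int = 2) -> list[Segment]:
    """Hypothetical helper: 'N' = digit segment, 'L' = letter segment.

    Between two consecutive characters, `gap` space segments of width one are inserted.
    """
    types = {"N": DIGITS, "L": LETTERS}
    model: list[Segment] = []
    lam = 0
    for idx, symbol in enumerate(pattern):
        if idx > 0:
            for _ in range(gap):
                model.append(Segment(lam, 1, SPACE))
                lam += 1
        model.append(Segment(lam, char_width, types[symbol]))
        lam += char_width
    return model


def as_table(model: list[Segment]) -> list[list]:
    """The 3 x N_mu table: left borders, widths and type identifiers."""
    return [[s.lam for s in model], [s.omega for s in model], [s.allowed for s in model]]
```

For example, `build_model("NL", char_width=3, gap=2)` yields four segments with left borders 0, 3, 4, 5 and widths 3, 1, 1, 3.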

Because the width of the text in the image is unknown and the width of the model is fixed, it is necessary to find the ratio between the two. This ratio is called the scale. The other unknown parameter is the horizontal position of the text, which is described by the index of its leftmost column and is called the offset.

The combination of a model µ, scale s and offset o determines the geometry of the text, and the model also defines the set of all possible strings. The string will be denoted Σ; thus the complete description of a given image consists of four parameters: µ, s, o and Σ. We will also denote by $I^s$ the image I resized by scale s to dimensions H × ⌈W · s⌉.
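A sketch of how a state (s, o, µ, Σ) fixes the columns of I^s covered by each segment. The text does not specify the resampling method, so nearest-neighbour column sampling is assumed here; `State`, `resize_to_scale` and `segment_columns` are hypothetical names, and left borders are 0-based and relative to the offset.

```python
import math
from typing import NamedTuple

import numpy as np


class State(NamedTuple):
    """Hypothetical container for y = (s, o, mu, Sigma)."""
    s: float     # scale
    o: int       # offset: 0-based column of I^s where the model starts
    mu: tuple    # model: ((lam_1, omega_1), ..., (lam_N, omega_N)), lam relative to o
    sigma: str   # string: one character per segment


def resize_to_scale(image: np.ndarray, s: float) -> np.ndarray:
    """I^s of size H x ceil(W * s); nearest-neighbour column sampling is an assumption."""
    _, width = image.shape
    new_width = math.ceil(width * s)
    source_cols = np.minimum((np.arange(new_width) / s).astype(int), width - 1)
    return image[:, source_cols]


def segment_columns(y: State) -> list[tuple[int, int]]:
    """Half-open column ranges [o + lam_i, o + lam_i + omega_i) of I^s covered by each segment."""
    return [(y.o + lam, y.o + lam + omega) for lam, omega in y.mu]
```

Only the width is rescaled; the height H is kept, matching the stated dimensions H × ⌈W · s⌉.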

3 STRUCTURED SVM

Classification is a process that assigns a state from a given set of all possible states to an object, based on some observation made on the object. A classifier (or classification strategy) is a function f : X → Y that assigns to each observation x ∈ X a state y ∈ Y. Next, let us define a loss function ∆ : Y × Y → ℝ that assigns to each pair (y, f(x)) a real number expressing the penalty for classifying x into f(x) while the real state is y. We assume that ∆(y, y′) = 0 if y = y′ and ∆(y, y′) > 0 if y ≠ y′.

The structured support vector machine learning method is based on finding an optimal classification strategy that minimizes the empirical risk, defined as

$$
R_{\mathrm{emp}}(f) = \sum_{k=1}^{m} \Delta\big(y_k, f(x_k)\big), \tag{1}
$$

and maximizes the margin, supposing that a set of example data $\{(x_1, y_1), \dots, (x_m, y_m)\}$ is available (Vapnik, 1998). Here $y_k$ denotes the true state of $x_k$.

The ⌈·⌉ used in Section 2 denotes rounding towards positive infinity (the ceiling function).
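Equation (1) in code, as a minimal sketch: the classifier and the loss are passed in as plain Python callables (an assumption of the sketch), and `zero_one_loss` is simply the smallest loss satisfying the two conditions on ∆ stated above.

```python
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def zero_one_loss(y: Y, y_pred: Y) -> float:
    """Simplest loss with Delta(y, y) = 0 and Delta(y, y') > 0 whenever y != y'."""
    return 0.0 if y == y_pred else 1.0


def empirical_risk(
    f: Callable[[X], Y],
    examples: Iterable[tuple[X, Y]],
    loss: Callable[[Y, Y], float] = zero_one_loss,
) -> float:
    """Eq. (1): sum over the examples (x_k, y_k) of Delta(y_k, f(x_k))."""
    return sum(loss(y_k, f(x_k)) for x_k, y_k in examples)
```

With the 0/1 loss, R_emp simply counts misclassified examples; the structured setting replaces it with losses such as the column-based one defined later in this section.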

To choose the optimal decision strategy, it is first necessary to determine the set of functions from which the optimal function should be chosen. Usually, a set of functions is described by a function F(w; x, y) that depends on a vector of parameters w. The decision strategy then has the form

$$
\hat{y} = f(w; x) = \arg\max_{y \in Y} F(w; x, y) \tag{2}
$$

and choosing the optimal strategy means choosing the optimal parameter vector w.

If a classifier is linear in its parameter vector, the optimal parameters can be found using the structured support vector machine (SVM) learning algorithm. There are two main differences between classical and structured SVMs. First, structured SVMs allow much more complicated output spaces than classical SVMs, where the output space is merely a set of class labels. Second, arbitrary loss functions may be used, provided they satisfy the conditions stated above.

A classifier linear in its parameter vector w can be expressed as an inner product of w and a vector function Ψ(x, y) of the observation x and the state y. This means that (2) takes the form

$$
\hat{y} = f(w; x) = \arg\max_{y \in Y} \langle w, \Psi(x, y) \rangle. \tag{3}
$$
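A minimal sketch of the decision rule (3) when Y is small enough to enumerate. The joint feature map `multiclass_psi` is a hypothetical example showing that an ordinary multi-class linear classifier is the special case in which the output space is just a set of class labels: Ψ copies x into the block of w that belongs to class y.

```python
from typing import Callable, Sequence

import numpy as np


def predict(w: np.ndarray, psi: Callable, x, states: Sequence):
    """Eq. (3): the state maximizing <w, Psi(x, y)> over a finite, enumerable set Y."""
    return max(states, key=lambda y: float(np.dot(w, psi(x, y))))


def multiclass_psi(x: np.ndarray, y: int, n_classes: int = 3) -> np.ndarray:
    """Joint feature map of an ordinary multi-class linear classifier: x copied into block y."""
    out = np.zeros(n_classes * x.size)
    out[y * x.size : (y + 1) * x.size] = x
    return out
```

In the structured case treated in this paper, Y is the set of all (s, o, µ, Σ) combinations, which is far too large to enumerate naively; Section 4 shows how the structure of Ψ makes the maximization tractable.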

In the case described in this article, the state y is defined by a combination of four parameters, namely scale s, offset o, model µ and string Σ, introduced in Section 2. The observation x corresponds to the input image I. Substituting these into (3), we can write

$$
(\hat{s}, \hat{\mu}, \hat{o}, \hat{\Sigma}) = \arg\max_{s, \mu, o, \Sigma} \langle w, \Psi(I, (s, o, \mu, \Sigma)) \rangle. \tag{4}
$$

The vector w represents the prototypes of all characters a of the alphabet A. The prototypes E(a) are images that all have the same height H as the input image. The vector w is created by placing these images column-wise one after another in a given order. The inner product ⟨w, Ψ(I, (s, o, µ, Σ))⟩ expresses a similarity function between the input image I resized by scale s and an image created from the prototypes of the characters in string Σ, placed according to the model µ and offset o. We use the general form of the similarity function suggested in (Savchynskyy and Kamotskyy, 2006)

$$
\langle w, \Psi(I, (s, o, \mu, \Sigma)) \rangle = \sum_{i=1}^{N_\mu} \Big( E(a_i) \odot I^s[o + \lambda_i, \omega_i] \Big), \tag{5}
$$

where $\Sigma = (a_1, \dots, a_{N_\mu})$ is a string, $\lambda_i$ and $\omega_i$ are the left border and width of the i-th segment of model µ, $E(a_i)$ is the prototype of the character $a_i$, and the ⊙ operator denotes the similarity function between the prototype and the segment $I^s[o + \lambda_i, \omega_i]$. We use the cross-correlation function for this purpose, as described in (Franc and Hlaváč, 2006):

$$
E(a_i) \odot I^s[o + \lambda_i, \omega_i] = \sum_{j=1}^{H} \sum_{k=1}^{\omega_i} E(a_i)_{jk} \, I^s_{jk}[o + \lambda_i, \omega_i]. \tag{6}
$$

The mapping function Ψ is thus defined implicitly by equations (5) and (6).
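Equations (5) and (6) can be sketched in NumPy as follows. The sketch assumes that each prototype has exactly the shape H × ω_i of the segment it is compared with, that segment positions are 0-based and relative to the offset, and that the prototypes are stored in a dictionary keyed by character; these representation choices are ours, not the paper's.

```python
import numpy as np


def correlation(prototype: np.ndarray, segment: np.ndarray) -> float:
    """Eq. (6): sum over rows j and columns k of E(a)_jk * I^s_jk[o + lambda_i, omega_i]."""
    if prototype.shape != segment.shape:
        raise ValueError("prototype and segment must both be H x omega_i")
    return float(np.sum(prototype * segment))


def similarity(image_s, model, string, offset, prototypes) -> float:
    """Eq. (5): sum of correlations of E(a_i) with the segments placed by model mu at offset o.

    model: sequence of (lam_i, omega_i) pairs, lam_i relative to the offset and 0-based
    (an assumption of this sketch); prototypes: dict mapping a character to an H x omega array.
    """
    if len(model) != len(string):
        raise ValueError("the string needs exactly one character per segment")
    total = 0.0
    for (lam, omega), a in zip(model, string):
        start = offset + lam
        total += correlation(prototypes[a], image_s[:, start : start + omega])
    return total
```

Because each term depends only on one segment and one character, the score decomposes over segments, which is what Section 4 exploits.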

The vector Ψ(I, (s, o, µ, Σ)) can also be constructed explicitly by placing each segment $I^s[o + \lambda_i, \omega_i]$ into the vector Ψ in the same way as the prototype $E(a_i)$ is placed in the vector w (Figure 3). The remaining elements of Ψ are set to zero so that they do not influence the value of (5).

Figure 3: An example of the construction of the vector Ψ from the image I for the string Σ = "T0A8". First, I is resized to $I^s$; second, a model µ is placed at position o; finally, the segments are placed into Ψ at the positions corresponding to the characters in Σ.
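The explicit construction of Figure 3 can be checked numerically: the sketch below builds w by stacking the prototypes column-wise and Ψ by writing each segment into the slot of its character, so that ⟨w, Ψ⟩ reproduces the sum in (5). How repeated characters are handled is not stated in the text; here their segments are assumed to add up in the shared slot, which is what keeps the inner product equal to (5). The function names are hypothetical.

```python
import numpy as np


def stack(arrays) -> np.ndarray:
    """Concatenate H x omega images column by column (column-major order)."""
    return np.concatenate([np.asarray(a).flatten(order="F") for a in arrays])


def build_w(prototypes: dict, alphabet: list) -> np.ndarray:
    """w: all prototypes E(a) placed column-wise one after another in a fixed order."""
    return stack(prototypes[a] for a in alphabet)


def build_psi(image_s, model, string, offset, prototypes: dict, alphabet: list) -> np.ndarray:
    """Psi(I, (s, o, mu, Sigma)): segment i goes into the slot that E(a_i) occupies in w.

    All other entries stay zero. If a character occurs several times, its segments are
    summed into the same slot (assumption) so that <w, Psi> still equals eq. (5).
    Each segment is assumed to have the same shape as its prototype.
    """
    sizes = [prototypes[a].size for a in alphabet]
    starts = {a: int(sum(sizes[:n])) for n, a in enumerate(alphabet)}
    psi = np.zeros(sum(sizes))
    for (lam, omega), a in zip(model, string):
        seg = image_s[:, offset + lam : offset + lam + omega]
        psi[starts[a] : starts[a] + seg.size] += seg.flatten(order="F")
    return psi
```

The same `flatten(order="F")` convention is used for w and Ψ; any fixed ordering would work as long as both vectors share it.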

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

Finding the optimal parameter vector w in the sense of minimizing the empirical risk (1) and maximizing the margin is a quadratic programming (QP) problem of the following form:

$$
\min_{w, \xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{m} \sum_{k=1}^{m} \xi_k \tag{7}
$$

such that

$$
\forall k = 1, \dots, m,\ \forall y \in Y: \quad \langle w, \Psi(x_k, y_k) - \Psi(x_k, y) \rangle \ge \Delta(y, y_k) - \xi_k, \tag{8}
$$

where $\xi_k$ are so-called slack variables and C is a constant expressing the trade-off between margin maximization and empirical risk minimization (Vapnik, 1998). Due to the large number of constraints (8), the QP task is solved iteratively: in each iteration, the most violated constraints are added to the working set. Finding these constraints requires an algorithm for solving the so-called loss-augmented classification task

$$
\hat{y} = \arg\max_{y \in Y} \Big( \Delta(y, y_k) + \langle w, \Psi(x_k, y) \rangle \Big). \tag{9}
$$

The maximum in (9) is searched over all y ∈ Y. Since y is given by the parameters (s, µ, o, Σ), the geometric models are also used in the optimization (learning) process.
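For a finite, enumerable Y, the pieces of (7)-(9) fit together as in the sketch below. The loss-augmented task (9) yields the most violated constraint for example k, and the value of that constraint gives the smallest slack ξ_k satisfying all constraints (8). The objective is assumed here to have the form ½‖w‖² + (C/m) Σ ξ_k. Exhaustive search over Y is only for illustration. It is not the solver of (Franc and Hlaváč, 2006), and all function names are hypothetical.

```python
from typing import Callable, Sequence

import numpy as np


def loss_augmented_argmax(w, psi: Callable, x_k, y_k, states: Sequence, loss: Callable):
    """Eq. (9): argmax over y of Delta(y, y_k) + <w, Psi(x_k, y)>, by exhaustive search."""
    return max(states, key=lambda y: loss(y, y_k) + float(np.dot(w, psi(x_k, y))))


def slack(w, psi, x_k, y_k, states, loss) -> tuple:
    """Most violated constraint (8) for example k and the smallest feasible xi_k.

    If y_k is among the states, xi_k >= 0 automatically, because y = y_k gives 0.
    """
    y_hat = loss_augmented_argmax(w, psi, x_k, y_k, states, loss)
    margin = float(np.dot(w, psi(x_k, y_k) - psi(x_k, y_hat)))
    return y_hat, loss(y_hat, y_k) - margin


def primal_objective(w, psi, examples, states, loss, C: float) -> float:
    """Objective (7), 1/2 ||w||^2 + C/m * sum_k xi_k, with every xi_k at its smallest feasible value."""
    xis = [slack(w, psi, x, y, states, loss)[1] for x, y in examples]
    return 0.5 * float(np.dot(w, w)) + C / len(xis) * sum(xis)
```

Maximizing Δ(y, y_k) + ⟨w, Ψ(x_k, y)⟩ and maximizing the constraint violation Δ(y, y_k) - ⟨w, Ψ(x_k, y_k) - Ψ(x_k, y)⟩ select the same y, because ⟨w, Ψ(x_k, y_k)⟩ does not depend on y.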

The correct segmentation of all images in the training sets is known and is given by the states $y_k$. Thus each column of a training image can be labeled according to the character it depicts. The loss function $\Delta(y, y_k)$ was defined as the number of incorrectly labeled image columns for the segmentation based on y.
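A sketch of this loss: both states are turned into one label per image column and the mismatches are counted. The helper `column_labels`, the background label for columns outside the model, and the assumption that both labelings refer to the same column grid (the same scale) are hypothetical simplifications.

```python
def column_labels(width, offset, model, string, background=None):
    """Hypothetical helper: label every column with the character of the segment covering it.

    model: (lam, omega) pairs with 0-based lam relative to the offset; columns outside
    the model keep the `background` label.
    """
    labels = [background] * width
    for (lam, omega), a in zip(model, string):
        for col in range(offset + lam, offset + lam + omega):
            labels[col] = a
    return labels


def column_loss(labels_pred, labels_true) -> int:
    """Delta: the number of image columns whose labels differ."""
    if len(labels_pred) != len(labels_true):
        raise ValueError("both labelings must cover the same columns")
    return sum(p != t for p, t in zip(labels_pred, labels_true))
```

This loss is zero only for identical labelings and grows with the number of misplaced or misread columns, so it satisfies the conditions on ∆ and also penalizes segmentation errors, not only wrong characters.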

A general algorithm for solving problem (7) is described in (Tsochantaridis et al., 2005); it requires an external QP solver. The modified algorithm used in this work is described in detail in (Franc and Hlaváč, 2006).

4 CLASSIFICATION TASK EVALUATION

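The recognition algorithm implements the maximization of (5) over all variables, i.e.

$$
\begin{aligned}
(\hat{s}, \hat{\mu}, \hat{o}, \hat{\Sigma}) &= \arg\max_{s, \mu, o, \Sigma} \langle w, \Psi(I, (s, o, \mu, \Sigma)) \rangle \\
&= \arg\max_{s, \mu, o, \Sigma} \sum_{i=1}^{N_\mu} \Big( E(a_i) \odot I^s[o + \lambda_i, \omega_i] \Big).
\end{aligned} \tag{10}
$$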

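Since the model assumes that characters in different segments are independent of each other, the similarity function can be maximized within each segment separately:

$$
(\hat{s}, \hat{\mu}, \hat{o}) = \arg\max_{s, \mu, o} \sum_{i=1}^{N_\mu} \max_{a_i \in A_i} \Big( E(a_i) \odot I^s[o + \lambda_i, \omega_i] \Big), \tag{11}
$$

where $A_i \subseteq A$ is the type identifier of the i-th segment of model µ. The string $\hat{\Sigma} = (\hat{a}_1, \dots, \hat{a}_{N_{\hat{\mu}}})$ is formed by the characters attaining the inner maxima for $(\hat{s}, \hat{\mu}, \hat{o})$.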
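The step from (10) to (11) can be checked on a toy example. For fixed s, µ and o, suppose the correlation of every allowed character with every segment has already been computed and stored in a list of dictionaries (a hypothetical layout). The brute-force search over all strings and the per-segment maximization then select the same string, but the former enumerates a number of candidates that grows exponentially with the number of segments.

```python
from itertools import product


def best_string_bruteforce(scores):
    """Maximize the sum in (10) over all strings explicitly (exponential in the number of segments).

    scores[i] maps each allowed character a of segment i to E(a) ⊙ I^s[o + lambda_i, omega_i].
    """
    candidates = product(*(sorted(s.items()) for s in scores))
    best = max(candidates, key=lambda combo: sum(v for _, v in combo))
    return "".join(a for a, _ in best), sum(v for _, v in best)


def best_string_separable(scores):
    """Inner maxima of (11): choose the best character of each segment independently."""
    chosen = [max(sorted(s.items()), key=lambda item: item[1]) for s in scores]
    return "".join(a for a, _ in chosen), sum(v for _, v in chosen)
```

The separable version costs one pass over the allowed characters of each segment, which is what makes the outer search over s, µ and o affordable.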

The algorithm based on equation (11) is shown in Figure 4.

Input:
    Image I of height H and width W
    A set of models M
    Prototypes E(a) for all symbols a ∈ A
    Set of scales S and set of offsets O
Output:
    Scale ŝ, offset ô, model µ̂ and string Σ̂ = (â_1, ..., â_N)
    maximizing the similarity function.
begin
    TOTALMAX := −∞
    forall s ∈ S do
        I^s := resize(I, s)
        forall µ ∈ M do
            forall o ∈ O do
                VALUE := 0
                Initialize array CHAR of length N_µ
                for i = 1 to N_µ do
                    MAXC := −∞
                    foreach a ∈ A_i do
                        C := E(a) ⊙ I^s[o + λ_i, ω_i]
                        if C > MAXC then
                            MAXC := C
                            CHAR[i] := a
                        end
                    end
                    VALUE := VALUE + MAXC
                end
                if VALUE > TOTALMAX then
                    TOTALMAX := VALUE
                    µ̂ := µ; ô := o; ŝ := s
                    Σ̂ := CHAR
                end
            end
        end
    end
end

Figure 4: Basic algorithm for similarity function maximization. Here λ_i, ω_i and A_i denote the left border, width and type identifier of the i-th segment of model µ, and N_µ is the number of its segments.
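A direct Python transcription of Figure 4, offered as a sketch under several assumptions: models are lists of `(lam, omega, allowed)` triples with 0-based left borders relative to the offset, prototypes are H × ω arrays keyed by character, `resize` is a hypothetical nearest-neighbour column resampling, and placements that would run past the right border of I^s are skipped (Figure 4 does not say how they are treated).

```python
import math

import numpy as np


def resize(image: np.ndarray, s: float) -> np.ndarray:
    """Hypothetical resize to H x ceil(W * s) by nearest-neighbour column sampling."""
    _, width = image.shape
    new_width = math.ceil(width * s)
    source_cols = np.minimum((np.arange(new_width) / s).astype(int), width - 1)
    return image[:, source_cols]


def recognize(image, models, prototypes, scales, offsets):
    """Exhaustive search of Figure 4.

    models: list of models, each a list of (lam, omega, allowed) triples with 0-based lam
    relative to the offset; prototypes: dict mapping a character to an H x omega array.
    Returns (TOTALMAX, s_hat, index of mu_hat, o_hat, Sigma_hat).
    """
    best = (-math.inf, None, None, None, None)
    for s in scales:
        image_s = resize(image, s)
        width = image_s.shape[1]
        for mu_index, model in enumerate(models):
            for o in offsets:
                # placements that run past the image border are skipped (an assumption)
                if any(o + lam < 0 or o + lam + omega > width for lam, omega, _ in model):
                    continue
                value, chars = 0.0, []
                for lam, omega, allowed in model:
                    seg = image_s[:, o + lam : o + lam + omega]
                    max_c, best_char = -math.inf, None
                    for a in sorted(allowed):  # fixed order makes ties deterministic
                        c = float(np.sum(prototypes[a] * seg))  # E(a) ⊙ segment, eq. (6)
                        if c > max_c:
                            max_c, best_char = c, a
                    value += max_c
                    chars.append(best_char)
                if value > best[0]:
                    best = (value, s, mu_index, o, "".join(chars))
    return best
```

As in Figure 4, strict comparisons keep the first maximum found. The correlations of each prototype with every column position of I^s could be precomputed once per scale; the sketch keeps the loop structure of the figure instead.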

5 EXPERIMENTS

In this paper we present experiments on three data sets. The first data set consists of car license plates from four European countries (Czech, Hungarian, Slovak and Polish plates). The second data set contains Saudi Arabian license plates, and the third set contains ADR/RID plates.

The first set consists of 2121 training images and 521 testing images. The input image size was 13 × 200 pixels. Eight models in total were used to describe the geometry and the syntax of the strings in the set. The recognized alphabet consists of 39 symbols. Although different fonts appear in the set, only one prototype per character of the alphabet was used.

The second data set, with Saudi Arabian license plates, consists of 627 training examples and 157 testing examples. Only one geometric model, with an alphabet of 27 symbols, was used. The input image size was 24 × 100 pixels.

Figure 4: Four examples of input images from the first data set with recovered segmentation and recognized strings (7S21257, FLX-110, DW0334J, BR-540AU).

Figure 5: An example of a Saudi Arabian license plate image with recovered segmentation.

Figure 6: An example of a top and a bottom line from an ADR plate with recovered segmentation and recognized strings (33 and 1219).

The third data set contains two-line ADR/RID plates. The set consists of 109 training and only 20 testing images. Each text line was recognized independently. The resolution of the input line image was 13 × 140 pixels.

Several examples of input images and OCR results are shown in Figure 4, Figure 5 and Figure 6.

The error rates achieved by the algorithm on the testing sets are given in Table 1, Table 2 and Table 3. Most of the errors are due to character misclassification; the segmentation error is typically low in this approach. If necessary, the total error can be reduced by adding a nonlinear character classification module, which cuts down the character misclassification error.

In general, the error rate depends on the quality of the input image sets and on the complexity of the given recognition task (i.e. the number of all possible solutions). Unfortunately, we did not find any public reference data set that would enable an objective evaluation of the presented algorithm. Therefore, we took as a reference another algorithm, described in (Franc and Hlaváč, 2006). This reference algorithm is also based on structured SVM; however, it does not make use of any geometric or syntax model.

Table 1: Error rates on the data set of Czech, Hungarian, Polish and Slovak license plates.

algorithm        total error   segmentation error   character misclassification
reference alg.   10.1%         3.3%                 6.8%
proposed alg.    4.6%          0.95%                3.6%

Table 2: Error rates on Saudi Arabian license plates.

algorithm        total error   segmentation error   character misclassification
reference alg.   18.1%         6.8%                 11.3%
proposed alg.    9.7%          2.3%                 7.4%

Table 3: Error rates on ADR/RID plates.

algorithm        total error   segmentation error   character misclassification
reference alg.   5%            5%                   0.0%
proposed alg.    0.0%          0.0%                 0.0%
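As a quick consistency check on Tables 1-3, the sketch below verifies that each total error equals the sum of the segmentation and character errors up to rounding (in Table 1, 0.95% + 3.6% = 4.55% is reported as 4.6%), and computes the relative reduction of the total error with respect to the reference algorithm. The numbers are copied from the tables; the helper names are hypothetical.

```python
# Error rates in percent copied from Tables 1-3: (total, segmentation, character misclassification)
RESULTS = {
    "Table 1": {"reference": (10.1, 3.3, 6.8), "proposed": (4.6, 0.95, 3.6)},
    "Table 2": {"reference": (18.1, 6.8, 11.3), "proposed": (9.7, 2.3, 7.4)},
    "Table 3": {"reference": (5.0, 5.0, 0.0), "proposed": (0.0, 0.0, 0.0)},
}


def decomposition_gap(row):
    """Difference between the total error and segmentation + character error."""
    total, segmentation, character = row
    return total - (segmentation + character)


def relative_reduction(reference_total, proposed_total):
    """Fraction of the reference algorithm's total error removed by the proposed algorithm."""
    return (reference_total - proposed_total) / reference_total


for table, rows in RESULTS.items():
    largest_gap = max(abs(decomposition_gap(r)) for r in rows.values())
    reduction = relative_reduction(rows["reference"][0], rows["proposed"][0])
    print(f"{table}: largest gap {largest_gap:.2f} points, total error reduced by {reduction:.0%}")
```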

6 CONCLUSIONS

In this article we proposed an OCR algorithm for structured texts that exploits knowledge about their geometric and grammatical structure. We introduced a formal description of a large variety of structured texts in terms of a geometric model. We also formulated the classification task as the maximization of a similarity function, based on (Savchynskyy and Kamotskyy, 2006; Franc and Hlaváč, 2006), that compares the input image to an idealized one for all possible configurations. The idealized image consists of prototypes of individual characters. These prototypes are interpreted as parameters of the classifier, which are determined by learning. We used the SVM method for structured classifiers described in (Franc and Hlaváč, 2006).

The described OCR method has been tested in many experiments and is currently in use as part of a commercial license plate recognition system. The algorithm is especially suited to low-quality images of strings with a limited number of geometric and grammatical models.


ACKNOWLEDGEMENTS

This work has been sponsored by the Czech Ministry of Education project 1M0567 (M.U.) and by the Czech Science Foundation project 201/06/1821 (J.R.). The third author (V.F.) was supported by the Marie Curie Intra-European Fellowship grant SCOLES MEIF-CT-2006-042107.

REFERENCES

Franc, V. and Hlaváč, V. (2006). A novel algorithm for learning support vector machines with structured output spaces. Research Report K333–22/06, CTU–CMP–2006–04, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic.

Ko, M.-A. and Kim, Y.-M. (2003). License plate surveillance system using weighted template matching. In Proceedings of the 32nd Applied Imagery Pattern Recognition Workshop (AIPR’03). IEEE Computer Society.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

Lee, H.-J., Chen, S.-Y., and Wang, S.-Z. (2004). Extraction and recognition of license plates of motorcycles and vehicles on highways. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04). IEEE Computer Society.

Rahman, C. A., Badawy, W., and Radmanesh, A. (2003). A real time vehicle’s license plate recognition system. In Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS’03). IEEE Computer Society.

Savchynskyy, B. and Kamotskyy, O. (2006). Character templates learning for textual image recognition as an example of learning in structural recognition. In Proceedings of the Second International Conference on Document Image Analysis for Libraries (DIAL’06), pages 88–95. IEEE Computer Society.

Shapiro, V. and Gluhchev, G. (2004). Multinational license plate recognition system: Segmentation and classification. In Proceedings of the 17th International Conference on Pattern Recognition, volume 4, pages 352–355.

Tsochantaridis, I., Joachims, T., Hofmann, T., and Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484.

Vapnik, V. (1998). Statistical Learning Theory. John Wiley & Sons, Inc.
