SIMULTANEOUS RECONSTRUCTION AND RECOGNITION OF NOISY CHARACTER-LIKE SYMBOLS

László Czúni¹, Ágnes Lipovits² and Dávid Papp¹

¹ Dept. of Electrical Engineering and Information Systems, University of Pannonia, Egyetem street 10., Veszprém, Hungary

² Dept. of Mathematics, University of Pannonia, Egyetem street 10., Veszprém, Hungary

Keywords:

Image Reconstruction, Markov Random Field, Optical Character Recognition, Radon Transform

Abstract:

In this article we address the joint problem of reconstructing and recognizing binary symbols corrupted by heavy additive noise. We introduce a Markov Random Field (MRF) model in which a shape energy term is responsible for finding a solution similar to a tested hypothesis. In this way the precision of the reconstruction process can be increased; the only question is how to find the right hypothesis, the one that helps the reconstruction most. Fortunately, the new energy term gives the answer: the tested hypothesis with the minimal shape energy component designates the right shape.

1 INTRODUCTION

The reconstruction of binary images from noisy ob-

servations is a common problem in image processing.

In several applications, besides observations, we have

some a priori information about the possible shapes

of the typical objects (such as letters or other well-defined symbols) to help the reconstruction. Often we would also like to recognize the observed symbols, which is not possible without prior noise filtering.

In our paper we show an MRF-based solution for this twofold problem: the proposed algorithm produces a reconstruction suitable for visual purposes or for further processing, and it also recognizes the symbols. For

a priori information about the shape of objects, their

Radon transform is stored as simple shape descrip-

tor vectors. The Radon transform is fast to compute,

requires only 1D information to store, and tolerates

some distortion of the original shapes.

2 THE MRF MODEL

Markov Random Field (MRF) is a probability model

based on local characteristics (Geman and Grafﬁgne,

1986). In an MRF, the sites are related to one an-

other via a neighborhood system. A generalized def-

inition of Markov Random Field can be given using

graphs. Let $G = (S, E)$ be an undirected graph, where $S = \{s_1, s_2, \ldots, s_N\}$ is a finite set of vertices (sites) of the graph and $E$ is the set of its edges. By definition, two sites $s_i$ and $s_j$ of the graph are neighbors if there is an edge connecting them. Given a site $s$, the set of sites that are neighbors of $s$ (the neighborhood of $s$) is denoted by $V_s$. By definition, $V = \{V_s \mid s \in S\}$ is a neighborhood system for $G$ if

$$s \notin V_s \quad \text{and} \quad s \in V_r \Leftrightarrow r \in V_s. \tag{1}$$

We assign to each site of the graph a label λ from

a ﬁnite set of labels Λ. Such an assignment is called

a conﬁguration, denoted by ω. The set of all possi-

ble conﬁgurations is denoted by Ω. The conﬁguration

restricted to a subset $T \subset S$ is denoted by $\omega_T$. The value given to a site $s$ by the configuration $\omega$ is denoted by $\omega_s$. We assign probability measures to the set $\Omega$ of all possible configurations $\omega$. The local characteristics of a probability measure $P$ defined on $\Omega$ are the conditional probabilities of the form $P(\omega_s \mid \omega_{S-s})$, that is, the probability that the site $s$ is assigned the label $\omega_s$, given the values at all other sites of the graph. By definition, a probability measure $\chi$ is a Markov Random Field with respect to a neighborhood system $V$ if

$$\forall \omega \in \Omega : P(\chi = \omega) > 0, \tag{2}$$

$$\forall s \in S,\ \forall \omega \in \Omega : P(\omega_s \mid \omega_{S-s}) = P(\omega_s \mid \omega_{V_s}), \tag{3}$$

so that the local characteristics of the probability measure depend only on the labels at the neighboring sites.

Czúni L., Lipovits Á. and Papp D.
SIMULTANEOUS RECONSTRUCTION AND RECOGNITION OF NOISY CHARACTER-LIKE SYMBOLS.
DOI: 10.5220/0003861101970200
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2012), pages 197-200
ISBN: 978-989-8565-03-7
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)

By definition, $C \subset S$ is a clique if every pair of sites in $C$ are neighbors. A potential $V$ is a way to assign a number $V_A(\omega)$ to every subconfiguration $\omega_A$ of a configuration $\omega$. A potential $V$ defines an energy $U(\omega)$ on the set $\Omega$ of all configurations $\omega$ by $U(\omega) = \sum_A V_A(\omega)$, where for a fixed $\omega$ the sum is taken over all subsets $A$ of $S$.

The Gibbs measure induced by $U$ is defined by

$$\pi(\omega) = \frac{1}{Z} \exp(-U(\omega)), \qquad Z = \sum_{\omega} \exp(-U(\omega)). \tag{4}$$

In the case when $V_A(\omega) = 0$ whenever $A$ is not a clique, the potential $V$ is called a nearest neighbor Gibbs potential. The Hammersley-Clifford theorem establishes the equivalence between Gibbs measures and MRFs: $\chi$ is an MRF with respect to the neighborhood system $V$ if and only if $\pi(\omega) = P(\chi = \omega)$ is a Gibbs distribution with a nearest neighbor Gibbs potential $V$, that is,

$$\pi(\omega) = \frac{1}{Z} \exp\Bigl(-\sum_{c \in C} V_c(\omega)\Bigr), \tag{5}$$

where $C$ is the set of cliques. This equivalence allows us to specify potentials instead of local characteristics when defining an MRF.
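As a small numerical illustration of (4) and (5), the following sketch enumerates every configuration of a three-site chain with binary labels and computes the Gibbs distribution by brute force. The chain, the pairwise potential, the value of `beta`, and all function names are assumptions chosen for illustration only.

```python
import itertools
import math

def gibbs_distribution(num_sites, cliques, potential, labels=(0, 1)):
    """Return {configuration: pi(configuration)} with pi = exp(-U) / Z."""
    energies = {}
    for omega in itertools.product(labels, repeat=num_sites):
        # U(omega) is the sum of the clique potentials, as in (5).
        energies[omega] = sum(potential(omega[s], omega[r]) for s, r in cliques)
    z = sum(math.exp(-u) for u in energies.values())  # partition function Z
    return {omega: math.exp(-u) / z for omega, u in energies.items()}

def pair_potential(a, b, beta=0.5):
    # Hypothetical pairwise potential: equal labels lower the energy.
    return -beta if a == b else beta

# Hypothetical chain s1 - s2 - s3 with two pairwise cliques.
pi = gibbs_distribution(3, [(0, 1), (1, 2)], pair_potential)
```

In this toy model the two homogeneous configurations receive the highest probability and the two alternating ones the lowest, which is the smoothing behavior the clique potentials are meant to encode.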

Markov Random Fields can be used to address many low-level image tasks, such as restoration, edge detection, segmentation, and motion detection. Let $F$ denote the observations on a grayscale image, and let $f_s$ denote the observation belonging to the pixel $s$. We have to find the configuration $\omega$ that maximizes the probability $P(\omega \mid F)$. By Bayes' theorem,

$$P(\omega \mid F) = \frac{1}{P(F)} P(F \mid \omega) P(\omega). \tag{6}$$

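Furthermore,

$$P(F \mid \omega) = \prod_{s \in S} P(f_s \mid \omega_s), \tag{7}$$

and, based on the Hammersley-Clifford theorem, up to a normalizing constant that does not depend on $\omega$,

$$P(\omega) \propto \prod_{c \in C} \exp(-U_c(\omega_c)), \tag{8}$$

where $C$ is the set of all possible cliques and $U$ is the clique potential. So we have to find the configuration that maximizes the function

$$\prod_{s \in S} P(f_s \mid \omega_s) \prod_{c \in C} \exp(-U_c(\omega_c)). \tag{9}$$

$P(F)$ can be omitted because it does not depend on $\omega$.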

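Assuming that $P(f_s \mid \omega_s)$ is Gaussian, and taking the negative logarithm of the above, we get the following energy functions:

$$U_1(\omega, F) = \sum_{s \in S} \left( \ln\bigl(\sqrt{2\pi}\,\sigma_{\omega_s}\bigr) + \frac{(f_s - \mu_{\omega_s})^2}{2\sigma_{\omega_s}^2} \right), \tag{10}$$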

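which stands for the observation, and

$$U_2(\omega) = \sum_{c \in C} V_2(\omega_c), \tag{11}$$

$$V_2(\omega_c) = V_{s,r}(\omega_s, \omega_r) = \begin{cases} -\beta & \text{if } \omega_s = \omega_r, \\ +\beta & \text{if } \omega_s \neq \omega_r. \end{cases} \tag{12}$$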

Here $\beta$ is a model parameter representing the homogeneity of the regions, and $\mu_{\omega_s}$ and $\sigma_{\omega_s}$ are the mean and the standard deviation belonging to label $\omega_s$. The goal is to find a configuration that minimizes the sum of the two energy functions,

$$\sum_{s \in S} V_1(\omega_s, f_s) + \sum_{c \in C} V_2(\omega_c), \tag{13}$$

where $V_1(\omega_s, f_s)$ denotes the summand of (10). Different relaxation techniques can be used for the optimization; in our paper we apply Simulated Annealing.
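The minimization of (13) can be sketched as follows for a two-label (binary) restoration. This is a minimal illustration, not the implementation used in the experiments: the 4-connected neighborhood, the Metropolis acceptance rule, the geometric cooling schedule, the default parameter values, and all function names are assumptions.

```python
import math
import random

def data_term(f_s, label, mu, sigma):
    """Summand of (10): negative log of the Gaussian likelihood of f_s."""
    return (math.log(math.sqrt(2.0 * math.pi) * sigma[label])
            + (f_s - mu[label]) ** 2 / (2.0 * sigma[label] ** 2))

def pair_term(a, b, beta):
    """Pairwise clique potential of (12)."""
    return -beta if a == b else beta

def local_energy(obs, lab, i, j, label, mu, sigma, beta):
    """Terms of (13) that involve site (i, j) when it carries `label`."""
    h, w = len(obs), len(obs[0])
    e = data_term(obs[i][j], label, mu, sigma)
    # 4-connected neighborhood (an assumption of this sketch).
    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 0 <= a < h and 0 <= b < w:
            e += pair_term(label, lab[a][b], beta)
    return e

def total_energy(obs, lab, mu, sigma, beta):
    """Energy (13), counting every horizontal and vertical pair once."""
    h, w = len(obs), len(obs[0])
    e = 0.0
    for i in range(h):
        for j in range(w):
            e += data_term(obs[i][j], lab[i][j], mu, sigma)
            if i + 1 < h:
                e += pair_term(lab[i][j], lab[i + 1][j], beta)
            if j + 1 < w:
                e += pair_term(lab[i][j], lab[i][j + 1], beta)
    return e

def anneal(obs, mu, sigma, beta, t0=4.0, cooling=0.95, sweeps=60, seed=0):
    """Binary restoration by Metropolis-style simulated annealing."""
    rng = random.Random(seed)
    h, w = len(obs), len(obs[0])
    # Start from the pixelwise maximum-likelihood labelling.
    lab = [[min((0, 1), key=lambda k: data_term(v, k, mu, sigma)) for v in row]
           for row in obs]
    t = t0
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                old, new = lab[i][j], 1 - lab[i][j]
                delta = (local_energy(obs, lab, i, j, new, mu, sigma, beta)
                         - local_energy(obs, lab, i, j, old, mu, sigma, beta))
                # Always accept improvements; accept worse moves with
                # probability exp(-delta / t).
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    lab[i][j] = new
        t *= cooling  # geometric cooling schedule
    return lab
```

A call such as `anneal(obs, mu=[0.0, 1.0], sigma=[0.1, 0.1], beta=1.0)` returns a label image of the same size as `obs`; `total_energy` can be used to compare the result with the initial labelling.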

2.1 Radon Transformation

Radon transformation is widely used in the ﬁeld of

tomography (Radon, 1917) (Nagy and Kuba, 2005)

(Naser et al., 2009). The projection of an object at

a given angle θ is made up of a set of line integrals.

These line integrals are the Radon transform of the

object. The inverse Radon transform can be used to

reconstruct an approximation of the original object.

If we had an inﬁnite number of projections of an ob-

ject taken at an inﬁnite number of angles, we could

perfectly reconstruct the original object.

Let $f(x, y)$ be a two-dimensional continuous function. The Radon transform is a function defined on the two-dimensional space of straight lines $L$ by the line integral along each line:

$$Rf(L) = \int_L f(\mathbf{x})\, d\mathbf{x}. \tag{14}$$

A straight line $L$ can be parametrized by $s$ in the form

$$(x(s), y(s)) = (\rho\cos\theta - s\sin\theta,\ \rho\sin\theta + s\cos\theta), \tag{15}$$

where $\rho$ is the distance of $L$ from the origin and $\theta$ is the angle between the normal vector of $L$ and the $x$ axis. Using these quantities as coordinates, the Radon transform can be expressed as

$$Rf(\rho, \theta) = \int_{-\infty}^{\infty} f(\rho\cos\theta - s\sin\theta,\ \rho\sin\theta + s\cos\theta)\, ds. \tag{16}$$
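As a short check of (16), assuming the parametrization (15), substituting $\theta = 0$ and $\theta = \pi/2$ gives integrals along vertical and horizontal lines respectively. On a pixel grid these correspond to column sums and row sums.

```latex
\begin{align*}
% theta = 0: the line is vertical (x = rho) and is traversed by y = s
Rf(\rho, 0) &= \int_{-\infty}^{\infty} f(\rho, s)\, ds, \\
% theta = pi/2: the line is horizontal (y = rho) and is traversed by x = -s
Rf\bigl(\rho, \tfrac{\pi}{2}\bigr) &= \int_{-\infty}^{\infty} f(-s, \rho)\, ds.
\end{align*}
```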

We use only two angles (the vertical and horizontal directions), which makes the transform easy and fast to compute. Let $f(i, j)$ be a binarized image, where

$$f(i, j) = \begin{cases} 1 & \text{if the pixel is black,} \\ 0 & \text{if the pixel is not black,} \end{cases} \tag{17}$$

with $0 \le i < n$ and $0 \le j < m$, where $i$ is the column index, $j$ is the row index, $n$ is the width of the image, and $m$ is its height.

We denote the Radon transform of the $j$th row by $Rr_j$ and the Radon transform of the $i$th column by $Rc_i$. We define them as the sum of the values of the pixels in the $j$th row or the $i$th column:

$$Rr_j = \sum_{i=0}^{n-1} f(i, j), \qquad Rc_i = \sum_{j=0}^{m-1} f(i, j). \tag{18}$$
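The row and column sums of (18) can be sketched in a few lines of Python. The nested-list image layout (with `img[j][i]` holding $f(i, j)$), the example image, and the function names are assumptions of this sketch.

```python
def row_projections(img):
    """Rr_j: number of black pixels in row j, as in (18)."""
    return [sum(row) for row in img]

def column_projections(img):
    """Rc_i: number of black pixels in column i, as in (18)."""
    return [sum(col) for col in zip(*img)]

# Hypothetical 3 x 4 binary image of an L-like shape (1 = black).
# The outer index is the row j, the inner index is the column i.
img = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
rr = row_projections(img)     # [1, 1, 3]
rc = column_projections(img)  # [3, 1, 1, 0]
```

Storing the two projections takes only as many numbers as the image has rows plus columns, which is what makes them a compact shape descriptor.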

We can define the probability that a pixel chosen at random from a given row or column is black:

$$P(r_j(i) = 1) = \frac{Rr_j}{n} \quad \text{and} \quad P(c_i(j) = 1) = \frac{Rc_i}{m}. \tag{19}$$

Considering the whole image, the probability is the following:

$$P(f(i, j) = 1) = P(c_i(j) = 1) \cdot P(r_j(i) = 1). \tag{20}$$
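A minimal sketch of (19) and (20), assuming the image is stored as a list of rows so that a row holds $n$ pixels and a column holds $m$ pixels. The function name and the example image are hypothetical.

```python
def black_probability(img):
    """P(f(i, j) = 1) from (19) and (20), returned as prob[j][i]."""
    m = len(img)     # number of rows (pixels per column)
    n = len(img[0])  # number of columns (pixels per row)
    p_row = [sum(row) / n for row in img]        # Rr_j / n
    p_col = [sum(col) / m for col in zip(*img)]  # Rc_i / m
    return [[p_col[i] * p_row[j] for i in range(n)] for j in range(m)]

# Hypothetical 3 x 4 binary image of an L-like shape (1 = black).
img = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
prob = black_probability(img)
```

For this image the bottom-left pixel gets probability 1 · 0.75 = 0.75, while every pixel of the empty last column gets probability 0.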

2.2 The Proposed Extension of MRF

For the MRF reconstruction with Radon transforms (called mMRF), a hypothesis about the shape is needed, since a new energy term measures the similarity to this hypothesis. For simplicity, we run the mMRF with each of the possible symbols as the hypothesis and select the right one once the reconstruction is finished. The similarity between the reconstruction and the hypothesis is described by the energy term

$$U_3 = \gamma \cdot \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \Bigl( |P(r_j(f, i) = 1) - P(r_j(g, i) = 1)| + |P(c_i(f, j) = 1) - P(c_i(g, j) = 1)| \Bigr), \tag{21}$$

where $\gamma$ is a constant weight, $f$ is the reconstructed image, and $g$ is the hypothesis image.
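The shape energy (21) can be evaluated literally as a double sum, as in the sketch below. The list-of-rows image layout, the function name, and the two example shapes are assumptions made for illustration.

```python
def shape_energy(f, g, gamma=1.0):
    """U_3 of (21) for two binary images of equal size (lists of rows)."""
    m, n = len(f), len(f[0])
    # Row and column probabilities of (19) for both images.
    pr_f = [sum(row) / n for row in f]
    pr_g = [sum(row) / n for row in g]
    pc_f = [sum(col) / m for col in zip(*f)]
    pc_g = [sum(col) / m for col in zip(*g)]
    total = 0.0
    for i in range(n):
        for j in range(m):
            total += abs(pr_f[j] - pr_g[j]) + abs(pc_f[i] - pc_g[i])
    return gamma * total

# Hypothetical reconstruction (L-like shape) and hypothesis (vertical bar).
f = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
g = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
u3 = shape_energy(f, g)
```

Because the row probabilities do not depend on $i$ and the column probabilities do not depend on $j$, the double sum equals $\gamma$ times the sum of the absolute differences of the row sums plus that of the column sums; for the two shapes above this gives 2 + 2 = 4.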

2.3 Localization

The position of the characters under reconstruction is not known, but it is required by (21). The vertical coordinate can be easily determined, since the characters are usually part of a larger body of text in which the position of the lines can easily be found. To calculate the horizontal coordinate, matching of Radon transforms can be used. Unfortunately, the Radon functions of the raw input images are too noisy (see Figure 2), so we apply a Gaussian convolution filter to the input image and then binarize the result. Matching is done by

$$\min_x \sum_{i=0}^{n-1} |Rc_{i+x}(f) - Rc_i(g)|, \tag{22}$$

where $g$ is the smoothed and binarized noisy image and $f$ is the original template image.
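The search in (22) can be sketched as an exhaustive scan over candidate shifts. The sketch works directly on column projections and omits the Gaussian smoothing and binarization step; treating columns outside the template as empty, the search range `max_shift`, and the function names are assumptions.

```python
def matching_cost(rc_template, rc_observed, x):
    """Sum over i of |Rc_{i+x}(f) - Rc_i(g)|, as in (22)."""
    total = 0
    for i in range(len(rc_observed)):
        k = i + x
        # Columns outside the template are treated as empty (assumption).
        t = rc_template[k] if 0 <= k < len(rc_template) else 0
        total += abs(t - rc_observed[i])
    return total

def best_shift(rc_template, rc_observed, max_shift):
    """Shift x in [-max_shift, max_shift] with the smallest matching cost."""
    shifts = range(-max_shift, max_shift + 1)
    return min(shifts, key=lambda x: matching_cost(rc_template, rc_observed, x))

# Hypothetical column projections: the observed symbol sits two columns
# to the right of its position in the template.
template = [0, 3, 1, 1, 0, 0]
observed = [0, 0, 0, 3, 1, 1]
shift = best_shift(template, observed, max_shift=3)  # -2
```

With this sign convention a symbol displaced to the right in the observed image yields a negative `x`.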

Table 1 shows, for the different font types, the percentage of correct localization of the horizontal position and the average distance from the real position in pixels in our experiments. Even when the estimated position is not exact, it is at most 2 pixels away from the real position and the average difference is at subpixel level, so it does not seriously affect the classification, as shown later.

Table 1: Results of the character localization using Radon transforms after preprocessing.

Font type    Exact match    Average distance (pixels)
arial        83.33%         0.1667
kristen      83.33%         0.1667
pescadero    66.67%         0.3611

3 RECONSTRUCTION AND CLASSIFICATION

In our experiments three different fonts were used, each with 36 characters (the letters a to z and the digits 0 to 9). Figure 1 shows samples of some of the tested characters with and without noise (additive Gaussian noise with zero mean and a variance of 400 was applied).

Figure 1: A sample of the kristen characters without and

with noise.

Characters are commonly recognized using only their Radon transforms (Miciak, 2010). The row and column Radon transforms can be regarded as vectors, and the Euclidean distance between these vectors can be used as a measure to find the closest match to the noisy image. Unfortunately, due to the very heavy noise, the Radon transforms alone cannot provide proper recognition.
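For reference, the distance-based baseline described above can be sketched as a nearest-template search. Concatenating the row and column projections into one vector, the template dictionary, and the function names are assumptions of this sketch.

```python
import math

def radon_vector(img):
    """Row projections followed by column projections, as one vector."""
    return [sum(row) for row in img] + [sum(col) for col in zip(*img)]

def nearest_template(img, templates):
    """Name of the template whose Radon vector is closest to that of img."""
    v = radon_vector(img)
    return min(templates,
               key=lambda name: math.dist(v, radon_vector(templates[name])))

# Hypothetical templates and a query that is an "L" with one extra pixel.
templates = {
    "L": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]],
    "I": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
}
query = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
label = nearest_template(query, templates)  # "L"
```

This works for lightly corrupted input such as the query above; under the heavy noise considered in the paper the projections of different symbols become nearly indistinguishable.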

Figure 2 shows the Radon transforms of two characters with and without noise. The noise-free versions can be easily distinguished; however, the ones with heavy noise do not have any unique characteristics and are very similar to each other. They cannot be recognized simply by using the distance of the Radon transforms, as experienced in our tests, where only 1 of the 36 characters was correctly classified.

Figure 2: The vertical (left) and the horizontal (right) Radon transforms of the noise-free and noisy image of the character 'c' (above line) and character 's' (below line).

The new idea of our proposal is to use the similarity of the Radon functions in the reconstruction and let this energy term (21) decide whether the hypothesis is correct. We choose the hypothesis with the minimal $U_3$ to be the right one. Thus we assume that the MRF process will result in the best reconstruction among the hypotheses and will also classify the symbol by $U_3$.
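The selection rule can be sketched as a loop over the hypotheses. Here `reconstruct` is a hypothetical placeholder for the mMRF reconstruction, passed in by the caller, and `shape_energy` uses the row-sum and column-sum form that the double sum in (21) reduces to when the probabilities are defined as in (19). All names are assumptions.

```python
def shape_energy(f, g, gamma=1.0):
    """U_3 of (21), written with row and column sums of the two images."""
    rows = sum(abs(sum(a) - sum(b)) for a, b in zip(f, g))
    cols = sum(abs(sum(a) - sum(b)) for a, b in zip(zip(*f), zip(*g)))
    return gamma * (rows + cols)

def classify(observation, templates, reconstruct, gamma=1.0):
    """Run the reconstruction once per hypothesis and keep the minimal U_3.

    `reconstruct(observation, hypothesis)` is a caller-supplied stand-in
    for the mMRF reconstruction; it must return a binary image.
    """
    best = None
    for name, g in templates.items():
        f = reconstruct(observation, g)
        u3 = shape_energy(f, g, gamma)
        if best is None or u3 < best[1]:
            best = (name, u3, f)
    return best  # (symbol name, its U_3, its reconstruction)

# Hypothetical templates and observation.
templates = {
    "L": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]],
    "I": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
}
observation = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
# Trivial stand-in that ignores the hypothesis and returns the input.
name, u3, recon = classify(observation, templates, lambda obs, g: obs)
```

The returned tuple carries both the recognized symbol and the corresponding reconstruction, mirroring the simultaneous reconstruction and recognition described in the text.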

Figure 3 shows samples of the images reconstructed with MRF and with mMRF. Table 2 shows the percentage of improvement of reconstruction and the classification rate for each font using the proposed method.

Figure 3: A sample of kristen characters reconstructed with MRF and mMRF.

Table 2: Improvement of reconstruction rate over the normal MRF and percentage of successful classifications.

                  arial      kristen    pescadero
Classification    100.00%    100.00%    91.67%
Reconstruction    9.97%      8.25%      13.72%

4 SUMMARY

In our paper we investigated the reconstruction of binary symbols with very low SNR. We found that, with the proposed extension of the plain MRF model, the pixel-based reconstruction improved by approximately 11% on average over the three test fonts. We also found that the right template, necessary for the reconstruction, can be designated by the new energy term generated with the Radon transform. The classification rate was 97% for the three tested font types.

ACKNOWLEDGEMENTS

The work and publication of results have been sup-

ported by the Hungarian Research Fund, grant OTKA

CNK 80368.

REFERENCES

Geman, S. and Graffigne, C. (1986). Markov random field image models and their applications to computer vision. In Proceedings of the International Congress of Mathematicians. American Mathematical Society.

Miciak, M. (2010). Radon transformation and principal component analysis method applied in postal address recognition task. International Journal of Computer Science and Applications, 7(3):33–44.

Nagy, A. and Kuba, A. (2005). Reconstruction of binary matrices from fan-beam projections. Acta Cybernetica.

Naser, M. A., Mahmud, A., Arefin, T. M., Sarowar, G., and Ali, M. M. N. (2009). Comparative analysis of Radon and fan-beam based feature extraction techniques for Bangla character recognition. IJCSNS International Journal of Computer Science and Network Security, 9(9):287–289.

Radon, J. (1917). Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Berichte über die Verhandlungen der Sächsische Akademie der Wissenschaften, pages 262–277.
