Character Rotation Absorption Using a Dynamic Neural Network Topology: Comparison With Invariant Features

Christophe Choisy, Hubert Cecotti, and Abdel Belaïd

LORIA, Campus Scientifique, BP 239, 54506 Vandoeuvre-lès-Nancy cedex, France

Abstract. This paper addresses rotation absorption in neural networks for multi-oriented character recognition. Classical approaches rely on rotation-invariant features. Here, we propose to use a dynamic neural network topology to absorb the rotation. The basic idea is to preserve as much as possible of the graphical information, since it contains all the information available. The proposal is to dynamically modify the neural network architecture in order to take into account the rotation of the analysed pattern. We also use a specific topology that carries out a polar transformation inside the network. The interest of such a transformation is to turn the rotation problem from a 2D problem into a 1D problem, which is easier to treat. These proposals are applied to a synthetic database and to a real EDF database of multi-oriented characters. A comparison is made with Fourier and Fourier-Mellin invariants.

1 Introduction

There are two main opposing approaches in character recognition: the first one tries to find a set of representative features for each class [6], whereas the second one tries to learn all the possible character deformations and variations [9]. While the first method can be efficient at low cost for relatively clean data, the second one has proved its superiority on variable characters such as handwritten digits [8].

The drawback of the second method is that it requires all the possible pattern deformations to be available in the training set. When dealing with relatively straight characters, it is possible to create such a set [9]. But for multi-oriented characters the problem is multiplied by all the possible rotated positions, leading to a very large database. Moreover, the number of model parameters also increases greatly, leading to a very complex system. This is why one usually extracts invariant features to represent each class [1, 10, 4]. But as said before, this method is less efficient than learning all the possibilities, because some information is lost in the feature extraction.

In order not to lose any information, we avoid the intermediate feature extraction step and work directly on the image. Furthermore, as it is impossible to learn all the multi-oriented variations, we propose to absorb the rotation by adapting the system topology to the deformation produced by the rotation. The interest of such a technique is to obtain model and training set conditions similar to those of straight-character approaches. Thus the model complexity and the database size remain reasonable.

EDF: the French national company of electricity

Choisy C., Cecotti H. and Belaïd A. (2004).
Character Rotation Absorption Using a Dynamic Neural Network Topology: Comparison With Invariant Features.
In Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems, pages 90-97
DOI: 10.5220/0002683500900097
SciTePress

Neural networks offer the possibility of dynamically modifying their topology, which makes such an approach easy to implement. The purpose of this paper is to show how to dynamically modify the network topology by acting on the network links without disturbing the training efficiency. The principle of the dynamic change is to act in the same manner on all the links of all the neurons of a specific layer. This guarantees that the error is distributed consistently during the learning step.

In order to facilitate the dynamic network adaptation, it is useful to apply a polar transformation. This makes it possible to pass from a 2D problem to a 1D problem. In order to disturb the network process as little as possible, we propose to code the polar transformation as a specific neural topology. This lets the network decide by itself the interest of each pixel in the transformation.

2 System overview

Fig.1. System overview

Our system is based on a multi-layer perceptron with several specificities. It is designed to be as preprocessing-free as possible: we only scale the images to the neural network input size. Figure 1 shows the different steps in our neural network.

The input layer takes the gray-level pixels of the pattern. The links between the first and the second layer are organized so as to perform a polar transformation of the image: thus the second layer can be regarded as the polar version of the input image. During the learning step, the link weights are re-estimated in order to take into account the interest of each pixel in the transformation. This polar transformation turns the problem of image rotation into a problem of shift along the angle axis.

The second and third layers are connected with dynamic links. Modifying these links makes it possible to adapt to the shift. Thus the information is transferred to the next layers as if the input image were straight.

A layer with local convolutions is added to reﬁne the image analysis. The last layer

gives the output probabilities for the different classes.

3 The polar transformation

Rotation absorption often starts by transforming the image into polar coordinates [7,3]. Indeed, it is easier to deal with the polar form, because the rotation problem is transformed into a translation problem: thus, a 2D deformation becomes a 1D deformation.

Consider an image $I$ of size $X \times Y$ defined in Cartesian coordinates as $I(x, y)$, where $0 \le x \le X-1$ and $0 \le y \le Y-1$. If the image $I$ is rotated through an angle $\alpha \in [0; 2\pi]$, the new image $I'$ is defined, with coordinates taken relative to the rotation center, as $I'(x, y) = I(x\cos\alpha - y\sin\alpha,\; x\sin\alpha + y\cos\alpha)$.

Let $G$ be the gravity center of the image. Consider the image in polar coordinates $I_p(r, \theta)$: the relationship between $I_p$ and the image $I'_p$ rotated through angle $\alpha$ is $I'_p(r, \theta) = I_p(r, \theta + \alpha)$. Here $r$ is the distance between $G$ and the point, and $\theta \in [0; 2\pi]$ is the angle. Thus the angle variations can be treated as a shift along the $\theta$ axis. Several ways have been proposed to perform this transformation.

Fig.2. Rotation-to-shift effect of the polar transformation. Top: original image; bottom: transformed image
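The rotation-to-shift property can be checked numerically. The sketch below is only an illustration under stated assumptions: the continuous test pattern `pattern`, the helper names, and the restriction to a rotation angle equal to a whole number of angular steps are ours and are not part of the original system. It uses the convention that the rotated polar image equals the original one read at angle θ + α, samples a pattern and its rotated version on the same polar grid, and lets the two grids be compared up to a cyclic column shift.

```python
import math


def pattern(x, y):
    """Hypothetical continuous test pattern, centred away from the origin."""
    return math.exp(-8.0 * ((x - 0.5) ** 2 + (y - 0.2) ** 2))


def rotated(img, alpha):
    """Return I'(x, y) = I(x cos(a) - y sin(a), x sin(a) + y cos(a))."""
    c, s = math.cos(alpha), math.sin(alpha)
    return lambda x, y: img(x * c - y * s, x * s + y * c)


def to_polar(img, n_rings, n_angles, radius=1.0):
    """Sample img on an n_rings x n_angles polar grid centred on the origin."""
    grid = []
    for i in range(1, n_rings + 1):
        r = radius * i / n_rings
        row = []
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            row.append(img(r * math.cos(theta), r * math.sin(theta)))
        grid.append(row)
    return grid


def shift_columns(grid, k):
    """Cyclic shift along the angle axis: out[i][j] = grid[i][(j + k) % m]."""
    return [[row[(j + k) % len(row)] for j in range(len(row))] for row in grid]
```

With `m` angular steps and a rotation of `k` steps (α = 2πk/m), `to_polar(rotated(pattern, alpha), n, m)` agrees with `shift_columns(to_polar(pattern, n, m), k)` up to floating-point rounding. For angles that are not a multiple of the angular step, some interpolation would be needed, which this sketch does not attempt.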

3.1 Goshtasby transformation improvements

Let R be the maximum radius of the pattern, defined as the Euclidean distance between the center of mass G and the furthest gray pixel (isolated pixels are excluded, to avoid taking noise as the reference pixel). The Goshtasby method [7] draws n concentric circles around G with radii R ∗ i/n, i ∈ {1, .., n}. Each circle is cut into m parts. In this method only the pixels at the intersections of the circles and the angle partition are kept, to build the n ∗ m polar shape matrix.

With this kind of description, the original image is not fully represented in the transformed one; we therefore first propose to use the mean of the pixels of each sector rather than only the intersection points.
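A minimal sketch of the sector-mean variant is given here for illustration. It is not the original implementation: the function name `sector_mean_matrix`, the list-of-rows image format, and the binning rule are assumptions, and the filtering of isolated noisy pixels when computing R is omitted.

```python
import math


def sector_mean_matrix(image, n_rings, n_angles):
    """Return an n_rings x n_angles matrix of mean gray levels per sector.

    Assumption of this sketch: `image` is a list of rows of non-negative
    gray levels containing non-zero pixels at two or more positions.
    """
    pixels = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row)]
    total = sum(v for _, _, v in pixels)
    # Gravity center G, weighted by gray level.
    gx = sum(x * v for x, _, v in pixels) / total
    gy = sum(y * v for _, y, v in pixels) / total
    # Maximum radius R: distance from G to the furthest non-zero pixel.
    max_r = max(math.hypot(x - gx, y - gy) for x, y, v in pixels if v > 0)
    sums = [[0.0] * n_angles for _ in range(n_rings)]
    counts = [[0] * n_angles for _ in range(n_rings)]
    for x, y, v in pixels:
        r = math.hypot(x - gx, y - gy)
        if r > max_r:
            continue  # outside the outer circle
        ring = min(int(r / max_r * n_rings), n_rings - 1)
        theta = math.atan2(y - gy, x - gx) % (2.0 * math.pi)
        sector = min(int(theta / (2.0 * math.pi) * n_angles), n_angles - 1)
        sums[ring][sector] += v
        counts[ring][sector] += 1
    # Sectors that contain no pixel are set to 0.0.
    return [[s / c if c else 0.0 for s, c in zip(srow, crow)]
            for srow, crow in zip(sums, counts)]
```

On small images some sectors near the center contain no pixel at all, which is one practical face of the uneven sector surfaces discussed next.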

However, the sector surfaces grow with the radius, so the sectors near the center carry little information whereas the furthest ones carry a lot. We therefore propose another approach, based on equal sector surfaces.

Let $r$ be the radial step, so that $\pi r^2$ is the surface of the inner ring. Consider the ring $n$: its surface $S_n$ is defined as $S_n = \pi((n+1)r)^2 - \pi(nr)^2 = (2n+1)\,\pi r^2$. By dividing the $n^{th}$ ring into $2n+1$ parts, sector surfaces are equal for each ring of the image. The number of sectors of the polar image for $n$ rings is $\sum_{i=0}^{n-1} (2i+1)$.
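The equal-surface construction can be checked with a few lines of arithmetic. In the sketch below the function names are ours; rings are numbered from 0 (the inner disc), as in the formula above. Since the sum of the first n odd numbers is n², a polar image with n rings has n² sectors in total.

```python
import math


def ring_surface(n, r):
    """Surface of ring n (ring 0 is the inner disc) for a radial step r."""
    return math.pi * ((n + 1) * r) ** 2 - math.pi * (n * r) ** 2


def sectors_in_ring(n):
    """Number of equal-surface sectors in ring n."""
    return 2 * n + 1


def total_sectors(n_rings):
    """Total number of sectors of the polar image for n_rings rings."""
    return sum(sectors_in_ring(i) for i in range(n_rings))
```

For every ring, `ring_surface(n, r) / sectors_in_ring(n)` equals `math.pi * r ** 2` up to rounding, and `total_sectors(16)` is 256.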

This approach gives a better distribution of the pixels in the transformed image, as Figure 3 shows. Some tests on multi-oriented character recognition show an improvement of about 5% for the equal surfaces approach.

Fig.3. Distribution of sector centers for the classical (left) and equal surface (right) transformations

3.2 Integration of the polar transformation in a Neural Network

It is classical to give a neural network a topology in order to search for elementary patterns and to decrease the network complexity. The advantage is to let the network learn by itself the contribution of each pixel to these patterns. In the same spirit, we propose to integrate the polar transformation into the neural network through a specific topology.

The transformation is made through the links between two layers. The first one represents the Cartesian image whereas the second one represents the polar image. The principle is similar to the mathematical transform, except that pixels are not directly moved: the links are fixed in such a manner that the input neuron position is taken in Cartesian coordinates, and the destination neuron position corresponds to the polar destination of the pixel. Figure 4 illustrates the two layers with some links, for the classical polar transformation.
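One way to picture such a topology is as a sparse set of links, each joining a Cartesian input pixel to the polar neuron of its sector. The sketch below is a hypothetical illustration: the names `polar_links`, `init_weights` and `forward`, the use of the image center in place of the gravity center G, and the initialisation of the weights to a plain sector mean are our assumptions, since the text does not specify these details. It assumes an image larger than one pixel.

```python
import math


def polar_links(width, height, n_rings, n_angles):
    """Map each polar neuron (ring, sector) to the input pixels linked to it."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # assumption: image center used as G
    max_r = math.hypot(cx, cy)
    links = {(i, j): [] for i in range(n_rings) for j in range(n_angles)}
    for y in range(height):
        for x in range(width):
            r = math.hypot(x - cx, y - cy)
            ring = min(int(r / max_r * n_rings), n_rings - 1)
            theta = math.atan2(y - cy, x - cx) % (2.0 * math.pi)
            sector = min(int(theta / (2.0 * math.pi) * n_angles), n_angles - 1)
            links[(ring, sector)].append((x, y))
    return links


def init_weights(links):
    """One weight per link; starting from the sector mean is an assumption."""
    return {key: [1.0 / len(px)] * len(px) for key, px in links.items() if px}


def forward(image, links, weights):
    """Polar layer output: weighted sum of the linked input pixels."""
    return {key: sum(w * image[y][x] for w, (x, y) in zip(weights.get(key, []), px))
            for key, px in links.items()}
```

Because each link carries its own weight, training can then re-estimate the contribution of every pixel to its sector, which a fixed mathematical transform cannot do.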

The classical and equal surfaces transformations were both tested using appropriate specific topologies. For the following work, however, we only use the classical transformation, because of an interesting property: each column of the transformed image corresponds to an angle portion (see Figure 4). For the equal surfaces transformation the shift varies with the radius (see Figure 5): this leads to a more complex dynamic process, requiring some interpolation that depends on the observed ring.

Fig.4. Left: Layer 0, input image; Right: Layer 1, image in polar coordinates. Rows represent sets of links between neurons

Fig.5. Equal surface analysis needs a pyramidal layer structure, which leads to a more complex dynamic process.

4 Dynamic topology for the process

The use of a dynamic topology is an extension of the convolution idea for neural networks. As said in the introduction, optimal learning requires having seen all the possible pattern distortions. For multi-oriented data, this amounts to multiplying the number of samples by the number of possible angles. This leads to a very large database, and requires a large neural network to deal with so many parameters. It might be possible to build such a system, but it would not be realistic to use it.

After the polar transformation, the neural network has to deal with a shift along the angle axis. Our proposal is to correct the shift, in order to bring the image back to its straight position. Thus it is not necessary to learn all the possible rotations, but only the straight samples. This keeps the network complexity similar to that of classical applications on straight characters.

The principle of the dynamic links is to change the input neuron of each link according to the shift. The intuitive idea is shown in Figure 6. The objective is that the first link of each neuron is always connected to the real first column of the polar image. If the image is shifted, the link is dynamically moved to the position of the first column. This movement is synchronized for all the dynamic links, according to the observed sector. On an output neuron line, all the neurons observe the same input line, and the links are moved synchronously. As in a classical network, this allows several output neurons for the same input neuron group.

Fig.6. Dynamic topology. Left: Input image straight, Right: Input image oriented

This introduces the notion of order in the links: while the training and testing computations give the same role to each link, the dynamic part differentiates them according to the location they analyze in the image.
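The dynamic-link idea can be sketched as a layer whose weights stay attached to the links while the input end of every link is moved by the same number of columns. This is a simplified illustration under our own assumptions: the function name `dynamic_forward`, the fully connected layout, and the restriction to shifts of a whole number of columns are not taken from the original system.

```python
def dynamic_forward(polar, weights, shift):
    """Weighted sums over a polar image whose links are moved by `shift` columns.

    polar   : list of rows (rings), each a list of m angle columns
    weights : one weight matrix (same shape as polar) per output neuron
    shift   : column at which the real first column of the pattern sits
    """
    m = len(polar[0])
    outputs = []
    for w in weights:
        total = 0.0
        for i, row in enumerate(polar):
            for j in range(m):
                # Link j keeps its weight w[i][j]; only its input neuron changes.
                total += w[i][j] * row[(j + shift) % m]
        outputs.append(total)
    return outputs


# Usage: a straight polar image, and the same image displaced by k columns.
straight = [[(3 * i + 5 * j * j) % 7 for j in range(8)] for i in range(3)]
k = 3
displaced = [[row[(j - k) % 8] for j in range(8)] for row in straight]
layer = [[[(i + 2 * j + o) % 5 for j in range(8)] for i in range(3)] for o in range(2)]
corrected = dynamic_forward(displaced, layer, k)
reference = dynamic_forward(straight, layer, 0)
```

Here `corrected` and `reference` are identical: once the links are moved by `k` columns, the following layers receive the same values as for the straight image, with a single set of weights.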

5 Implementation and experimental results

Our system has been tested on two databases. The first one is synthetic: its interest is to validate the proposed approach and to compare it clearly with invariant feature extraction. The second is a real database of multi-oriented and multi-sized characters, extracted from EDF technical maps.

The synthetic database was built from different orientations of the characters of one font. It has 62 classes. The training base contains 12 orientations in steps of 30 degrees, starting at 0 degrees. The testing base uses the same step, but starts at 15 degrees. Over all classes, this gives 744 patterns for training and as many for testing.
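The figures of the synthetic protocol can be reproduced with a short computation. The variable names are ours; the numbers are those given above.

```python
N_CLASSES = 62
STEP = 30  # degrees between two consecutive orientations

train_angles = list(range(0, 360, STEP))   # 0, 30, ..., 330
test_angles = list(range(15, 360, STEP))   # 15, 45, ..., 345

n_train = N_CLASSES * len(train_angles)
n_test = N_CLASSES * len(test_angles)

# Smallest circular distance between each test angle and the training angles.
gaps = [min(min(abs(t - a), 360 - abs(t - a)) for a in train_angles)
        for t in test_angles]
```

Both `n_train` and `n_test` come out at 744, and every test orientation lies exactly 15 degrees away from the nearest training orientation, so no test angle is seen during training.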

The real database was restricted to 30 major classes, as proposed in another work [1]. It contains 15268 samples, divided into 80% for training and 20% for testing. The distribution of the characters is not homogeneous. An orientation estimate made by EDF when the character strings were extracted from the maps shows that half of the characters have an estimated slope lower than 5°. This confirms one of our hypotheses: a database with enough samples for all the possible orientations would be very large.

In our first tests, we used the orientation information to dynamically adapt the topology. The objective is to validate the interest of such an approach. The natural continuation of this work is to decide this dynamic modification automatically.

Results are given for straight rectified characters, dynamic topology, Fourier-Mellin invariants, Fourier invariants, and multi-oriented characters. Fourier invariants are extracted after the polar transformation; with a 16x32 polar image, this gives 512 invariants. The number of Fourier-Mellin invariants depends on the precision of the analysis: the number of parameters is indicated.
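The exact construction of the Fourier invariants is not detailed here. One common construction, given only as an assumption, takes the magnitude of the discrete Fourier transform of each ring along the angle axis: a cyclic shift changes only the phase of the coefficients, so the magnitudes are unchanged by rotation. Keeping one magnitude per ring and per frequency yields 16 x 32 = 512 values, which matches the count above. The function names below are ours.

```python
import cmath


def dft_magnitudes(row):
    """Magnitudes of the discrete Fourier transform of one ring."""
    m = len(row)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / m)
                    for t, v in enumerate(row)))
            for k in range(m)]


def fourier_invariants(polar):
    """Concatenate the DFT magnitudes of every ring of a polar image."""
    return [mag for row in polar for mag in dft_magnitudes(row)]
```

Applied to a 16 x 32 polar image and to any cyclically shifted copy of it, `fourier_invariants` returns two lists of 512 values that agree up to floating-point rounding. Discarding the phase is also where information is lost: distinct patterns can share the same magnitudes.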

Synthetic database results are given in Table 1.


Table 1. Synthetic database: recognition rate for 62 classes

Method                          Recognition rate
Straight rectified characters   92.47%
Dynamic Topology                88.44%
Fourier-Mellin (33 features)    83.20%
Fourier features                81.89%
Multi-oriented characters       45.83%

Table 2 gives the Fourier and Fourier-Mellin results when samples are grouped into rotation-similar classes.

Table 2. Synthetic database with 44 grouped classes

Method                          Recognition rate
Fourier-Mellin (33 features)    86.83%
Fourier features                87.90%

Results show the superiority of the dynamic topology approach over invariant features. The difference between this approach and the straight rectified character results lies in the fact that the rectification is made at the exact rotation angle, whereas the dynamic topology absorbs the rotation only to within +/-5 degrees. The slight superiority of Fourier over Fourier-Mellin when considering grouped classes can be explained by its higher number of parameters.

Results on the EDF real database are given in Table 3. The score for multi-oriented characters is better than for invariant features. This can be explained by the fact that character slopes are unevenly distributed, which should lead to orientation clusters. The superiority of the dynamic topology over rectification can be explained by the approximate orientation given by EDF: the variation tolerated by the dynamic topology implementation allows the samples to be better distributed within the topology variation range.

Table 3. Real database: recognition rate for 30 classes

Method                          Recognition rate
Straight rectified characters   87.31%
Dynamic Topology                88.97%
Fourier-Mellin (33 features)    68.85%
Fourier-Mellin (119 features)   76.72%
Fourier features                73.16%
Multi-oriented characters       76.89%

All these results indicate that, on these databases, invariant features cannot compete with the graphic information. This shows the interest of a dynamic topology. Note that the 30 classes selected for this test do not seem to be the best choice to really reduce the rotation confusion problem. We are trying other clusters in order to obtain more differentiated Fourier-Mellin features.

6 Conclusion

We have proposed a new methodology to absorb image rotations in a neural network without extracting invariant characteristics. This method is based on a dynamic neural network topology, and gives encouraging results when applied to multi-oriented and multi-scaled images. Compared to rotation-invariant feature extraction, this approach proved its interest, showing that graphical information can perform better than feature extraction.

Our aim in future work is to propose a decision process able to decide the dynamic link modification. Further work concerns the polar transformation: the best version induces a more complicated dynamic topology process, which remains to be implemented; some tests show that this new version could improve recognition by about 5%.

References

1. Adam, S., Ogier, J.M., Cariou, C., Mullot, R., Gardes, J., Lecourtier, Y.: Fourier-Mellin based invariants for the recognition of multi-oriented and multi-scaled shapes: application to engineering drawings analysis. Machine perception and artificial intelligence (2000) 42:132–147.

2. Brandt, R.D., Lin, F.: Representations that uniquely characterize images modulo translation, rotation, and scaling. Pattern Recognition Letters (1996), 9(17):1001–1015.

3. Bui, T., Chen, G., Roy, Y.: Translation-invariant multiwavelets for image de-noising. Proc. 3rd World Multi-Conference on Circuits, Systems, Communications and Computers (1999).

4. Derrode, S., Ghorbel, F.: Robust and efficient Fourier-Mellin transform approximations for gray-level image reconstruction and complete invariant description. Proc. 3rd World Multi-Conference on Circuits, Systems, Communications and Computers (2001), 83(1).

5. Deseilligny, M., Le Men, H., Stamon, G.: Characters string recognition on maps, a method for high level reconstruction. ICDAR (1995), 249–252.

6. Trier, Ø.D., Jain, A.K., Taxt, T.: Feature extraction methods for character recognition – a survey. Pattern Recognition (1996), 4(29):641–662.

7. Goshtasby, A.: Description and discrimination of planar shapes using shape matrices. IEEE Transactions on Pattern Analysis and Machine Intelligence (1985), PAMI-7(6):738–743.

8. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Intelligent Signal Processing (2001), 306–351.

9. Simard, P., Steinkraus, D., Platt, J.: Best practice for convolutional neural networks applied to visual document analysis. International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society (2003), 958–962.

10. Zhenjiang, M.: Zernike moment-based image shape analysis and its application. Pattern Recognition Letters (2000), 2(21):169–177.