HETEROGENEOUS IMAGE RETRIEVAL SYSTEM BASED ON
FEATURES EXTRACTION AND SVM CLASSIFIER
Rostom Kachouri
Research unit on Computers, Imaging, Electronics and Systems, ENIS, BP W, 3038 Sfax, Tunisia
Informatics, Integrative Biology and Complex Systems, 40 Rue de Pelvoux, 91020 Evry Cedex, France
Khalifa Djemal, Hichem Maaref
Informatics, Integrative Biology and Complex Systems, 40 Rue de Pelvoux, 91020 Evry Cedex, France
Dorra Sellami Masmoudi, Nabil Derbel
Research unit on Computers, Imaging, Electronics and Systems, ENIS, BP W, 3038 Sfax, Tunisia
Keywords:
CBIR, SVM, QUIP-tree, feature extraction, heterogeneous image database.
Abstract:
Image databases represent an increasingly important volume of information, so it is judicious to develop powerful systems able to handle, index, and classify images so that they can be retrieved quickly from these large databases.
In this paper, we propose a heterogeneous image retrieval system based on feature extraction and a Support Vector Machines (SVM) classifier.
For a heterogeneous image database, we first extract several kinds of features, such as color, shape, and texture descriptors. We then improve the description given by these features with some original methods. Finally, we apply an SVM classifier to classify the resulting index database.
For evaluation purposes, using precision/recall curves on a heterogeneous image database, we compare the proposed image retrieval system with another Content-Based Image Retrieval (CBIR) system, the QUadtree-based Index for image retrieval and Pattern search (QUIP-tree). The obtained results show that the proposed system provides good recognition accuracy and outperforms the QUIP-tree method.
1 INTRODUCTION
Several methods ensuring image recognition have been developed. However, these techniques are often designed for one kind of image and present difficulties when recognition is performed in a heterogeneous image database.
Different application domains, such as the medical and industrial domains, demonstrate a real need for image recognition in large databases. To this end, we can distinguish two main types of image databases: specific databases, where the images show a natural similarity (the same type of images, the same content presented in different situations, etc.), and heterogeneous databases, which can contain different image types and contents. One of the important steps in a recognition system is image description. Indeed, this step relies on a priori knowledge of the image content on the one hand and on descriptors modeled for a specific type of image on the other. Methods based on this concept give satisfaction for specific databases, but the relevance of this description strategy becomes almost ineffective when image databases are heterogeneous. It is within this framework that the system we present is situated. A content-based image recognition system is typically composed of two main phases: image description and the classification of the extracted features, which allows effective recognition.
In fact, in a heterogeneous image database, images belong to various categories and can differ considerably from one another. A single feature, or a single kind of feature, therefore cannot describe the whole image database. In this paper, we present a heterogeneous image recognition system; to this end, several kinds of features, such as color, shape, and texture descriptors, are used and improved. The used and improved features should be efficient and relevant for describing heterogeneous images, since a better image description leads to a more satisfactory image classification.
Since the nineties, Support Vector Machines (SVMs) have continuously aroused the interest of researcher communities from various fields of expertise.
For instance, (Schokopf et al., 1999) applied SVMs to isolated handwritten digit recognition, and (Osuna et al., 1997) applied SVMs to face recognition. In the majority of cases, SVM performance exceeds that of already established traditional models.
Therefore, SVMs are used for classification in our retrieval system. SVMs, originally formulated for two-class classification problems, have been successfully applied to diverse pattern recognition problems and have become, in a very short period of time, the standard state-of-the-art tool. SVMs, based on Structural Risk Minimization (SRM), are primarily devised to minimize the upper bound of the expected error by optimizing the trade-off between the empirical risk and the model complexity (Burges, 1998). To achieve this, they construct an optimal hyperplane that separates binary-class data so that the margin is maximal.
To evaluate this image retrieval system, we compare it with another Content-Based Image Retrieval (CBIR) system: the QUadtree-based Index for image retrieval and Pattern search (QUIP-tree).
The QUIP-tree indexing structure stores the visual characteristics of the various areas of the image. Database images are first compared globally with the query image. Then, if the global similarity with the query image is lower than a certain similarity threshold, the sub-areas of the homologous images are compared, and so on until the bottom level is reached (Genevire et al., 2004) (Kachouri et al., 2007).
The paper is organized as follows: Section 2 describes the CBIR system structure and the SVM approach. Section 3 deals with the different features used in our system and details the improvements made to them. Experimental results, together with a brief description of the QUIP-tree technique, are presented in Section 4. Finally, we conclude in Section 5.
2 CBIR SYSTEM
In this section, we first review CBIR theory and describe the system structure. Then we briefly outline the SVM classifier and the QUIP-tree technique.
2.1 Content-based Image Retrieval
CBIR is today ubiquitous in computer vision. Similarity queries on feature vectors have been widely used to perform content-based retrieval of images. In fact, CBIR systems nowadays allow images to be accessed according to their visual characteristics, such as color, texture, and shape, by means of similarity measures. The smaller the similarity distance is, the closer the two images are.
The typical CBIR system architecture is essentially composed of two stages. The first one is Off Line, where the feature extraction of each database image is carried out and each feature is stored in an index database. The second one is On Line, where the recognition (classification) is carried out by computing similarity measures between the query image signature and the indexes of the corresponding image database.
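As a rough illustration of these two stages, the following sketch (in Python, with hypothetical function and variable names, and a simple Euclidean distance standing in for the similarity measure) outlines how an off-line index could be built and then queried on line:

```python
import numpy as np

def build_index(database_images, extract_features):
    # Off Line stage: compute a feature vector (signature) for every
    # database image and store the vectors in an index matrix.
    return np.vstack([extract_features(img) for img in database_images])

def retrieve(query_image, index, extract_features, k=12):
    # On Line stage: compare the query signature with every stored index
    # entry and return the k most similar database images.
    signature = extract_features(query_image)
    distances = np.linalg.norm(index - signature, axis=1)
    return np.argsort(distances)[:k]
```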
There are several popular CBIR systems, such as IBM's QUERY-BY-IMAGE-CONTENT (QBIC), which indexes images using diverse features; VisualSEEk (Smith and Chang, 1996), developed by Smith and Chang at Columbia University; and Surfimage, developed in 1995 by INRIA, which is more sophisticated than the other commercial systems. In this paper, we propose a new CBIR system intended for heterogeneous image databases.
2.2 Support Vector Machines
SVM is a supervised classification method. Supervised classification supposes that an image classification already exists. It therefore necessarily uses training methods which, from images that have already been classified, allow new images to be classified. For image indexing systems, supervised classification allows a model to be built from a classified image database, which will then classify new images as well as possible.
First, in the Off Line stage, we use a training image database, which is represented by visual descriptors. With the labeled training database images, the SVM learns a boundary (i.e., a hyperplane) separating the relevant images from the irrelevant images with maximum margin. The images on one side of the boundary are considered relevant, and those on the other side are considered irrelevant.
Second, in the On Line stage, using the built model (the boundary computed in the first stage), the SVM classifies an evaluation image database, which must also be represented by visual descriptors.
SVMs have recently attracted many researchers from the machine learning and pattern classification communities for their fascinating properties, such as high generalization performance and a globally optimal solution (Burges, 1998). In SVM, the original input space is mapped into a higher-dimensional feature space in which an optimal separating hyperplane is constructed, on the basis of SRM, to maximize the margin between the two classes, i.e., the generalization ability.
2.2.1 The Separable Case
Given a set of labeled images $(x_1, y_1), \ldots, (x_n, y_n)$, $x_i$ is the feature representation of one image and $y_i \in \{-1, +1\}$ is the class label ($-1$ denotes negative and $+1$ denotes positive).
The goal is to find a boundary such that all the elements with the same annotation are on the same side. So we must find a vector $w$ and a real $b$ such that:

$y_i (w \cdot x_i + b) > 0, \quad \forall i \in [1, n]$  (1)

We can then take the decision function:

$f(x) = \mathrm{sign}(w \cdot x + b)$  (2)

This decision function is invariant to scale change, so we choose to find the boundary which verifies $w \cdot x + b = \pm 1$ for the elements nearest to the margin, which amounts to minimizing $\|w\|^2$ such that:

$y_i (w \cdot x_i + b) \geq 1, \quad \forall i \in [1, n]$  (3)

Using the Lagrangian, the problem amounts to maximizing $W$ over $\alpha$, and the decision function is written as follows:

$f(x) = \mathrm{sign}\Big(\sum_{i=1}^{n} y_i \alpha_i \, x \cdot x_i + b\Big)$  (4)

We note that if we omit the sign operator in the decision function, we obtain a measurement of belonging to the required category.
2.2.2 The Non Separable Case
The above algorithm for separable data, when applied to non-separable data, will find no feasible solution. So a flexible margin may be introduced, by accepting the misclassification of certain elements. This amounts to upper-bounding each $\alpha_i$ by a constant $C$.
Moreover, a linear separation is not adapted to all problems, and it is often preferable to introduce a kernel $k(x, x')$ which replaces the scalar product $x \cdot x'$.
The classification function can then be written as:

$f(x) = \mathrm{sign}\Big(\sum_i \alpha_i y_i \, k(x_i, x) + b\Big)$  (5)
2.2.3 Choice of Kernel
The first kernels investigated for the pattern recognition problem were the following:

$k(x, y) = (x \cdot y + c)^d$  (Polynomial)  (6)

$k(x, y) = e^{-\|x - y\|^2 / 2\sigma^2}$  (Gaussian)  (7)

$k(x, y) = \tanh(x \cdot y + \theta)$  (Sigmoidal)  (8)

The most commonly used kernel is the Gaussian one, since it allows a distance $d$ placed inside the exponential to be exploited:

$k(x, y) = e^{-d(x, y)^2 / 2\sigma^2}$
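As an illustrative sketch only (not the exact implementation used in this work), a soft-margin SVM with a Gaussian (RBF) kernel can be trained and applied with scikit-learn roughly as follows, where X_train, y_train, and X_test are hypothetical feature matrices and label vectors:

```python
from sklearn.svm import SVC

# Off Line: learn the separating boundary from labelled training feature vectors.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # Gaussian kernel, soft margin C
clf.fit(X_train, y_train)

# On Line: classify new feature vectors with the built model.
predictions = clf.predict(X_test)
# Omitting the sign operator, the decision function gives the
# "belonging measurement" mentioned in Section 2.2.1.
margins = clf.decision_function(X_test)
```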
3 USED AND IMPROVED
FEATURES
Feature (content) extraction is the basis of CBIR. Re-
cent CBIR systems retrieve images based on visual
properties.
As we use a heterogeneous image database, images belong to various categories, and their visual properties can differ considerably. A single feature, or a single kind of feature, therefore cannot describe the whole image database. In this paper we are thus interested in the extraction of diverse visual features, such as color, shape, and texture.
3.1 Color Features
Color is one of the most important image indexing
features employed in CBIR because it has been shown
to be effective in both the academic and commercial
arenas. Some of the popular methods to character-
ize color information in images are the color average and color histograms.
3.1.1 Color Average
The color average of an image is defined by $\bar{x}$ as follows:

$\bar{x} = (\bar{R}_{(avg)}, \bar{G}_{(avg)}, \bar{B}_{(avg)})^t$  (9)

where $\overline{Color}_{(avg)} = \frac{1}{N}\sum_{p=1}^{N} Color(p)$ and $N$ is the total number of pixels in the image.
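For illustration, assuming the image is stored as an H x W x 3 RGB array, the color average of equation (9) could be computed as follows (a minimal sketch, not the authors' exact code):

```python
import numpy as np

def color_average(rgb_image):
    # Mean of the R, G and B planes over all N pixels (equation (9)).
    return rgb_image.reshape(-1, 3).mean(axis=0)
```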
3.1.2 Color Histograms
Color histograms are useful because they are relatively insensitive to position and orientation changes. So, despite their simplicity, they are the most commonly used color feature representation. We extract this feature simply by computing the occurrence of each gray level in the R, G, and B color planes of the image.
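Under the same assumption of an 8-bit RGB image array, this histogram feature could be sketched as:

```python
import numpy as np

def color_histogram(rgb_image, bins=256):
    # Occurrence count of each gray level in the R, G and B planes,
    # concatenated into a single feature vector.
    return np.concatenate(
        [np.histogram(rgb_image[..., c], bins=bins, range=(0, 256))[0]
         for c in range(3)]
    )
```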
3.2 Shape Features
Shape is a very important descriptor for image databases. Generally, a shape descriptor indicates the general aspect of an object, that is, its contour.
3.2.1 Invariant Moments
Invariant moments are important shape descriptors in computer vision. They are obtained from quotients and powers of moments. A moment is a sum over all image pixels, weighted by polynomials related to the pixel positions.
In 1962, Hu derived seven bi-dimensional invariant moments (Hu, 1962). These moments are invariant to scale, rotation, and translation.
3.2.2 Sobel Filter
The Sobel filter is used for contour detection. It is assumed that the image areas are homogeneous and that contours can be detected on the basis of gray-level discontinuities.
First, we apply the Sobel masks to obtain the directional gradients along $x$ and $y$:

$G_x(i, j) = h_x(i, j) * I(i, j), \quad G_y(i, j) = h_y(i, j) * I(i, j)$  (10)

where $I(i, j)$ is the image gray-level information and $h_x(i, j)$, $h_y(i, j)$ are the Sobel masks:

$h_x(i, j) = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}, \quad h_y(i, j) = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$

Then, the gradient norm is computed as follows:

$G(i, j) = \sqrt{G_x(i, j)^2 + G_y(i, j)^2}$  (11)
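A minimal sketch of equations (10) and (11), assuming a gray-level image array and using a standard convolution routine, could look like this:

```python
import numpy as np
from scipy.ndimage import convolve

H_X = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])   # Sobel mask h_x
H_Y = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # Sobel mask h_y

def gradient_norm(gray_image):
    # Directional gradients G_x and G_y (equation (10)), then the
    # gradient norm G (equation (11)).
    gx = convolve(gray_image.astype(float), H_X)
    gy = convolve(gray_image.astype(float), H_Y)
    return np.sqrt(gx ** 2 + gy ** 2)
```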
3.3 Texture Features
Multiresolution approaches to texture analysis have
gained wide acceptance over the years as they ef-
fectively describe both local and global information
(Julesz et al., 1978). For this reason, we use wavelet texture features in this paper.
3.3.1 Daubechies Wavelet
Texture features are extracted from the Daubechies wavelet coefficients of a two-level decomposition. Daubechies proposed an orthogonal wavelet construction with compact support. The Daubechies wavelet has different lengths, called wavelet orders. The Daubechies wavelet order, which is always even, is the number of vanishing moments; it is related to the number of oscillations: the more vanishing moments there are, the more the Daubechies wavelet oscillates and the more regular it is. Indeed, a Daubechies wavelet having $M$ vanishing moments verifies:

$\Phi(x) = \sqrt{2}\sum_{k=0}^{2M-1} h_{k+1}\, \Phi(2x - k)$  (12)

$\Psi(x) = \sqrt{2}\sum_{k=0}^{2M-1} g_{k+1}\, \Phi(2x - k)$  (13)

with $g_k = (-1)^k\, h_{2M-k+1}$, $k = 1, 2, \ldots, 2M$.
Figure 1: a) Dinosaur, b) Dinosaur gradient norm and c) Dinosaur Daubechies wavelet coefficients of a two-level decomposition.
The wavelet coefficients are $c^l_{ij}(x, y)$, where $l$ is the decomposition level.
Fig. 1 shows the Dinosaur image, its gradient norm, and its two-level Daubechies wavelet decomposition.
3.4 Feature Improvement
To improve the feature size and description, we ap-
plied original modifications to some obtained feature
coefficients:
3.4.1 Sobel Coefficients
As the number of coefficients in the gradient norm is the same as the number of pixels in the image, we compute the projections of the gradient norm along $x$ and $y$ in order to reduce this feature size:

$P_{X_i} = \frac{1}{\max_{i,j} G}\sum_{j} G(i, j), \quad\text{and}\quad P_{Y_j} = \frac{1}{\max_{i,j} G}\sum_{i} G(i, j)$  (14)

Although this new form is a reduced form of the Sobel feature, it preserves the same properties as the original one.
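Equation (14) reduces the gradient norm matrix to two short vectors; a sketch of this reduction, reusing the gradient_norm function assumed above, is:

```python
def gradient_projections(G):
    # Row-wise and column-wise sums of the gradient norm, normalised by
    # its maximum value (equation (14)).
    return G.sum(axis=1) / G.max(), G.sum(axis=0) / G.max()
```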
3.4.2 Moment Coefficients
To obtain a more efficient shape description with this feature, we do not use simple moments, which are computed on the image pixels, but instead compute the moments from the gradient norm matrix obtained by the Sobel feature.
Figure 2: New form of Dinosaur Sobel feature: a) The gra-
dient norm projection according to X and b) The gradient
norm projection according to Y.
The particularity of our method is thus that it combines the Sobel filter with moments in a new shape feature description.
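As a hedged illustration of this combination, Hu's seven invariant moments could be computed on the gradient norm matrix (rather than on the raw pixels) with OpenCV, for instance:

```python
import cv2
import numpy as np

def shape_descriptor(gray_image):
    # Combine the Sobel and moment features: compute Hu's seven invariant
    # moments on the gradient norm matrix instead of on the image pixels.
    G = gradient_norm(gray_image)  # gradient_norm as sketched above
    return cv2.HuMoments(cv2.moments(G.astype(np.float32))).flatten()
```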
3.4.3 Wavelet Coefficients
The lowest-frequency coefficients $c^2_{00}(x, y)$ are not inherently useful for texture analysis. Therefore, a direction-independent measure of the high-frequency signal information is obtained by filtering the raw coefficients $c^2_{00}(x, y)$ with the Laplacian.
The texture features are obtained by computing the subband energy of all wavelet coefficients (including the Laplacian-filtered $c^2_{00}(x, y)$ coefficients):

$e^l_{ij} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} |c^l_{ij}(m, n)|^2$,  (15)

where $M$ and $N$ are the dimensions of the coefficients $c^l_{ij}(x, y)$ (see Ref. (Serrano et al., 2004) for details).
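A possible sketch of this texture feature, assuming the PyWavelets library and a second-order Daubechies wavelet ("db2", an assumption, since the exact order is not specified here), is:

```python
import numpy as np
import pywt
from scipy.ndimage import laplace

def texture_features(gray_image):
    # Two-level Daubechies decomposition; the lowest-frequency subband is
    # Laplacian-filtered before computing the subband energies of eq. (15).
    cA2, (cH2, cV2, cD2), (cH1, cV1, cD1) = pywt.wavedec2(gray_image, "db2", level=2)
    subbands = [laplace(cA2), cH2, cV2, cD2, cH1, cV1, cD1]
    return np.array([np.mean(np.abs(c) ** 2) for c in subbands])
```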
Table 1: Dinosaur and Rose texture features: subband energy of all Daubechies wavelet coefficients of a two-level decomposition.

Second level decomposition
Images      $e^2_{00}$   $e^2_{01}$   $e^2_{10}$   $e^2_{11}$
Dinosaur    226.584      11.699       8.868        6.025
Rose        252.829      12.941       7.914        4.965

First level decomposition
Images      $e^1_{01}$   $e^1_{10}$   $e^1_{11}$
Dinosaur    5.184        3.755        2.494
Rose        4.141        2.458        1.294
4 EXPERIMENTS
In this section we first present a brief description of the QUIP-tree technique, used for comparison purposes. Then we evaluate our proposed system.
4.1 Quadtree-based Index for Image
Retrieval and Pattern Search
QUIP-tree is an unsupervised classification method. Unsupervised classification is used when images are not classified. It is a process by which images are divided into clusters such that images in the same cluster are as similar as possible and images in different clusters are as dissimilar as possible.
First, in the Off Line stage, we decompose the database images into n quadrants (where n is a multiple of four) and represent them by a visual descriptor by means of a quadtree. Then a similarity measure is applied to calculate the distances between images. Finally, a clustering of the image database is performed.
Second, in the On Line stage, the query image is also decomposed into a quadtree structure; we then compare this query image with the image database cluster centers to identify candidate clusters. In the end, the query image is compared only with images belonging to the candidate clusters, in order to find similar images.
For more details see Ref. (Genevire et al., 2004),
(Manouvrier et al., 2005), and (Kachouri et al., 2007).
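For illustration only (a simplified sketch, not the actual QUIP-tree implementation), the quadtree representation underlying the Off Line stage can be pictured as a recursive decomposition of the image into quadrants, each node carrying a visual descriptor:

```python
def quadtree_decompose(image, depth, extract_features):
    # Attach a feature vector to the current region, then recursively split
    # it into its four quadrants down to the requested depth.
    node = {"feature": extract_features(image), "children": []}
    if depth > 0:
        h, w = image.shape[0] // 2, image.shape[1] // 2
        quadrants = (image[:h, :w], image[:h, w:], image[h:, :w], image[h:, w:])
        for quad in quadrants:
            node["children"].append(
                quadtree_decompose(quad, depth - 1, extract_features))
    return node
```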
4.2 System Evaluation
For evaluation, we tested our proposed image retrieval system on a heterogeneous image database composed of eight clusters: a collection of 400 images (50 images per cluster). The used heterogeneous database contains images with large differences in colors, shapes, and textures. Some samples are shown in Fig. 3.
To quantitatively evaluate the performance of this system, we carried out the following tests. Queries representing the different clusters are picked from the image database. Then, for each query image, a list of similar images is found in the image database using the SVM classifier.
For comparison purposes, we compare the results of our image retrieval system with another well-known classification technique, the QUIP-tree (see Fig. 4 (a)).
We subsequently computed the retrieval efficiency using the standard retrieval benchmarks: precision and recall (Bimbo, 2001). Let the total number of images retrieved for a query be 50, let $x_1$ be the number of retrieved images that are similar to the query, and let $x_2$ be the actual number of images similar to the query in the image database. The evaluation standards precision and recall are defined as follows:

$\text{precision} = \frac{x_1}{50} \times 100\%, \quad\text{and}\quad \text{recall} = \frac{x_1}{x_2} \times 100\%$  (16)
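As a small worked sketch of equation (16), with retrieved and relevant standing for the list of returned images and the set of database images actually similar to the query:

```python
def precision_recall(retrieved, relevant):
    # x1: retrieved images that are similar to the query; precision is taken
    # over the 50 returned images, recall over the x2 similar images present
    # in the database.
    x1 = len(set(retrieved) & set(relevant))
    return 100.0 * x1 / len(retrieved), 100.0 * x1 / len(relevant)
```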
The precision and recall criteria are often represented as graphs called precision/recall curves. In these decreasing curves, the precision is plotted against the recall values. Ideally, the precision is equal to 1 for all recall values (see Fig. 4 (b)).
Figure 3: Samples of the used heterogeneous image
database.
Figure 4: a) Average precision graph for SVM and QUIP-tree using a combination of color, shape, and texture descriptors and b) Precision/recall curves.
Since the QUIP-tree is based on a computation of similarity/dissimilarity, it is efficient only for small dimensions (only one or two features of the same kind). Thus, in (Kachouri et al., 2007), the QUIP-tree proved better than the SVM method in terms of recognition rate for different query images, because the descriptors used for comparison were simple features (color histogram and color average), which do not permit a reliable SVM model to be built, and the image database used for the tests contained synthetic images with only color variations between the different database images.
However, as soon as the dimension is increased by using more features (in order to improve the description), the QUIP-tree retrieval accuracy decreases significantly, whereas SVMs remain advantageous in such cases because they map the data to a higher-dimensional space using a kernel.
Indeed, by comparing the results of our retrieval system based on the SVM classifier with those of the QUIP-tree, we find that in all experiments the SVM retrieval accuracy is better than that of the QUIP-tree (as shown in Fig. 4).
Fig. 5 shows the first twelve retrieval results for two example query images, using our proposed image retrieval system. The image displayed first is the query, and the ranking goes from left to right and top to bottom.
Figure 5: Retrieval results for two query images using our proposed image retrieval system.
5 CONCLUSIONS
In this paper, we have presented a heterogeneous image retrieval system based on feature extraction and an SVM classifier. Several kinds of features, such as color, shape, and texture features, were used and improved for this purpose.
The improved features allow a satisfactory image description to be obtained. The relevance of this description is tested through an SVM classifier, and a comparison with the QUIP-tree technique is carried out.
As we use a real heterogeneous image database and several kinds of features to index the images, the SVM proves better than the QUIP-tree method in terms of retrieval accuracy and precision/recall curves.
Moreover, in the QUIP-tree method, we calculate all the distances between each query image and the other database images, whereas with SVMs, once the model is built, each query image only has to be evaluated. So, for large image databases, the SVM answer is faster than the QUIP-tree one.
The obtained results show that the proposed system provides good recognition accuracy.
REFERENCES
Bimbo, A. (2001). Visual information retrieval. Morgan
Kaufmann Publishers.
Burges, C. (1998). A tutorial on support vector machines
for pattern recognition. Data Min. Knowl. Discovery,
2(2):121–167.
Genevire, J., Maude, M., Vincent, O., and Marta, R. (2004).
Indexation multi-niveau pour la recherche globale et
partielle d’images par le contenu. In BDA.
Hu, M. (1962). Visual pattern recognition by moment
invariants. IEEE Transactions information Theory,
8:179–187.
Julesz, B., Gilbert, E., and Victor, J. (1978). Visual discrim-
ination of textures with identical third-order statistics.
Biol. Cybern., 31:137–140.
Kachouri, R., Djemal, K., Sellami-Masmoudi, D., Maaref,
H., and Derbel, N. (2007). On the heterogeneous im-
age retrieval with quip-tree. In SSD.
Manouvrier, M., Rukoz, M., and Jomier, G. (2005). Spa-
tial Databases : Technologies, Techniques and Trend,
Quadtree-Based Image Representation and Retrieval,
chapter 4, pages 81–106. IDEA Group Publishing, In-
formation Science Publishing and IRM Press.
Osuna, E., Freund, R., and Girosi, F. (1997). Training sup-
port vector machines: an application to face detection.
Schokopf, B., Burges, C., and Smola, A. (1999). Introduc-
tion to support vector learning, chapter 1. Advances
in Kernel Methods - Support Vector Learning.
Serrano, N., Savakis, A., and Luo, J. (2004). Improved scene classification using efficient low-level features and semantic cues. Pattern Recognition, 37:1773–1784.
Smith, J. and Chang, S. (1996). Tools and techniques for colour image retrieval. In IS&T/SPIE Proceedings, volume 2670, pages 426–437, San Jose, CA, USA.