IMAGE RETRIEVAL WITH BINARY HAMMING DISTANCE

Jérôme Landré

CReSTIC, Université de Reims-Champagne-Ardenne

I.U.T., 9 rue de Québec, 10 000, Troyes, France

Frédéric Truchetet

Le2i, Université de Bourgogne

I.U.T., 12 rue de la Fonderie, 71 200, Le Creusot, France

Keywords:

Content-based image retrieval, binary signature, multiresolution analysis.

Abstract:

This article proposes a content-based image indexing and retrieval (CBIR) system based on query-by-visual-example using hierarchical binary signatures. Binary signatures are obtained by binarizing classical features (color, texture and shape). The Hamming distance (based on the binary XOR operation) is used to compute distances between signatures. The technique was tested on a real natural image collection containing 10 000 images and on a virtual collection of one million images. Results are very good in terms of both speed and accuracy, allowing near real-time image retrieval in very large image collections.

1 INTRODUCTION

Searching large image collections is a challenging task for computer vision researchers. The Internet and recent imaging technologies have made private and public image collections widely available, creating a need for efficient image search tools.

Content-based image retrieval (CBIR) consists in working with the images alone, without any other information. Images are too large to be used directly for indexing and retrieval, so feature extraction produces one feature vector per image, which is a reduced representation of the image's visual content.

Classical image features fall into three main families: color, texture and shape. In the proposed method, a binary feature extraction step gives a binary representation of the feature vectors: binary signatures.

To compute distances between images, the Hamming distance, based on the logical exclusive-or (XOR) function, is used because it gives high performance in terms of speed and accuracy.

This article is organized as follows. Section 2 describes related work on binary signatures for content-based image retrieval. Section 3 explains the proposed architecture in depth. Section 4 defines the binary metric used to compare binary signatures. Experimental results are given in section 5. Section 6 presents a conclusion and several directions for future work.

2 RELATED WORK

During the last decade, many image retrieval papers have been published. Building fast and efficient CBIR systems remains a challenge because, even with the latest generation of processors, researchers often have to choose between speed and accuracy. To obtain good performance, distance computation must be fast (Jacobs et al., 1995).

Several image retrieval techniques are based on binary coding of feature vectors. Color-based image retrieval with binary signatures (Nascimento and Chitkara, 2002) gave good results. Binary histograms have also been proposed (Kunttu et al., 2003). These methods give good results but work with only one family of features: color.

The Hamming distance counts the number of bits that differ between two binary vectors. The fuzzy Hamming distance (Ionescu and Ralescu, 2005) was proposed to overcome the limitations of the Hamming distance on real numbers. It is not used in this work because only binary signatures are computed, not real numbers.


Landré J. and Truchetet F. (2007).

IMAGE RETRIEVAL WITH BINARY HAMMING DISTANCE.

In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IU/MTSV, pages 237-240

 SciTePress

In our approach, users can work with color, texture and shape hierarchically to refine retrieval. These three families of features are not mixed together because they are independent. For example, if a user wants to find "red cars" in a collection, color and shape have to be used; texture is not useful in this case. With a single feature vector in which the three families are mixed, irrelevant features influence the final decision when they should not.

More and more methods rely on offline classification of feature vectors to build a visual search tree that is browsed online. In our system, a query-by-visual-example method (Boujemaa et al., 2003) is used instead, because computing time is not a significant constraint in our retrieval process, owing to the high speed of binary computation.

3 PROPOSED ARCHITECTURE

Our system is based on the binarization of classical features. It works in two steps: offline and online. Consider an image collection $C$ containing $N$ images denoted $I_i$, where $i = 1..N$.

In the offline step (no user connected to the CBIR system), each image $I_i$ of the collection $C$ is transformed from the RGB to the Lab colorspace. The Lab colorspace was chosen because distances computed in this space correspond to perceived distances between colors. Then a multiresolution analysis (Calderbank et al., 1998) is computed at three resolution levels. Several classical features are extracted into color, texture and shape feature vectors. The binarization process, described below, leads to three binary signatures per image: $s_c$ (color), $s_t$ (texture) and $s_s$ (shape).

Each signature is 32 bits long so that XOR operations can be carried out in the microprocessor's internal registers. Each bit in $s_c$, $s_t$ and $s_s$ represents a property which is true (1) or false (0). Thus each signature is a set of binary properties of the image $I_i$.

Figure 1 presents our query-by-example architecture. The binary signature extracted from the request image $I_r$ is compared with that of every image $I_i$ of the collection $C$, and results are displayed on the user's screen, sorted by increasing distance.
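The comparison loop described above can be sketched as a linear scan. This is only an illustration under stated assumptions: the function names `hamming` and `rank_by_distance` are hypothetical, the toy signatures are 4 bits wide for readability, and the authors' actual implementation (C programs, see section 5) is not reproduced here.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures (at most 32 bits each)."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def rank_by_distance(query: int, signatures: List[int]) -> List[Tuple[int, int]]:
    """Return (image index, distance) pairs sorted by increasing distance."""
    scored = [(i, hamming(query, s)) for i, s in enumerate(signatures)]
    # sorted() is stable, so images at equal distance keep their collection order.
    return sorted(scored, key=lambda pair: pair[1])


# Toy collection of four signatures (hypothetical values).
collection = [0b1011, 0b0000, 0b1111, 0b1010]
ranking = rank_by_distance(0b1011, collection)
print(ranking)  # [(0, 0), (2, 1), (3, 1), (1, 3)]
```

The scan touches every signature once, so its cost grows linearly with the collection size $N$.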

Features are organized into a 32-bit binary signature vector. For an image $I_i$, there are three binary signature vectors corresponding to color ($s_c$), texture ($s_t$) and shape ($s_s$). Each bit of a signature records whether or not the image satisfies a given property.

Figure 1: Architecture of the proposed system.

• Color: Color properties are based on the values of the "a" and "b" maps of the Lab colorspace. Each 32-bit color signature tests 32 properties. For instance, the first bit checks the property: 'Is the mean value of the "a" map at the coarsest resolution greater than 64?' A value of 1 indicates that the property is satisfied for this image; a value of 0 means that it is not. By combining several properties, the signature forms a checklist of color properties.

• Texture: Binary properties for texture are mainly based on wavelet energy (the squared value of each coefficient) across the three resolution levels. For instance, the first bit checks the property: 'Is the mean energy of the "L" map at the coarsest resolution greater than 128?'

• Shape: Shape properties are extracted from the image contours of the "L" map (obtained with a Laplacian edge detector). For example, a typical property is: 'Is there any continuous object contour longer than 30 pixels?'

The entire binarization process thus consists in transforming real-world questions into binary answers. The underlying problem is the choice of properties.
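As a minimal sketch of this binarization, the snippet below packs a list of yes/no answers into one integer signature. Only the first property (mean of the "a" map at the coarsest resolution greater than 64) comes from the text; the other two properties, the measured values, the bit ordering (property 0 in the least significant bit) and the name `pack_signature` are assumptions made for illustration.

```python
def pack_signature(answers):
    """Pack up to 32 boolean answers into an integer, answer i going to bit i."""
    if len(answers) > 32:
        raise ValueError("a signature holds at most 32 properties")
    signature = 0
    for position, answer in enumerate(answers):
        if answer:
            signature |= 1 << position
    return signature


# Hypothetical measurements for one image (not taken from the paper).
mean_a_coarsest = 71.5
mean_b_coarsest = 40.2
mean_a_finest = 66.0

color_answers = [
    mean_a_coarsest > 64,  # property quoted in the text
    mean_b_coarsest > 64,  # hypothetical property
    mean_a_finest > 64,    # hypothetical property
]
color_signature = pack_signature(color_answers)
print(f"{color_signature:032b}")  # bits 0 and 2 are set, so the value is 5
```

With 32 such questions per family, each of the three signatures fits in a single 32-bit word.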

Of course, the list of properties is not exhaustive: any question whose answer is yes (1) or no (0) is a potential binary property for our system. Once the binary properties have been chosen, a similarity (or dissimilarity) metric must be used to compute distances between images, i.e. between signature vectors.

4 SIMILARITY COMPUTING

In order to evaluate distances between the request image and the collection images $I_i$, a metric must be defined. We need a way to measure, bit by bit, how similar two binary signatures $s_r$ (request) and $s_i$ ($i$-th image in the collection) are. We therefore want a measure whose value is the number of bits that differ between the two signatures, so that identical signatures are at distance 0. The following truth table describes the distance we want to define.

VISAPP 2007 - International Conference on Computer Vision Theory and Applications

Considering the $n$-th bit of $s_r$ and $s_i$, we want to know whether they are similar or not:

s_r[n]   s_i[n]   d(s_r[n], s_i[n])   similarity
0        0        0                   similar
0        1        1                   not similar
1        0        1                   not similar
1        1        0                   similar

This truth table leads to a definition based on the binary XOR operator. The distance is computed as the number of bits set to 1 in the XOR of the two binary signatures. This is the definition of the Hamming distance.

For instance, consider two 8-bit signature vectors $s_1$ and $s_2$. The distance between them is $d_H = I(s_1 \oplus s_2)$, where $\oplus$ is the XOR operator and $I$ is the function that counts the bits set to 1 in the XOR result.
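A small worked example may help. The two 8-bit values below are arbitrary choices, not taken from the paper, and `hamming_distance` is a hypothetical helper name; the bit count is done by counting the "1" characters of the binary representation of the XOR result.

```python
def hamming_distance(s1, s2):
    """Count the bits set to 1 in the XOR of two signatures."""
    return bin(s1 ^ s2).count("1")


s1 = 0b10110100
s2 = 0b10011100
xor_result = s1 ^ s2

print(f"{xor_result:08b}")       # 00101000
print(hamming_distance(s1, s2))  # 2
```

The two signatures differ in exactly two positions, so their distance is 2.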

Theorem 1 (Hamming) $d_H$ is a metric distance on $\{0, 1\}^k$.

By definition, the minimal and maximal distances $d_H$ between two binary signatures in a $k$-bit space ($\{0, 1\}^k$) are 0 and $k$ respectively. Once the distance metric is defined, it can be tested experimentally in real situations.
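The metric properties and the bounds 0 and $k$ can be checked numerically on a small space. The sketch below is an exhaustive check for $k = 4$, not a proof; the function name `is_metric_on_k_bits` is hypothetical.

```python
from itertools import product


def hamming_distance(a, b):
    """Count the bits set to 1 in the XOR of two signatures."""
    return bin(a ^ b).count("1")


def is_metric_on_k_bits(k):
    """Exhaustively check the metric axioms and the bounds 0..k on all k-bit words."""
    space = range(1 << k)
    for x, y in product(space, repeat=2):
        d = hamming_distance(x, y)
        if not 0 <= d <= k:                # bounds
            return False
        if (d == 0) != (x == y):           # identity of indiscernibles
            return False
        if d != hamming_distance(y, x):    # symmetry
            return False
    for x, y, z in product(space, repeat=3):
        # triangle inequality
        if hamming_distance(x, z) > hamming_distance(x, y) + hamming_distance(y, z):
            return False
    return True


print(is_metric_on_k_bits(4))            # True
print(hamming_distance(0b0000, 0b1111))  # 4, the maximal distance for k = 4
```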

5 EXPERIMENTS

Several results obtained on a natural image collection are presented. This well-known image collection contains 10 000 images. Experiments were performed on a laptop computer with a 2 GHz Pentium 4 processor and 512 MB of RAM, running Linux Fedora Core 5.

The user interface was built on web pages served by an Apache web server, with PHP for dynamic pages and MySQL for storage. C programs using the Intel IPP and OpenCV libraries were used to compute distances.

To measure the efficiency of the proposed method, two parameters were studied: speed and accuracy.

Speed was evaluated on the natural image database and also on a virtual set of one million random binary signature vectors, to show the real-time capabilities of the method. Computing times are given in seconds.
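A virtual collection of this kind can be imitated with a pseudo-random generator. The sketch below is an assumption-laden illustration: the helper names, the fixed seed and the reduced size of 100 000 signatures are choices made here, and timings from interpreted Python are not comparable with the figures reported for the authors' C programs.

```python
import random
import time


def hamming_distance(a, b):
    """Count the bits set to 1 in the XOR of two signatures."""
    return bin(a ^ b).count("1")


def make_virtual_collection(n, seed=0):
    """Generate n random 32-bit signatures (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n)]


def time_scan(query, signatures):
    """Compute the distance from query to every signature and measure the elapsed time."""
    start = time.perf_counter()
    distances = [hamming_distance(query, s) for s in signatures]
    elapsed = time.perf_counter() - start
    return distances, elapsed


signatures = make_virtual_collection(100_000)
distances, elapsed = time_scan(signatures[0], signatures)
print(f"{len(distances)} distances computed in {elapsed:.3f} s")
```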

An image is represented by three 32-bit (4-byte) signatures, $s_c$, $s_t$ and $s_s$. The whole image collection ($N$ images) is represented by three arrays of unsigned int values, each of length $N$. The total amount of memory needed to store the binary signatures is therefore $3 \times 4 \times N = 12 \times N$ bytes.
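The storage arithmetic can be reproduced in a few lines. The function name `signature_storage_bytes` is hypothetical; the `struct` call only confirms that three unsigned 32-bit integers occupy 12 bytes when packed with a standard-size format.

```python
import struct

BYTES_PER_SIGNATURE = 4   # one 32-bit signature
SIGNATURES_PER_IMAGE = 3  # color, texture, shape


def signature_storage_bytes(n_images):
    """Memory needed to store all signatures of a collection, in bytes."""
    return SIGNATURES_PER_IMAGE * BYTES_PER_SIGNATURE * n_images


# "<3I" packs three unsigned 32-bit integers in standard size, without padding.
one_image = struct.pack("<3I", 0x0000000F, 0x000000F0, 0x00000F00)
print(len(one_image))                      # 12

print(signature_storage_bytes(10_000))     # 120000
print(signature_storage_bytes(1_000_000))  # 12000000
```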

For the 10 000 images of the natural image collection, the total amount of memory needed to store the signatures is $12 \times 10\,000 = 120\,000$ bytes. The computing time for the distances is less than $10^{-3}$ second, so for a given request the distance $d_H$ is computed in real time.

For the virtual collection of one million images, the total amount of memory used is $12 \times 10^6$ bytes = 12 MB, which is a small part of the memory of current computers.

Table 1: Computing time for retrieval.

Collection (images)        $d_H$ comp. time (sec.)
Natural image (10 000)     < $10^{-3}$
Virtual ($10^6$)           ≈ 0.59

The results in table 1 show that the computing time is very low, which allows on-the-fly distance computation and a real-time request-by-example retrieval system. However, speed means nothing without accuracy.

Accuracy results are based on precision/recall plots for the natural image collection.
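For reference, precision and recall at a given cutoff of a ranked result list can be computed as follows. This uses the standard definitions (precision: relevant retrieved over retrieved; recall: relevant retrieved over all relevant); the image identifiers and the ground-truth set are made up for the example, and the paper does not detail its own evaluation code.

```python
def precision_recall(ranked_ids, relevant_ids, cutoff):
    """Precision and recall over the first `cutoff` results of a ranking."""
    retrieved = ranked_ids[:cutoff]
    hits = sum(1 for image_id in retrieved if image_id in relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Hypothetical ranking (sorted by increasing distance) and ground truth.
ranked = [7, 3, 9, 1, 4]
relevant = {3, 4, 8}

# One (precision, recall) point per cutoff gives the points of a precision/recall plot.
for cutoff in range(1, len(ranked) + 1):
    print(cutoff, precision_recall(ranked, relevant, cutoff))
```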

Several request images were presented to the system. For each request, the result images were sorted by increasing distance from the request, and precision and recall were computed from this ranking. This test was applied to the full feature vector (containing color, texture and shape features) and to a hierarchy of features (color then shape vectors). In the first case, only one distance has to be computed; in the second, one distance is computed for the color features and another for the shape features.
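The text does not specify how the two distances of the hierarchical case are combined. One possible reading, shown below purely as an assumption, is to shortlist the images closest in color and then re-rank that shortlist by shape distance. The function name `hierarchical_search`, the shortlist size and the toy 4-bit signatures are all hypothetical.

```python
def hamming_distance(a, b):
    """Count the bits set to 1 in the XOR of two signatures."""
    return bin(a ^ b).count("1")


def hierarchical_search(query, collection, shortlist_size):
    """Shortlist by color distance, then re-rank the shortlist by shape distance.

    `query` and every entry of `collection` are (color, shape) signature pairs.
    Returns the indices of the shortlisted images, best match first.
    """
    q_color, q_shape = query
    by_color = sorted(
        range(len(collection)),
        key=lambda i: hamming_distance(q_color, collection[i][0]),
    )
    shortlist = by_color[:shortlist_size]
    return sorted(shortlist, key=lambda i: hamming_distance(q_shape, collection[i][1]))


query = (0b1100, 0b0011)
collection = [
    (0b1100, 0b1111),  # same color, shape differs by 2 bits
    (0b1101, 0b0011),  # color differs by 1 bit, same shape
    (0b0011, 0b0011),  # same shape but color differs by 4 bits
    (0b1000, 0b0111),  # color differs by 1 bit, shape by 1 bit
]
print(hierarchical_search(query, collection, shortlist_size=3))  # [1, 3, 0]
```

In this toy run, image 2 has a perfect shape match but is discarded at the color stage, which is the kind of behavior a "color first, then shape" hierarchy is meant to produce.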

Results are shown in figure 2. This precision/recall graph is an average over twenty objects of the natural image collection. Results are improved by using a hierarchy (color then shape) of binary signatures instead of one mixed (color+texture+shape) binary signature.

Figure 3 illustrates the advantage of hierarchical features by comparing them with mixed features. The first image is the request image. With mixed feature vectors (color+texture+shape) (a), many wrong images are retrieved (false detections). With hierarchical feature vectors (color first, then shape) (b), the result is better, with fewer mistakes.

Two good examples of successful retrieval with hierarchical feature vectors are given in figure 4. The
