FEATURE VECTOR APPROXIMATION
BASED ON WAVELET NETWORK
Mouna Dammak, Mahmoud Mejdoub, Mourad Zaied and Chokri Ben Amar
Regim Research Group on Intelligent Machines, ENIS University, BP W, 3038 Sfax, Tunisia
Keywords:
Wavelet network, Approximation, Local feature, Bag of words.
Abstract:
Image classification is an important task in computer vision. In this paper, we propose a new image representation based on the approximation of local feature vectors by wavelet networks. To extract an approximation of the feature vector space, a Wavelet Network algorithm based on the fast wavelet transform is suggested. Then, the K-nearest neighbor (K-NN) classification algorithm is applied to the approximated feature vectors. The approximation of the feature space improves the feature vector classification accuracy.
1 INTRODUCTION
Visual descriptors for image categorization generally consist of either global or local features. The former represent global information about images. In contrast, local descriptors (Piro et al., 2010; Tao et al., 2010; Li and Allinson, 2008; Mejdoub et al., 2008; Mejdoub et al., 2009; Mejdoub and BenAmar, 2011) extract information at specific image locations that are relevant to characterizing the visual content. Indeed, these techniques are able to emphasize local patterns that images of the same category are expected to share. Research suggests that a regular dense sampling of descriptors can provide a better representation (Lazebnik et al., 2006; Nowak et al., 2006) than “interest” points. The bag of words representation (Csurka et al., 2004) can be considered practical proof of the effectiveness of visual feature points. This approach is applied to images in (Csurka et al., 2004; Lazebnik et al., 2006) to extract a histogram of words from the image. The essential characteristic of this representation is that it discards any information about the spatial arrangement of words.
Analyzing a signal from its corresponding graph does not give us access to all the information it contains. It is often necessary to transform it, i.e., to give it another representation that clearly shows its features. Fourier (Fourier, 1822) suggested that every function can be expressed simply as a sum of sinusoids. As an advanced alternative to classical Fourier analysis, wavelets (Cohen et al., 2001) have been successfully used for signal approximation. The fundamental idea behind wavelets is to process data at different scales or resolutions. In this way, wavelets provide a time-scale representation of an input signal (Yan and Gao, 2009). The wavelet transform applies a multi-resolution analysis to decompose a signal into low-frequency coefficients and high-frequency coefficients. The former represent the approximation of the original signal and the latter represent its detailed information.
Besides, Wavelet Networks (WN) (Li et al., 2003) are a powerful tool for approximating signals. Indeed, they combine the strengths of wavelet theory, in terms of localization and multi-resolution representation, with those of neural networks, in terms of classification. The key point of wavelet networks lies in the optimization of the network weights, which permits extracting an approximation of the original signal. In (Jemai et al., 2011), the authors introduce a new training method for WN and assess this algorithm in the field of image classification directly from pixel values. In (Ejbali et al., 2010), the authors advance a new approach for the approximation of acoustic units for the task of speech recognition.
In this paper, we propose a novel image categorization approach based on the approximation of local features by Wavelet Networks. Firstly, we extract local features based on the SIFT (Lowe, 1999) and SURF (Bay et al., 2006) descriptors and represent them using the BoW model combined with a spatial pyramid. Secondly, we approximate the histogram of words by wavelet networks in an attempt to obtain a more efficient representation of the histogram of words. Fi-
nally, the k-NN algorithm is applied to the approximated histogram of words for image categorization.
The rest of the paper is organized as follows:
Section 2 focuses on the theoretical concept of the
wavelet networks. Section 3 outlines the proposed ap-
proach of the extraction and the approximation of the
local descriptors. Some experimental results are presented in the final section to illustrate the effectiveness of the proposed categorization method.
2 WAVELET NETWORK
The concept of wavelet networks was first proposed by Zhang and Benveniste (Zhang and Benveniste, 1992). The basic idea of wavelet networks is to combine the localization property of the wavelet decomposition with the optimization property of neural network learning. Multilayered networks learn a representation of a nonlinear function by training, comparing their inputs and their outputs. This training represents the nonlinear function as a combination of activation functions, where an admissible wavelet is used as the activation function. Zhang and Benveniste showed that wavelet networks preserve the universal approximation property of RBF networks.
Wavelet analysis gives a representation of signals that simultaneously shows their location in time and frequency, thereby facilitating the identification of the physical characteristics of the signal source (Morlet et al., 1982). This analysis uses a family of translated and dilated functions $\psi_{a,b}$, constructed from a function $\psi \in L^2(\mathbb{R})$ called the mother wavelet:
$$\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{x-b}{a}\right)$$
where $a$ and $b$ represent the dilation and translation parameters, respectively. The Discrete Wavelet Transform (DWT) is defined as the set of wavelets generated by considering only sampled values of the $a$ and $b$ parameters. For analyzing a signal containing $a_0^j$ points ($1 \leq j \leq m$, where $j$ represents the scale parameter), we use only the wavelet family $\psi(a_0^m x - n b_0)$ with $n = 1, \ldots, a_0^{m-j}$:
$$w_{j,n} = a_0^{j/2} \sum_x f(x)\,\psi\!\left(a_0^j x - n b_0\right) \qquad (1)$$
For the particular case where $a_0 = 2$ and $b_0 = 1$, the sampling is called dyadic.
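To make Eq. (1) concrete, the following is a minimal numerical sketch assuming a Haar mother wavelet and a signal sampled on [0, 1); the 1/N factor is a discretization choice for the sum over x, and the 0-based indexing of n is ours, not the paper's.

```python
import numpy as np

def haar(x):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    return np.where((x >= 0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def dyadic_coefficients(f, j, a0=2.0, b0=1.0):
    """Discretized Eq. (1): w_{j,n} = a0^{j/2} * sum_x f(x) psi(a0^j x - n b0),
    for the dyadic case a0 = 2, b0 = 1; f is sampled at x = 0, 1/N, ..., (N-1)/N."""
    N = len(f)
    x = np.arange(N) / N
    n_max = int(a0 ** j)  # number of translates covering [0, 1) at scale j
    return np.array([a0 ** (j / 2) * np.sum(f * haar(a0 ** j * x - n * b0)) / N
                     for n in range(n_max)])

signal = np.sin(2 * np.pi * np.arange(256) / 256)
print(dyadic_coefficients(signal, j=3))  # 8 detail coefficients at scale j = 3
```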
The multiresolution analysis relies, firstly, on a scaling function $\phi(x) \in L^2(\mathbb{R})$, which constitutes an orthonormal basis when its position is varied on a given scale $j$. The functions of every scale generate an approximation of a given signal $f$ to analyze. Secondly, additional functions, i.e., wavelet functions, are then used to encode the difference in information between adjacent approximations (Meyer, 1990). If we have a finite number $N_w$ of wavelets $\psi_{a,b}$ obtained from the mother wavelet and a finite number $N_s$ of scaling functions $\phi_{a,b}$ obtained from the mother scaling function $\phi$, Eq. 2 can be considered an approximation of the inverse transform:
$$f(x) \simeq \sum_{j=1}^{N_w} \alpha_j \psi_j(x) + \sum_{k=1}^{N_s} \beta_k \phi_k(x) \qquad (2)$$
This equation establishes the idea of Wavelet Net-
works.
The model introduced by Zhang and Benveniste (Zhang and Benveniste, 1992) is composed of three layers. The input is a set of parameters $t_j$ that describe the coordinate positions of the signal to analyze; the entries are thus not actual data but only values describing specific positions of the analyzed signal. The hidden layer contains a set of neurons, each composed of a translated and dilated wavelet. The output layer contains one neuron that sums the outputs of the hidden layer through weighted connections $a_j$ and $d_j$, which represent the wavelet and scaling function coefficients, respectively. Figure 1 displays the structure of this wavelet network model.
Figure 1: The model of the Wavelet Network architecture (K: the number of neurons; T: input position values; W: connection weights of the network; B: values of translations; A: values of dilation; Y: value of the approximation).
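As an illustration of the model in Figure 1, the sketch below implements the forward pass of this three-layer network, assuming a Mexican-hat wavelet as the hidden-layer activation (a stand-in for the Beta wavelet used later in this paper); all parameter values are illustrative.

```python
import numpy as np

def mexican_hat(x):
    """Admissible wavelet used as activation (stand-in for the Beta wavelet)."""
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

class WaveletNetwork:
    """Three-layer model of Figure 1: input positions T, one translated and
    dilated wavelet per hidden neuron (A, B), weighted sum at the output (W)."""
    def __init__(self, a, b, w):
        self.a = np.asarray(a, float)  # dilations, one per hidden neuron
        self.b = np.asarray(b, float)  # translations, one per hidden neuron
        self.w = np.asarray(w, float)  # output connection weights

    def __call__(self, t):
        t = np.asarray(t, float)[:, None]            # (positions, 1)
        hidden = mexican_hat((t - self.b) / self.a)  # (positions, neurons)
        return hidden @ self.w                       # Y: the approximation

net = WaveletNetwork(a=[0.5, 1.0, 2.0], b=[0.2, 0.5, 0.8], w=[0.7, -0.3, 0.1])
print(net(np.linspace(0.0, 1.0, 5)))  # network output at 5 input positions
```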
3 OVERVIEW OF THE
PROPOSED APPROACH
We suggest in this paper an image classification solution based on local features and on vector approximation by wavelet networks. The proposed solution proceeds, first, by local feature extraction from the image. Second, the image is represented using a BoW model. Third, the wavelet network approximation is applied to this representation. In the classification stage (on-line), the same procedure is carried out to extract the approximated test image signature. Finally, to decide upon the image category, we search for the k training images most similar to the test image and apply a majority vote. This search is based on computing the distances between the approximated feature vector of the test image and all the approximated feature vectors of the training images. The pipeline of all these stages is illustrated in Figure 2.
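The following is a minimal sketch of this decision rule, assuming Euclidean distance between signatures (the metric is not fixed above) and hypothetical array names:

```python
import numpy as np
from collections import Counter

def knn_majority_vote(test_sig, train_sigs, train_labels, k=5):
    """Return the majority-vote category among the k training signatures
    closest to the approximated test signature."""
    dists = np.linalg.norm(train_sigs - test_sig, axis=1)  # one distance per image
    nearest = np.argsort(dists)[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]
```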
Figure 2: Complete framework of image classification by BOW and WN. For both the training and the test images, the pipeline comprises detection of local regions, feature extraction, bag-of-words and spatial pyramid representation over a codebook, approximation by WN, and k-NN classification.
3.1 Extraction of Local Features
In this paper, we focus on local feature vector extraction. Our system makes use of three types of low-level features: SIFT features (Lowe, 1999), SIFT-HSV features (Bosch et al., 2008) and SURF features (Bay et al., 2006). We use dense sampling to extract patches on a regular grid in the image, at multiple scales. Given the feature space, the visual vocabulary is built by clustering the low-level feature vectors using k-means, accelerated with the ELKAN algorithm (Elkan, 2003). The clusters define the visual vocabulary, and the image is then characterized by the number of occurrences of each visual word. Similarly to (Lazebnik et al., 2006), we use a spatial pyramid of 1x1, 2x2, and 4x4 regions in our experiments for all visual features.
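As a sketch of this vocabulary construction, the snippet below uses scikit-learn's KMeans with its Elkan variant, with random descriptors standing in for real dense SIFT data; the 120-word setting is the one reported for the Wang database in Section 4.1.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are densely sampled low-level descriptors (e.g. 128-D SIFT) stacked
# over the training set; random data stands in for the real features here.
descriptors = np.random.rand(10000, 128)
kmeans = KMeans(n_clusters=120, algorithm="elkan", n_init=10).fit(descriptors)
vocabulary = kmeans.cluster_centers_  # the 120 visual words

# An image is then characterized by the occurrence count of each visual word:
image_words = kmeans.predict(descriptors[:500])  # word index of each patch
histogram = np.bincount(image_words, minlength=120)
```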
Figure 3: Hybrid descriptor.
The steps needed to calculate the signature of an image for a given local descriptor (D ∈ {SIFT, SURF}) are:
1. Extraction of local descriptors based on the descriptor D.
2. Translation of each local descriptor into a histogram of visual words, using the bag-of-words technique on the descriptors obtained in step (1).
3. Division of the image into bands using the spatial pyramid technique.
4. For each band obtained from the spatial pyramid, extraction of a visual word histogram by combining the visual word histograms obtained in step (2).
5. Combination of the histograms obtained for each band to derive a histogram $H_D$ associated with the local descriptor D (a sketch of steps 3-5 is given after this list).
6. Combination of the histograms obtained for the SURF descriptor and the SIFT descriptor.
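The sketch announced in step (5) illustrates steps (3)-(5) for one descriptor type, assuming the patch coordinates and their visual-word indices are already available; the grid sizes match the 1x1, 2x2 and 4x4 pyramid of Section 3.1.

```python
import numpy as np

def pyramid_signature(xy, words, width, height, n_words, grids=(1, 2, 4)):
    """Steps 3-5: one visual-word histogram per spatial-pyramid cell
    (1x1, 2x2 and 4x4 grids), concatenated into the signature H_D."""
    parts = []
    for g in grids:
        # Cell index of each patch at this pyramid level.
        col = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        row = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        cell = row * g + col
        for c in range(g * g):
            parts.append(np.bincount(words[cell == c], minlength=n_words))
    return np.concatenate(parts)

# Hypothetical inputs: patch coordinates and their visual-word indices.
xy = np.random.rand(500, 2) * [640, 480]
words = np.random.randint(0, 120, size=500)
H_D = pyramid_signature(xy, words, 640, 480, n_words=120)
print(H_D.shape)  # (1 + 4 + 16) * 120 = (2520,)
```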
3.2 Wavelet Network
The Wavelet Network is used to approximate each local feature vector computed as indicated in Section 3.1. We denote by $V$ the feature vector extracted from a given image of the database. To define the Wavelet Network, we first take a family of $n$ wavelets $\Psi = (\psi_1, \ldots, \psi_n)$ with different scaling and translation parameters (generated by distributing the parameters on a dyadic grid) that can be chosen arbitrarily at this point. The architecture of the wavelet network is specified exactly by the number of wavelets required. In this work, we build the candidate hidden neurons representing a library of wavelets (and scaling functions), and then we select the hidden neurons in order to form the optimal structure.
3.2.1 Wavelet Network Initialization
The library $G$ of wavelet and scaling function candidates to join the network results from sampling the dilation and translation parameters on a dyadic grid. This family of functions is:
$$G = \left\{ \psi\!\left(a_0^m x - n b_0\right),\; \phi\!\left(a_0^j x - n b_0\right) \right\} \quad \text{with } m \in S_a,\; n \in S_{b(m)} \qquad (3)$$
In Eq. 3, $a_0, b_0 > 0$ are two scalar constants defining the discretization step sizes for dilation and translation; $a_0$ is typically dyadic. $S_a$ and $S_b$ are finite sets related to the size of the data input domain $D$. The first derivative of the Beta function (Amar et al., 2005), given by Eq. 4, is used as the mother wavelet:
$$B(x) = \begin{cases} \left(\dfrac{x - x_0}{x_c - x_0}\right)^{p} \left(\dfrac{x_1 - x}{x_1 - x_c}\right)^{q} & \text{if } x \in [x_0, x_1] \\[1ex] 0 & \text{otherwise} \end{cases} \qquad (4)$$
where $p, q, x_0, x_1 \in \mathbb{R}$, $x_0 < x_1$ and $x_c = \dfrac{p x_1 + q x_0}{p + q}$.
We give below the steps used to initialize the WN:
Step 1: Start the learning by preparing a library of candidate wavelets and scaling functions.
Step 2: Calculate the weights corresponding to all the activation functions of the library.
Step 3: Impose a stopping criterion: an error $E$ between the feature vector $V$ and the output of the network.
Step 4: Initialize the network output to $\tilde{V} = 0$.
3.2.2 Learning Wavelet Network
The training model is built according to the following
steps:
Step 1: Calculate the weights $\gamma_l$ ($\alpha_l$ or $\beta_l$, associated respectively with the wavelets and the scaling functions) of the library already created in Section 3.2.1.
Step 2: Calculate the contribution of each activation function in the library ($\gamma_l g_l$) to the reconstruction of the feature vector $V$, which can be written:
$$V = \sum_{l=1}^{L} \gamma_l g_l \qquad (5)$$
Step 3: Select the activation function $g_k$ that provides the best approximation of the feature vector $V$.
Step 4: Recruit the best function into the hidden layer of the wavelet network. The total approximation, which is the accumulation of the approximations obtained at each iteration, is updated as $\tilde{V} = \tilde{V} + \gamma_k g_k$.
Step 5: Compute the difference between the original signal and the approximated one. If the error threshold $E$ is reached, the training phase ends; otherwise, we move back to Step 3. A sketch of this procedure is given below.
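As referenced in Step 5, here is a minimal sketch of this training loop, assuming that each (non-zero) row of `library` holds one candidate wavelet or scaling function sampled at the signature coordinates, and that the weights γ_l are obtained by least-squares projection onto the current residual (the weight computation is not detailed above):

```python
import numpy as np

def train_wavelet_network(V, library, E=1e-3, max_neurons=50):
    """Steps 1-5: greedily recruit the activation function g_k whose weighted
    contribution gamma_k * g_k best approximates the residual V - V_tilde."""
    V_tilde = np.zeros_like(V)
    recruited = []  # (library index, weight) of each recruited hidden neuron
    while np.linalg.norm(V - V_tilde) > E and len(recruited) < max_neurons:
        residual = V - V_tilde
        norms = np.sum(library ** 2, axis=1)       # ||g_l||^2 per candidate
        gammas = library @ residual / norms        # Steps 1-2: weights gamma_l
        scores = np.abs(gammas) * np.sqrt(norms)   # size of each contribution
        k = int(np.argmax(scores))                 # Step 3: best candidate
        V_tilde = V_tilde + gammas[k] * library[k] # Step 4: recruit, accumulate
        recruited.append((k, gammas[k]))           # Step 5 loops until error E
    return V_tilde, recruited
```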
4 EXPERIMENTAL RESULTS
4.1 Presentation of the Datasets
We have carried out experiments on the WANG database (Wang et al., 2001), which contains 1000 images. The database contains ten clusters representing generalized semantically meaningful categories. Besides, we have carried out experiments on the OT dataset (Oliva and Torralba, 2001), which contains 2688 color images divided into 8 categories. For evaluation, we divide the images randomly into 500 training and 500 test images for the Wang dataset, and 800 training and 1688 test images for OT. Experiments were performed on a personal computer with an Intel Core 2 Duo (2 GHz) and 4 GB of RAM. In our experiments, image patches are extracted on regular grids at 4 different scales. SIFT, SIFT-HSV and SURF descriptors are computed at every point of the regular grid. The dense features are vector quantized by the bag-of-words model with N-cluster = 120 for the Wang database and N-cluster = 300 for the OT database. We tested the performance of our proposed image retrieval method taking into account the accuracy of the retrieval process. For the accuracy evaluation, we use precision.
4.2 Impact of the Feature Space Approximation
Several experiments were made using the Wang and OT databases. We observe in Figures 4 and 5 that the image signatures approximated by WN for the various types of descriptors (SIFT, SIFT-HSV, SURF and Hybrid) give better retrieval precision than the original signatures (signatures not approximated by WN).
Figure 4: Comparison between original and approximate feature vectors (Wang): (a) SIFT, (b) SIFT-HSV, (c) SURF, (d) Hybrid.
Figure 5: Comparison between original and approximate feature vectors (OT): (a) SIFT, (b) SIFT-HSV, (c) SURF, (d) Hybrid.
4.3 Comparison of the Proposed
Classifier to Popular Methods in the
Literature
We now compare our results with those obtained using different machine learning algorithms that could be applied for categorization, such as Support Vector Machines (SVMs), Hidden Markov Models (HMM), K-Nearest Neighbors (KNN), or the Universal Nearest Neighbors rule (UNN).
The experimental results reported in Table 1 seem very promising: the proposed approach outperforms the compared methods on the Wang database and remains competitive on the OT database.

Table 1: Comparison with some state-of-the-art methods.

(a) Wang database.

Classification Model          Classification rate
(Jemai et al., 2011)          71.2
(Jemai et al., 2010)          71.4
(Mouret et al., 2009)         70.60
Our approach                  74.50

(b) OT database.

Classification Model          Classification rate
(Piro et al., 2010)-KNN       73.8
(Piro et al., 2010)-UNN       75.70
(Oliva and Torralba, 2001)    83.70
(Horster et al., 2008)        79
Our approach                  80.20
4.4 Comparison between Raw Pixels
and Feature Vectors Representation
In (Jemai et al., 2011), the authors resize the image to 90×90 pixels and apply the wavelet network directly to the pixel values of the image. Our work, on the one hand, reduces the dimension of the feature vector of the image and, on the other hand, generates a compact representation of the image. Consequently, we obtain faster computing times and a higher categorization rate. Table 2 compares the CPU times spent to compute the different stages of the learning algorithm for one image, together with the categorization rates.
Table 2: Classification rate and time consumption for the processing steps on Wang.

                       BWNN     (Jemai et al., 2011)   Our approach
Training time          20 mn    0.1176 s               0.010531 s
Classification time    2 mn     0.0627 s               0.038517 s
Classification rate    60.2     71.4                   74.5
5 CONCLUSIONS
A new indexing method, based on combined local feature extraction and signal approximation by Wavelet Networks, was proposed in this paper and applied to the field of image classification. Based on the experimental results, the
proposed approach exhibits high classification rates
and small computing times.
REFERENCES
Amar, C. B., Zaied, M., and Alimi, M. A. (2005). Beta
wavelets. synthesis and application to lossy image
compression. Advances in Engineering Software,
36:459–474.
Bay, H., Tuytelaars, T., and Gool, L. V. (2006). Surf:
Speeded up robust features. In 9th European Con-
ference on Computer Vision.
Bosch, A., Zisserman, A., and Munoz, X. (2008). Scene
classification using a hybrid generative/discriminative
approach. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 30(4):712–727.
Cohen, A., Dahmen, W., Daubechies, I., and Devore,
R. (2001). Tree approximation and optimal encod-
ing. Applied and Computational Harmonic Analysis,
11(2):192–226.
Csurka, G., Dance, C. R., Fan, L., Willamowski, J., and
Bray, C. (2004). Visual categorization with bags of
keypoints. In Workshop on Statistical Learning in
Computer Vision, ECCV.
Ejbali, R., Zaied, M., and Amar, C. B. (2010). Wavelet network for recognition system of Arabic word. International Journal of Speech Technology, 13(3):163–174.
Elkan, C. (2003). Using the triangle inequality to accelerate
k-means. ICML, pages 147–153.
Fourier, J. B. J. (1822). Théorie analytique de la chaleur.
Horster, E., Greif, T., Lienhart, R., and Slaney, M. (2008).
Comparing local feature descriptors in plsa-based im-
age models. In DAGM-Symposium, pages 446–455.
Jemai, O., Zaied, M., Amar, C. B., and Alimi, M. A.
(2010). Fbwn: An architecture of fast beta wavelet
networks for image classification. In International
Joint Conference on Neural Networks (IJCNN), pages
1–8, Barcelona.
Jemai, O., Zaied, M., Amar, C. B., and Alimi, M. A. (2011).
Pyramidal hybrid approach: Wavelet network with ols
algorithm-based image classification. International
Journal of Wavelets, Multiresolution and Information
Processing (IJWMIP), 9(1):111–130.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
bags of features: Spatial pyramid matching for rec-
ognizing natural scene categories. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, volume 2, pages 2169–2178, New York.
Li, C., Liao, X., and Yu, J. (2003). Complex-valued wavelet
network. Journal of Computer and System Sciences,
67(3):623–632.
Li, J. and Allinson, N. M. (2008). A comprehensive review
of current local features for computer vision. Neuro-
computing, 71:1771–1787.
Lowe, D. (1999). Object recognition from local scale-invariant features. In International Conference on Computer Vision, pages 1150–1157, Corfu, Greece.
Mejdoub, M. and BenAmar, C. (2011). Hierarchical cat-
egorization tree based on a combined unsupervised-
supervised classification. In Seventh International
Conference on Innovations in Information Technol-
ogy.
Mejdoub, M., Fonteles, L., BenAmar, C., and Antonini, M.
(2008). Fast indexing method for image retrieval using
tree-structured lattices. In Content based multimedia
indexing CBMI.
Mejdoub, M., Fonteles, L., BenAmar, C., and Antonini,
M. (2009). Embedded lattices tree: An efficient in-
dexing scheme for content based retrieval on image
databases. Journal of Visual Communication and Im-
age Representation, Elsevier.
Meyer, Y. (1990). Ondelettes et opérateurs I. Actualités Mathématiques [Current Mathematical Topics]. Hermann, Paris.
Morlet, J., Arens, G., Fourgeau, E., and Giard, D. (1982).
Wave propagation and sampling theory. Geophysics,
47:203–236.
Mouret, M., Solnon, C., and Wolf, C. (2009). Classifica-
tion of images based on hidden markov models. In
IEEE Workshop on Content Based Multimedia Index-
ing, pages 169–174.
Nowak, E., Jurie, F., and Triggs, B. (2006). Sampling strate-
gies for bag-of-features image classification. In Euro-
pean Conference on Computer Vision. Springer.
Oliva, A. and Torralba, A. (2001). Modeling the shape
of the scene: A holistic representation of the spatial
envelope. International Journal of Computer Vision,
42(3):145–175.
Piro, P., Nock, R., Nielsen, F., and Barlaud, M. (2010). Boosting k-NN for categorization of natural scenes. CoRR.
Tao, Y., Skubic, M., Han, T., Xia, Y., and Chi, X. (2010). Performance evaluation of SIFT-based descriptors for object recognition. In Proceedings of the International MultiConference of Engineers and Computer Scientists.
Wang, J. Z., Li, J., and Wiederhold, G. (2001). SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):947–963.
Yan, R. and Gao, R. (2009). Base wavelet selection for bear-
ing vibration signal analysis. International Journal of
Wavelets, Multi-resolution, and Information Process-
ing, 7(4):411–426.
Zhang, Q. and Benveniste, A. (1992). Wavelet networks.
IEEE Transactions on Neural Networks, 3(6):889–
898.