Empirical Mode Decomposition-based Face Recognition System

Esteve Gallego-Jutglà, Karmele López-de-Ipiña, Pere Martí-Puig and Jordi Solé-Casals

Digital Technologies Group, University of Vic, Sagrada Família 7, 08500 Vic, Barcelona, Spain

System Engineering and Automation Department, University of the Basque Country, Donostia 20008, Spain

Keywords: Face Recognition, Multivariate Empirical Mode Decomposition (mEMD), Neural Networks, Biometrics.

Abstract: In this work we explore multivariate empirical mode decomposition (mEMD) combined with a neural network classifier as a technique for face recognition. Each input image is decomposed simultaneously with the representative image of each class by means of mEMD, and the distance between the modes of the two images is then calculated using three different distance measures. A neural network is then trained on these distances, using 10-fold cross-validation, in order to derive a classifier. Preliminary results (a classification rate above 98%) are satisfactory and justify a deeper investigation of how to apply mEMD to face recognition.

1 INTRODUCTION

In recent years, several security laws have been proposed in order to increase access control in different places, such as airports, train and underground stations, border crossings and governmental buildings. Different biometric systems are being used to control these environments.

One of those systems is face recognition. It has become one of the biggest challenges in technological development, owing to the relevance that its applications have achieved. Different fields have benefited from the use of face recognition, such as continuous monitoring, access security and telecommunication systems (Woodward et al., 2003; Xiao, 2007).

Face recognition has developed quickly, and there seems to be no limit to the capacity of such systems, because the amount of input data they handle can be very large. This is why researchers try to improve existing systems by introducing new features and new lines of work that can be useful for developing these kinds of systems (Iancu et al., 2007).

The most important characteristic of face recognition is that it is a non-invasive method. This is an advantage over other systems, which require the guided collaboration of the subjects that form the database. Another important characteristic is the simplicity of the capture system, where basically only illumination must be controlled in order to obtain a good image.

This paper is a continuation of previous work (Gallego-Jutglà and Solé-Casals, 2012), where we explored a promising strategy for face recognition using a new decomposition technique, multivariate empirical mode decomposition (mEMD). Here we combine that approach (distance measures calculated over the modes of pairs of images) with a neural network classifier in order to enhance performance. This nonlinear classification system improves the final results, increasing the classification rate to 98.25%.

This paper is organized as follows. After this introduction, the data base used is presented in Section 2. The EMD technique is presented in Section 3, and its extension to multivariate signals in Section 4. Section 5 is devoted to the neural network description, while Section 6 details the image processing methodology. Experimental results and discussion are given in Section 7. Finally, conclusions are presented in Section 8.

2 DATA BASE

The data base used contains ten different images of each of forty subjects, for a total of four hundred images. Images were taken against a dark background, with the subjects in a frontal position and with some variation in the orientation of the face. The whole data set is presented in Figure 1.

Gallego-Jutglà E., López-de-Ipiña K., Martí-Puig P. and Solé-Casals J. (2013).

Empirical Mode Decomposition-based Face Recognition System.

In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing, pages 445-450

DOI: 10.5220/0004359104450450

 SciTePress

Figure 1: Data base ORL (Olivetti Research Laboratory).

This data base presents images with different facial expressions and details, such as eyes open or closed, smiling or not smiling, and glasses or no glasses, as well as illumination variations. The illumination variations are not specified. All images are grey scale with 256 levels, with an original size of 92 x 112 pixels.

3 EMPIRICAL MODE DECOMPOSITION (EMD)

The EMD algorithm is a method designed for multiscale decomposition and time-frequency analysis, which can analyze nonlinear and non-stationary data (Huang et al., 1998).

With this method any time-series data set can be

decomposed into a finite and often small number of

Intrinsic Mode Functions (IMFs). These IMFs are

defined so as to exhibit locality in time and to

represent a single oscillatory mode. Each IMF

satisfies two basic conditions: (i) the number of

zero-crossings and the number of extrema must be

the same or differ at most by one in the whole

dataset, and (ii) at any point, the mean value of the

envelope defined by the local maxima and the

envelope defined by the local minima is zero (Huang

et al., 1998).
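
Condition (i) can be checked numerically. The following is a minimal sketch, assuming a uniformly sampled signal stored in a NumPy array; the function names are illustrative and not part of the original work. It counts zero-crossings and interior extrema and verifies that they differ by at most one.

```python
import numpy as np

def count_zero_crossings(x):
    """Number of sign changes between consecutive non-zero samples."""
    s = np.sign(x)
    s = s[s != 0]                      # ignore samples that are exactly zero
    return int(np.sum(s[:-1] != s[1:]))

def count_extrema(x):
    """Number of strict interior local maxima and minima."""
    d = np.diff(x)
    # a sign change of the first difference marks a local extremum
    return int(np.sum(d[:-1] * d[1:] < 0))

def satisfies_imf_count_condition(x):
    """IMF condition (i): counts are equal or differ by at most one."""
    return abs(count_zero_crossings(x) - count_extrema(x)) <= 1

t = np.linspace(0, 1, 1000, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t)       # oscillates around zero
offset_tone = tone + 2.0               # same oscillation, but never crosses zero
```

Under these assumptions the pure tone passes the check, whereas the same tone riding on a constant offset does not, since it has extrema but no zero-crossings.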

The EMD algorithm (Huang et al., 1998) for a signal $x(t)$ can be summarized as follows.

(i) Determine the local maxima and minima of $x(t)$;

(ii) Generate the upper and lower signal envelopes by connecting those local maxima and minima, respectively, by an interpolation method;

(iii) Determine the local mean $m(t)$ by averaging the upper and lower signal envelopes;

(iv) Subtract the local mean from the data: $h(t) = x(t) - m(t)$;

(v) If $h(t)$ obeys the stopping criteria, then we define $d(t) = h(t)$ as an IMF; otherwise set $x(t) = h(t)$ and repeat the process from step (i).

Then, the empirical mode decomposition of a signal $x(t)$ can be written as:

$$x(t) = \sum_{i=1}^{n} \mathrm{IMF}_i(t) + \varepsilon_n(t) \qquad (1)$$

where $n$ is the number of extracted IMFs, and the final residue $\varepsilon_n(t)$ is the mean trend or a constant.
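
The sifting procedure and Equation (1) can be illustrated with a short sketch. It is not the implementation used in this work: it assumes linear interpolation for the envelopes (the text only requires "an interpolation method"; cubic splines are the usual choice) and a simple relative-change stopping criterion, and the names `sift_once` and `emd` and the parameters `max_imfs`, `max_sifts` and `tol` are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Indices of strict interior local maxima and minima (step i)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """Steps (i)-(iv): return x minus its local mean, or None if x has too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n = np.arange(len(x))
    upper = np.interp(n, maxima, x[maxima])   # step (ii), linear envelopes (assumption)
    lower = np.interp(n, minima, x[minima])
    mean = 0.5 * (upper + lower)              # step (iii)
    return x - mean                           # step (iv)

def emd(x, max_imfs=8, max_sifts=50, tol=1e-3):
    """Return (imfs, residue) such that x = imfs.sum(axis=0) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = sift_once(residue)
        if h is None:                         # residue is a trend: nothing left to extract
            break
        for _ in range(max_sifts):            # step (v): sift until h stabilises
            h_next = sift_once(h)
            if h_next is None:
                break
            change = np.sum((h - h_next) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_next
            if change < tol:                  # assumed stopping criterion
                break
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 40 * t) + np.sin(2 * np.pi * 5 * t) + 2 * t
imfs, residue = emd(x)
```

By construction the extracted IMFs and the residue add up to the original signal, which is the content of Equation (1).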

4 MULTIVARIATE EMPIRICAL MODE DECOMPOSITION (mEMD)

EMD has achieved good results in data processing (Diez et al., 2009; Molla et al., 2010). However, this method presents several shortcomings when applied to multichannel data sets. The IMFs from different time series do not necessarily correspond to the same frequency, and different time series may end up having a different number of IMFs. Computationally, it is therefore difficult to match the IMFs obtained from different channels (Mutlu and Aviyente, 2011).

To solve these shortcomings, an extension of EMD to mEMD is required. In standard EMD the local mean is computed by taking the average of the upper and lower envelopes, which in turn are obtained by interpolating between the local maxima and minima. However, for multivariate signals in general, the local maxima and minima cannot be defined directly. To deal with this problem, multiple n-dimensional envelopes are generated by taking signal projections along different directions in n-dimensional space (Rehman and Mandic, 2010). mEMD is the technique used in this paper to compute all the decompositions.

The algorithm (Rehman and Mandic, 2010) can

be summarized as follows.

(i) Choose a suitable point set for sampling on an $(n-1)$-sphere (this $(n-1)$-sphere resides in an $n$-dimensional Euclidean coordinate system).

(ii) Calculate the projection, $p^{\theta_k}(t)$, of the input signal $v(t)$ along the direction vector $x^{\theta_k}$, for all $k$, giving the set of projections $\{p^{\theta_k}(t)\}_{k=1}^{K}$.

(iii) Find the time instants $t_i^{\theta_k}$ corresponding to the maxima of the set of projected signals $\{p^{\theta_k}(t)\}_{k=1}^{K}$.

(iv) Interpolate $[t_i^{\theta_k}, v(t_i^{\theta_k})]$ to obtain the multivariate envelope curves $\{e^{\theta_k}(t)\}_{k=1}^{K}$.

(v) For a set of $K$ direction vectors, the mean of the envelope curves is calculated as $m(t) = \frac{1}{K}\sum_{k=1}^{K} e^{\theta_k}(t)$.

(vi) Extract the detail $d(t)$ using $d(t) = v(t) - m(t)$. If the detail $d(t)$ fulfills the stopping criteria for a multivariate IMF, apply the above procedure to $v(t) - d(t)$, otherwise apply it to $d(t)$.

Then, the mEMD of a signal $x(t)$ can be written as detailed in Equation (1).
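
Steps (i) to (v) can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of Rehman and Mandic (2010) as published: the direction vectors are drawn uniformly at random on the sphere instead of from a low-discrepancy point set, the envelopes use linear interpolation, and the function names are hypothetical. The signal is stored as an array of shape (T, n), one column per channel; in this paper the two channels would be the two unfolded images.

```python
import numpy as np

def random_directions(n_dirs, dim, seed=0):
    """Step (i): unit vectors on the (dim-1)-sphere (assumption: uniform random sampling)."""
    rng = np.random.default_rng(seed)
    vecs = rng.standard_normal((n_dirs, dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def multivariate_local_mean(v, directions):
    """Steps (ii)-(v): mean of the multivariate envelopes of v over all directions."""
    T, n = v.shape
    t = np.arange(T)
    envelopes = []
    for x_k in directions:
        p = v @ x_k                                           # step (ii): projection
        d = np.diff(p)
        t_max = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1   # step (iii): maxima instants
        if len(t_max) < 2:
            continue                                          # too few maxima along this direction
        # step (iv): interpolate every channel through its samples at t_max
        e_k = np.column_stack([np.interp(t, t_max, v[t_max, c]) for c in range(n)])
        envelopes.append(e_k)
    if not envelopes:
        return None
    return np.mean(envelopes, axis=0)                         # step (v)

T = 500
s = np.linspace(0, 1, T)
v = np.column_stack([np.sin(2 * np.pi * 10 * s),
                     np.cos(2 * np.pi * 10 * s) + 0.5 * np.sin(2 * np.pi * 3 * s)])
dirs = random_directions(16, 2)
m = multivariate_local_mean(v, dirs)
detail = v - m                                                # step (vi): candidate multivariate IMF
```

Because all channels share the same projections and the same time instants, every channel yields the same number of modes, which is the property that makes the modes of two images directly comparable.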

5 NEURAL NETWORK

In recent years several classification systems have been implemented using different techniques, such as neural networks.

Neural network techniques are widely used and well known in pattern recognition applications.

An artificial neural network (ANN) is a

mathematical model that tries to simulate the

structure and/or functional aspects of biological

neural networks. It consists of an interconnected

group of artificial neurons and processes information

using a connectionist approach to computation. In

most cases an ANN is an adaptive system that

changes its structure based on external or internal

information that flows through the network during

the learning phase.

Neural networks are non-linear statistical data

modelling tools. They can be used to model complex

relationships between inputs and outputs or to find

patterns in data.

One of the simplest ANNs is the so-called perceptron, which consists of a single layer that discriminates between classes by means of a linear discriminant. However, it is possible to discriminate between non-linearly separable classes using multilayer perceptrons (MLP).

The Multilayer Perceptron (MLP), also known as the Backpropagation Net (BPN), is one of the best known and most widely used artificial neural network models for pattern classification and function approximation (Lippmann, 1987; Freeman and Skapura, 1991). It belongs to the class of so-called feedforward networks, and its topology is composed of different fully interconnected layers of neurons, where information always flows from the input layer, whose only role is to send the input data to the rest of the network, toward the output layer, crossing all the layers in between (called hidden layers). Essentially, the hidden layers are responsible for carrying out the information processing, extracting features from the input data.

Although there are many variants, usually each neuron in one layer has directed connections to the neurons of the subsequent layer, but there is no connection or interaction between neurons in the same layer (Bishop, 1995; Hush and Horne, 1993).
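
The feedforward flow described above can be written in a few lines. The sketch below is illustrative only: the tanh hidden activation, the softmax output, the layer sizes and the random weights are assumptions, and no training (backpropagation) is shown.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer feedforward pass: input -> tanh hidden layer -> softmax output."""
    hidden = np.tanh(x @ W1 + b1)                          # hidden layer extracts features
    logits = hidden @ W2 + b2                              # output layer, one unit per class
    logits = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 40, 100, 40                        # illustrative sizes
W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, n_out))
b2 = np.zeros(n_out)

x = rng.standard_normal((3, n_in))                         # three input vectors
probs = mlp_forward(x, W1, b1, W2, b2)
predicted_class = probs.argmax(axis=1)
```

Each row of `probs` sums to one, and information only moves forward from one layer to the next, with no connections inside a layer.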

6 IMAGE PROCESSING

The proposed procedure is detailed in Figure 2. The system works as follows:

(i) The first 5 images of each class are kept as representatives, and the mean image of these 5 images is obtained for each class. These mean images are denoted $R_i$, $\forall\, 1 \le i \le N$, where $N$ is the total number of classes.

(ii) The rest of the images are used as inputs to be classified as belonging to one of the classes.

(iii) For each new input image $I$ to be classified, the mEMD decomposition of $I$ and $R_i$ is calculated jointly, obtaining a total of $N$ mEMD decompositions:

$$D_i = \mathrm{mEMD}(R_i, I) \quad \forall\, 1 \le i \le N \qquad (2)$$

Each one of these $D_i$ decompositions is composed of two sets (matrices) of IMFs, one belonging to $I$ and the other belonging to $R_i$. Each IMF has 986 points, where 986 = 29 x 34 results from unfolding an image into a vector, each image having previously been resized to 29 x 34 pixels.

(iv) Then the distance between the two sets of IMFs is calculated for each $D_i$, obtaining a vector of $N$ values corresponding to the distances between the input image $I$ and each one of the classes.

EmpiricalModeDecomposition-basedFaceRecognitionSystem

447

Figure 2: Scheme of the proposed image processing procedure.

Concerning distance measures, we have explored different possibilities. Considering two matrices $A$ and $B$, corresponding to the two sets of IMFs obtained in each decomposition $D_i$, we can use:

1. The correlation coefficient between matrices $A$ and $B$.

2. The matrix scalar product, also known as the normalized Frobenius inner product:

$$d_2(A, B) = \frac{A:B}{\|A\|_F \, \|B\|_F} \qquad (3)$$

where $A:B$ is the Frobenius inner product of the matrices $A$ and $B$, defined as $A:B = \mathrm{trace}(A^T B)$, and $\|A\|_F$ is the Frobenius norm, defined as $\|A\|_F = \sqrt{\mathrm{trace}(A^T A)}$, where $^T$ denotes the transpose of a matrix.

3. The Frobenius norm of the difference $A - B$:

$$d_3(A, B) = \|A - B\|_F \qquad (4)$$

(v) The input image $I$ is associated with a class according to some criterion on the distances. Two different methods were used. The first one associates the image with the class that gives the minimum distance, that is, the class whose decomposition $D_i$ presents the lowest distance value. This is the same strategy previously used in (Gallego-Jutglà and Solé-Casals, 2012), except that the images are now larger, which improves the results. The second one is based on an ANN classifier, where the computed distances are used as the input vector of the system and the class association is done as a nonlinear mapping between distance vectors and classes.
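
The three measures and the minimum-distance rule of step (v) can be sketched as below. Several points are assumptions: the correlation coefficient is taken as the Pearson correlation over the flattened matrices, the function names are illustrative, and the IMF matrices are random placeholders (6 IMFs of 986 points) rather than real decompositions. Note that the first two measures grow with similarity, so they would need to be converted into a dissimilarity (for example one minus the value) before a minimum rule is applied; the text does not specify how this was handled.

```python
import numpy as np

def correlation(A, B):
    """Measure 1 (assumed form): Pearson correlation between the entries of A and B."""
    return float(np.corrcoef(A.ravel(), B.ravel())[0, 1])

def normalized_frobenius_product(A, B):
    """Equation (3): trace(A^T B) / (||A||_F ||B||_F)."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

def frobenius_distance(A, B):
    """Equation (4): ||A - B||_F."""
    return float(np.linalg.norm(A - B))

def classify_min_distance(distances):
    """First method of step (v): index of the class with the lowest distance."""
    return int(np.argmin(distances))

rng = np.random.default_rng(0)
imfs_input = rng.standard_normal((6, 986))                         # placeholder IMFs of the input image
imfs_close = imfs_input + 0.1 * rng.standard_normal((6, 986))      # a similar reference
imfs_far = rng.standard_normal((6, 986))                           # an unrelated reference

distances = np.array([frobenius_distance(imfs_input, imfs_far),
                      frobenius_distance(imfs_input, imfs_close)])
predicted = classify_min_distance(distances)
```

Here `np.sum(A * B)` equals $\mathrm{trace}(A^T B)$, and `np.linalg.norm` applied to a matrix returns the Frobenius norm by default.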

For the classification step with the ANN, a Multilayer Perceptron (MLP) with one hidden layer of 100 neurons has been used. For the training and validation phases, we used $k$-fold cross-validation with $k = 10$, in order to obtain reliable results. In this evaluation methodology the original sample set is randomly divided into $k$ subsets. Then, a single subset is retained as the validation data for testing the model, and the remaining $k - 1$ subsets are used as training data. The cross-validation process is repeated $k$ times, with each of the $k$ subsets used exactly once as the validation data. The $k$ results obtained from the folds are then averaged in order to produce a single estimate. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. This makes it a robust evaluation method, because it helps to detect possible over-fitting. 10-fold cross-validation is commonly used, but when few data are available leave-one-out cross-validation (LOOCV) may be the best option. LOOCV uses a single observation from the original set as the validation data, and the remaining observations as the training data. This is the same as $k$-fold cross-validation with $k$ equal to the number of observations in the original sample set. LOOCV is computationally expensive because it requires many repetitions of training, but it works well with very small data sets.
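
The $k$-fold procedure can be sketched as follows. To keep the example short and self-contained, a nearest-centroid classifier stands in for the MLP; this substitution, the function names and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def nearest_centroid(X_train, y_train, X_val):
    """Stand-in classifier (not the MLP of the paper): assign the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_val[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Average the validation accuracy over the k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k, seed):
        y_pred = fit_predict(X[train_idx], y[train_idx], X[val_idx])
        scores.append(np.mean(y_pred == y[val_idx]))
    return float(np.mean(scores))

# Synthetic, well separated data: 4 classes with 10 samples each.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(4), 10)
X = 10.0 * np.eye(4)[y] + 0.1 * rng.standard_normal((40, 4))
accuracy = cross_validate(X, y, nearest_centroid, k=10)
```

Setting `k` equal to the number of samples turns the same generator into LOOCV.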

BIOSIGNALS2013-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

448

Figure 3: Obtained classification rates (%) for the three

different distances using the first method: class association

only based on the criterion of minimum distance. Exact

experimental values are presented at the top of each bar.

7 EXPERIMENTS AND DISCUSSION

Initially, as explained in Section 6, each image is resized to 29 x 34 pixels. This size was chosen in order to have the same number of parameters (986 pixels) as in (Travieso et al., 2007), which allows performances to be compared.

Applying the procedure detailed above to the images, and using the three distance measures described, the following results are obtained:

Figure 3 summarizes the classification results obtained with the minimum-distance criterion alone. An accuracy of 88.5% was obtained with both correlation and matrix scalar product, and 90% with the Frobenius norm.

Figure 4 summarizes the classification results obtained using the MLP previously described. The figure shows the results for each of the three distances (correlation, scalar product and Frobenius norm) and for the combination of the three. In the latter case, the input vector of the MLP was obtained by concatenating the three distance vectors. The classification rate obtained with the MLP was much better than before for all three distances: 97.25% for correlation, 97.5% for scalar product and 96.5% for Frobenius norm. The combination of all three distances increased this result to 98.25%.

As can be seen in Figure 3, the best result for the first method is obtained with the third proposed distance measure, the Frobenius norm distance.

Figure 4: Obtained classification rates (%) for the three

different distances and the combination of all of them,

using the second method: an ANN as a classifier. Exact

experimental values are presented at the top of each bar.

Conversely, when we use the ANN this measure is the one with the lowest classification rate, even though the difference is very small. By combining all three measures instead of using only one, we obtain the best result for the system (98.25%).

Comparing this result with those obtained in (Travieso et al., 2007), we can see that we obtain essentially the same performance (98%).

In (Travieso et al., 2007) a DCT or DWT (Biorthogonal 4.4 family) parameterization was used, combined with an SVM classifier. Here, the new mEMD technique is used to decompose the data set and the vector of distance measures is used as the input to an MLP. Since our system does not use any kind of transformation (DCT, DWT or others), a combination of both strategies could improve the results further and is left as future work.

In any case, the method also needs to be tested

with other databases in order to ensure its general

performance.

Finally, some improvement of the computational

time would be interesting, as the mEMD algorithm

is not fast.

8 CONCLUSIONS

This study proposes a new strategy for face recognition systems, in which a new technique, mEMD, is used and distance measures serve as input vectors to an ANN in order to decide the class of the input image.

With this technique, the simultaneous decomposition of two images is computed, obtaining the same number of IMFs for both images. One image is the image to be classified (the unknown input image), and the other one is the reference image of one class. This decomposition is performed with the reference image of each class. Once the decompositions are computed, the distances between the modes, arranged as matrices, are computed. The classification is done using two different methods. In the first one the classification is based only on the lowest distance between the decompositions of the input image and the reference images, and in the second one the distances are used as the input vector of an MLP.

Three different distance measures were analyzed, and the Frobenius norm distance measure gave the best results when the association was based exclusively on the distance. The combination of the three distances gave the best result when an ANN was used as a classifier.

The success of the proposed method is promising and encourages us to continue investigating the use of mEMD as a feature extraction system for face recognition problems, with new and bigger data bases.

ACKNOWLEDGEMENTS

This work has been partially supported by the

University of Vic under the grant R904, and under a

predoctoral grant from the University of Vic to Mr.

Esteve Gallego-Jutglà, ("Amb el suport de l'ajut

predoctoral de la Universitat de Vic"); and by

SAIOTEK from the Basque Government, to Dra.

Karmele López-de-Ipiña.

REFERENCES

Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.

Diez, P. F., Mut, V., Laciar, E., Torres, A., Avilla, E. (2009). Application of the Empirical Mode Decomposition to the Extraction of Features from EEG Signals for Mental Task Classification. 31st Annual International Conference of the IEEE EMBS.

Freeman, J. A., Skapura, D. M. (1991). Neural Networks: Algorithms, Applications and Programming Techniques. Addison-Wesley Publishing Company, Inc., Reading, MA.

Gallego-Jutglà, E., Solé-Casals, J. (2012). Exploring mEMD for face recognition. BIOSIGNALS conference proceedings, BIOSTEC 2012.

Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., Yen, N. C., Tung, C. C., Liu, H. H. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A, 454, 903-995.

Hush, D. R., Horne, B. G. (1993). Progress in supervised neural networks. IEEE Signal Processing Magazine, 10(1), pp. 8-39.

Iancu, C., Corcoran, P., Costache, G. (2007). A Review of Face Recognition Techniques for In-Camera Applications. International Symposium on Signals, Circuits and Systems, 1, 1-4.

Lippmann, R. P. (1987). An Introduction to Computing with Neural Nets. IEEE ASSP Magazine, 4(2), pp. 4-22.

Molla, K. I., Tanaka, T., Rutkowski, T. M., Cichocki, A. (2010). Separation of EOG artifacts from EEG signals using bivariate EMD. Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, 562-565.

Mutlu, A. Y., Aviyente, S. (2011). Multivariate Empirical Mode Decomposition for Quantifying Multivariate Phase Synchronization. EURASIP Journal on Advances in Signal Processing, Article ID 615717.

Rehman, N., Mandic, D. P. (2010). Multivariate empirical mode decomposition. Proc. R. Soc. A, 466, 1291-1302.

Travieso, C. M., Solé-Casals, J., Zaiats, V., Alonso, J. B., Ferrer, M. A., "Reducción del Vector de Características en Reconocimiento Facial", XXIII Simposium Nacional de la Unión Científica Internacional de Radio URSI 2008.

Woodward, J. D., Orlans, N. M., Higgins, P. T. (2003). Biometrics. McGraw-Hill.

Xiao, Q. (2007). Technology review - Biometrics-Technology, Application, Challenge, and Computational Intelligence Solutions. IEEE Computational Intelligence Magazine, 2, 5-25.

BIOSIGNALS2013-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

450