Learning of Graph Compressed Dictionaries for Sparse
Representation Classification
Farshad Nourbakhsh and Eric Granger
Laboratoire d'Imagerie de Vision et d'Intelligence Artificielle,
École de Technologie Supérieure, Université du Québec, Montréal, Canada
Keywords:
Matrix Factorization, Graph Compression, Dictionary Learning, Sparse Representation Classification, Face
Recognition, Video Surveillance.
Abstract:
Despite the limited target data available to design face models in video surveillance applications, many faces
of non-target individuals may be captured over multiple cameras in operational environments to improve ro-
bustness to variations. This paper focuses on Sparse Representation Classification (SRC) techniques that are
suitable for the design of still-to-video FR systems based on under-sampled dictionaries. The limited refer-
ence data available during enrolment is complemented by an over-complete external dictionary that is formed
with an abundance of faces from non-target individuals. In this paper, the Graph-Compressed Dictionary
Learning (GCDL) technique is proposed to learn compact auxiliary dictionaries for SRC. GCDL is based on
matrix factorization, and maintains a high level of SRC accuracy with compressed dictionaries because it exploits structural information to represent intra-class variations. Graph compression based on matrix factorization has been shown to compress data efficiently, and can therefore rapidly construct compact dictionaries. The accuracy and efficiency of the proposed GCDL technique are assessed and compared to reference sparse coding and dictionary learning techniques using images from the CAS-PEAL database. GCDL is shown to provide fast matching and adaptation of compressed dictionaries to new reference faces from the video surveillance environment.
1 INTRODUCTION
In recent years, sparse modelling has become an important tool in the pattern recognition and computer vision communities. It has been successfully applied in many image/video processing tasks, such as face recognition, image denoising and super-resolution
(Mairal et al., 2014). The well-known Sparse Rep-
resentation Classification (SRC) techniques (Wright
et al., 2009) typically need a sufficient amount of rep-
resentative training data to construct over-complete
dictionaries that can provide a high level of perfor-
mance. In many real-world applications, the number
of reference images that are available for system de-
sign is limited. Classification systems designed with
few reference samples per class are less robust to
the intra-class variabilities encountered during oper-
ations.
In video surveillance, the amount of reference
stills and videos captured during enrolment to design
a face recognition (FR) system is typically limited, es-
pecially in watch-list screening applications (Dewan
et al., 2016). The still-to-video FR systems employed
for watch-list screening seek to match probe face
images captured using surveillance cameras against
the reference still images of each individual of inter-
est enrolled in the gallery. The appearance of faces captured under uncontrolled conditions using surveillance cameras varies due to changes in illumination, pose, expression, scale, blur, etc., and diverges from the gallery images.
To enhance the robustness of facial models to intra-class variability, the methods proposed for Single Sample Per Person (SSPP) problems (Tan et al., 2006) perform FR using multiple synthetically gen-
erated reference faces from an original reference im-
age (Mokhayeri et al., 2015), multiple face repre-
sentation (Bashbaghi et al., 2014), and external data
captured from individuals in the operational environ-
ment (Su et al., 2010). In video surveillance appli-
cations, the faces of many unknown non-target indi-
viduals may be captured in the operational environ-
ment, and over multiple cameras, to improve robust-
ness to intra-class variations. This paper focuses on
techniques based on an under-sampled dictionary of still
reference faces that are appropriate for the design of
still-to-video FR systems. The lack of target reference facial samples available to design face models is compensated by using an over-complete external dictionary formed from non-target individuals captured in the operational environment.
Some techniques in SRC have been developed to utilize such external data. For instance, Extended SRC (Deng et al., 2012) was one of the first methods to modify the SRC framework to benefit from external data with an under-sampled dictionary. RADL and SVDL (Wei and Wang, 2015; Yang et al., 2013) are more recent sparse modelling techniques that combine dictionary learning and classification to manage external data more efficiently.
One of the main drawbacks of using large external data is storage and computational time: both the time and memory complexity are proportional to the size of the dictionary, $|D|$. Dictionary Learning (DL) tech-
niques are a possible solution to build a compact rep-
resentation of the large external data (Mairal et al.,
2014). DL techniques used by researchers in SRC
in recent years can be categorised into structural, hi-
erarchical and topological techniques (Shafiee et al.,
2013). For instance, K-Means Singular Value Decomposition (K-SVD) (Aharon et al., 2006) is a common method for learning a dictionary from external data.
It is worth mentioning that although DL techniques provide a compact representation of the data, they are not necessarily time efficient. It has been shown that learning a dictionary based on sparse representation is NP-hard (Tillmann, 2015).
One way to reduce memory and time complexity and to increase accuracy is data compression (Choi and Szpankowski, 2012; Navlakha et al., 2008; Toivonen et al., 2011), as applied in big data applications (Nourbakhsh, 2015). Compressing data consists in changing its representation so that it requires fewer bits. Lossy or lossless compression is possible, depending on the reversibility of the encoding. Many graph compression methods have been proposed in the literature with different foci, e.g., information-theoretic approaches, partitioning into regular pairs, and summarization of data. A graph compression method that provides cluster-based compactness is suitable as a preprocessing step for SRC methods.
Nourbakhsh et al. (Nourbakhsh et al., 2015) have
proposed a graph compression method based on ma-
trix factorization that focuses on structural informa-
tion. Data is presented as a similarity matrix of $n$ data samples, and matrix factorization encodes the order-$n$ graph into a compressed graph of order $k \ll n$ so as to minimize the reconstruction error. It has been shown that this data compression method reduces the complexity of many algorithms from $O(n^2)$ to $O(k^2 + n)$ by replacing the original data with its corresponding factorization. Although this method was not developed for graph clustering, practical results suggest that it provides a potentially powerful preprocessing tool for DL and SRC.
In this paper, a new graph compression technique
based on matrix factorization is proposed to learn
compact auxiliary dictionaries for accurate SRC.
This technique, entitled Graph-Compressed Dictionary Learning (GCDL), maintains a high level of accuracy with compressed dictionaries be-
cause it exploits structural information to represent
intra-class variations. In addition to providing fast
matching, graph factorization compression has been
shown to efficiently compress data, and can therefore
rapidly construct compressed dictionaries compared
to many reference DL methods. It is therefore suitable
for adaptation of compressed dictionaries to newly-
acquired reference faces captured in changing video
surveillance environments.
2 SPARSE REPRESENTATION
CLASSIFICATION
In sparse modelling, probe samples are represented as a sparse linear combination of reference samples. The sparse representation of a signal can be formulated under the assumption that the samples from a single class lie on a linear subspace. Let $D = [D_1, D_2, \ldots, D_l]$ be an over-complete matrix with $l$ distinct classes. The size of the over-complete dictionary is $m \times n$, where the number of columns is greater than the number of rows, and usually $n \gg m$. The $i$th class of $D$ can be represented as $D_i = [d_{i,1}, d_{i,2}, \ldots, d_{i,n_i}]$, of size $m \times n_i$. A probe sample $y$ of size $m \times 1$ from class $D_i$ will lie in the linear span of the over-complete dictionary:
$$y = a_{i,1} d_{i,1} + a_{i,2} d_{i,2} + \ldots + a_{i,n_i} d_{i,n_i}, \qquad (1)$$
where $(a_{i,1}, \ldots, a_{i,n_i})$ are the coefficients of the linear representation of $y$ on the over-complete dictionary, defined as
$$y = [D_1, D_2, \ldots, D_l]\,x, \qquad (2)$$
where $x$ is a sparse coefficient vector. The entries of $x$ are all zero, except for those belonging to the $i$th class.
In real FR applications, because of factors such as noise, the linear system in Eq. (2) is underdetermined and has no unique solution. A sparse representation of the probe sample resolves this ambiguity. Under the sparse representation framework, the natural solution is to apply
$\ell_0$-norm minimization, where the $\ell_0$ norm counts the number of non-zero components of the vector $x$:
$$\hat{x} = \min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx. \qquad (3)$$
Eq. (3) is NP-hard and difficult to approximate. By combining compressed sensing (Donoho and Tsaig, 2008) and sparse representation theory, an approximate solution is obtained by replacing the $\ell_0$ norm in Eq. (3) with the $\ell_1$ norm:
$$\hat{x} = \min_x \|x\|_1 \quad \text{s.t.} \quad y = Dx. \qquad (4)$$
If the vector $x$ is sparse enough, then the solution of Eq. (4) is equivalent to that of Eq. (3). Eq. (4) can be solved with different optimization techniques, such as basis pursuit using linear programming.
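For concreteness, Eq. (4) can be cast as a linear program via the standard variable-splitting trick; the following is a minimal sketch with SciPy (a generic illustration, not the homotopy solver used later in this paper):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, y):
    """Solve Eq. (4), min ||x||_1 s.t. Dx = y, as a linear program.

    Split trick: minimize sum(u) over the stacked variable [x; u],
    with |x| <= u enforced by the inequalities x - u <= 0, -x - u <= 0.
    """
    m, n = D.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])   # objective: sum of u
    A_eq = np.hstack([D, np.zeros((m, n))])         # equality: Dx = y
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])     # |x| <= u
    b_ub = np.zeros(2 * n)
    bounds = [(None, None)] * n + [(0, None)] * n   # x free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:n]
```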
In this paper, different SRC techniques based on (Wei and Wang, 2015) are considered, so some of them are reviewed here. Eq. (5) is a generalized version of Eq. (4) that allows for a certain degree of noise; called LASSO (Yang et al., 2010a), it seeks the $x$ that minimizes the following objective function:
$$\min_x \|y - Dx\|_2^2 + \lambda \|x\|_1 \qquad (5)$$
where $\lambda > 0$ is a scalar regularization parameter that controls the trade-off between reconstruction error and sparsity. Once the sparse coefficient vector is obtained with Eq. (4) or (5), a probe image $y$ is assigned to a class by calculating the distance between the probe image and the image reconstructed from the sparse coefficients. The coefficients are non-zero for the elements of the correct class and almost zero for the other classes. The main idea is that, when the class is recognized correctly, the query image can be reconstructed linearly from the relevant bases of the dictionary $D$.
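As an illustration of this decision rule, the following is a minimal Python sketch (the scikit-learn Lasso solver, which uses its own scaling of the data-fit term, stands in for the solvers discussed above; class_index is a hypothetical array mapping dictionary columns to class labels):

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(y, D, class_index, lam=0.01):
    """Assign probe y to a class by the sparse reconstruction residual.

    y           : probe vector of size m
    D           : m x n dictionary with l2-normalized columns
    class_index : length-n array, class label of each column of D
    lam         : regularization parameter (lambda in Eq. (5))
    """
    # Solve Eq. (5): min_x ||y - Dx||_2^2 + lam * ||x||_1
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    x = lasso.fit(D, y).coef_

    # Keep only the coefficients of each class in turn, and pick the
    # class whose partial reconstruction is closest to the probe.
    residuals = {}
    for c in np.unique(class_index):
        x_c = np.where(class_index == c, x, 0.0)
        residuals[c] = np.linalg.norm(y - D @ x_c)
    return min(residuals, key=residuals.get)
```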
The SRC method needs a large amount of training data to build an over-complete dictionary, and the size of the training data has a direct effect on classification. In many real applications, like video surveillance, there is not enough reference target training data, so one solution is to exploit non-target external data. (Deng et al., 2012) extended the SRC method to under-sampled dictionaries. In their method, the under-sampled dictionary $D = [d_1, d_2, \ldots, d_l]$ is populated with one or a few samples per class. An external dictionary $ED$ built from non-target data covers known distortions, like illumination, and adds intra-class variation to the under-sampled dictionary $D$ as follows:
$$\min_x \left\| y - [[d_1, d_2, \ldots, d_l], ED] \begin{bmatrix} x_d \\ x_{ED} \end{bmatrix} \right\|_2^2 + \lambda \|x\|_1 \qquad (6)$$
To assign a probe image to the closest class, as in SRC, the distance is calculated between the query image and the image reconstructed from the non-zero gallery coefficients $x_d$ and the extra coefficients $x_{ED}$.
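A minimal sketch of this ESRC-style decision rule, under the same assumptions and the same hypothetical class_index convention as the previous snippet:

```python
import numpy as np
from sklearn.linear_model import Lasso

def esrc_classify(y, D, ED, class_index, lam=0.01):
    """ESRC-style rule: sparse code over [D, ED], then score each class
    with its gallery coefficients plus the shared external coefficients."""
    B = np.hstack([D, ED])              # concatenated dictionary [D, ED]
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    x = lasso.fit(B, y).coef_
    x_d, x_ed = x[:D.shape[1]], x[D.shape[1]:]

    residuals = {}
    for c in np.unique(class_index):
        x_c = np.where(class_index == c, x_d, 0.0)
        # the class reconstruction keeps the shared intra-class
        # variation term contributed by the external dictionary
        residuals[c] = np.linalg.norm(y - D @ x_c - ED @ x_ed)
    return min(residuals, key=residuals.get)
```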
Robust Sparse Coding (RSC), proposed in (Yang et al., 2011b), is a robust face classifier based on SRC. An extra weighting term $W$ assigns a weight to each pixel of the probe image, since pixels from outlier parts of the image are less informative than central pixels; for instance, the eyes and nose of a face carry more information than the hair. Wei and Wang (2015) have proposed a similar framework to integrate auxiliary dictionary learning and classification as follows:
$$\min_x \left\| W \left( y - [[d_1, d_2, \ldots, d_l], ED] \begin{bmatrix} x_d \\ x_{ED} \end{bmatrix} \right) \right\|_2^2 + \lambda \|x\|_1 \qquad (7)$$
where $ED$ is the external dictionary. They proposed two methods: RADL$_{wo}$ for classification only, and RADL$_{wDL}$ for joint dictionary learning and classification. In the RADL$_{wDL}$ method, $ED$ is computed by an optimization procedure over the over-complete external data, and each column of the learned dictionary $ED$ is called an atom.
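For a fixed weighting matrix $W$, the sparse-coding step of Eq. (7) reduces to a standard LASSO on re-weighted data, as the sketch below illustrates (RSC and RADL actually re-estimate $W$ iteratively; here B stands for the concatenated dictionary $[[d_1, \ldots, d_l], ED]$ and w for the diagonal of $W$):

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_sparse_code(y, B, w, lam=0.01):
    """One sparse-coding step of Eq. (7) with W = diag(w) held fixed."""
    Wy = w * y                      # apply the pixel weights to the probe
    WB = w[:, None] * B             # and to every dictionary column
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    return lasso.fit(WB, Wy).coef_
```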
3 DICTIONARY LEARNING
The performance of SRC methods is limited by the number of reference samples. For instance, the time complexity with respect to the number of samples is quadratic (Donoho and Tsaig, 2008). This challenge has been addressed in the literature by applying a compact representation or by reducing the number of reference samples. For instance, Wright et al. (2009) suggested a random selection of reference training samples to reduce the time complexity, although this impacts accuracy. A common solution in the literature is to reduce the time complexity of SRC by applying DL techniques, although most of these can only be applied off-line because of their own time complexity.
Olshausen and Field (1996) introduced dictionary learning to the pattern recognition community. They proposed an unsupervised method, driven by the structure of the data, that learns the bases (atoms) of the dictionary from training data. This differs from classical methods such as the discrete Fourier transform (DFT) and the various types of wavelets, which use fixed, standardized dictionaries. The DL problem can be viewed from different perspectives, such as matrix factorization, risk minimization and constrained variants.
DL methods have recently been applied as a preprocessing step for SRC. For example, Shafiee et al. (2013) investigated the impact of three different DL methods on SRC performance. They used Metaface dictionary learning (Yang et al., 2010b), Fisher Discriminative Dictionary Learning
(FDDL) (Yang et al., 2011a), and Sparse Modelling Representative Selection (SMRS) (Elhamifar et al., 2012) to obtain compact representations of the training data. They showed that the FDDL method provides higher recognition accuracy than the other methods, although SMRS requires less learning time.
K-Means Singular Value Decomposition (K-SVD) (Aharon et al., 2006) and the Method of Optimal Directions (MOD) (Engan et al., 1999) are two popular unsupervised DL techniques that have been widely used in the literature. These EM-style methods alternate between dictionary updates and sparse coding. The difference between the two lies in the dictionary update, where K-SVD updates the dictionary atom by atom, while MOD updates all the atoms simultaneously.
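As an illustration of the contrast, the sketch below shows MOD's simultaneous dictionary update, which is the textbook least-squares step (not code from the cited paper); Y holds the training signals and X the current sparse codes:

```python
import numpy as np

def mod_dictionary_update(Y, X):
    """One MOD step: D = Y X^T (X X^T)^+, the least-squares dictionary
    minimizing ||Y - D X||_F^2 for the fixed sparse codes X.

    Y : m x N matrix of training signals (one signal per column)
    X : k x N matrix of sparse codes under the current dictionary
    """
    D = Y @ X.T @ np.linalg.pinv(X @ X.T)
    # re-normalize atoms to unit l2 norm, as is conventional in DL
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1e-12)
```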
Ramírez et al. (2010) proposed a framework that combines sparse modelling and clustering. They introduce a set of dictionaries, one optimized for each cluster, so that the learned data is clustered as a union of low-dimensional subspaces. Most of the reported DL methods require an over-complete dictionary to generalize the intra-class variations well, except (Wei and Wang, 2015; Yang et al., 2013) to some extent. DL methods based on sparse representation provide a compact representation of the over-complete dictionary that reduces the time complexity of SRC. However, they are themselves nearly NP-hard to execute, so they are mostly used off-line.
4 GRAPH COMPRESSION FOR
SRC
Most DL methods for SRC require a considerable amount of construction time, which increases with the size of the reference data. One way to rapidly obtain a compact dictionary without losing the information in the over-complete dictionary is graph compression, which changes the representation of the data so that it requires less memory. Depending on the type of encoding, these methods produce a lossy or lossless compression. Data can be presented as a collection of feature vectors or as similarity/dissimilarity relations among the data samples, so the data can easily be converted to the adjacency matrix of a weighted graph. Compression can be addressed with information theory to compress graphical structures (Choi and Szpankowski, 2012), although this does not preserve a graph structure as the compressed representation. A second category of methods relies on the Szemerédi regularity lemma (Szemerédi, 1978), a well-known result in extremal graph theory which roughly states that a dense graph can be approximated by a bounded number of random bipartite graphs. An algorithmic version of this lemma has been used for accelerating pairwise clustering (Sperotto and Pelillo, 2007). Finally, a compression method can take into account the structural information of the data. For example, (Navlakha et al., 2008) propose a summarization algorithm for unweighted graphs, and (Toivonen et al., 2011) suggest a greedy procedure to determine a set of supernodes and superedges that approximate a weighted graph.
4.1 Graph-Compressed Dictionary
Learning (GCDL)
In this paper, a graph compression method is proposed for application to large external dictionaries. Although using a compact dictionary representation is not new, this method executes rapidly as a preprocessing step for SRC. Because SRC methods are NP-hard by nature, the homotopy method was selected for the sparse optimization in this paper due to its time efficiency (Yang et al., 2010a).
Figure 1 illustrates the proposed method. Assume that external data is collected a priori from non-target individuals over a network of surveillance cameras, i.e., that the system has a reference still face image and faces captured by surveillance cameras for several non-target persons. The first column of the external data represents the still image of its corresponding row. An over-complete dictionary is constructed by calculating the difference between each image and its corresponding still image. A similarity graph is then computed from the over-complete dictionary, where each image is a node and weighted edges represent the similarity between nodes. In the similarity graph block of Figure 1, the nodes marked with circles are the atoms of the compressed dictionary. Finally, each probe sample is represented as a sparse linear combination of the reference stills and the compressed dictionary.
The edge-weighted graph $G = (V, E, w)$ represents a set of $n$ vertices, where each vertex $V_i$ is connected to vertex $V_j$ by a weighted edge, and $E \subseteq V \times V$ is the set of edges. The weight $w(i, j)$ is obtained from the following formula:
$$w(i, j) = \exp\left( \frac{ -\| ed(i) - ed(j) \|_2^2 }{ \sigma^2 } \right) \qquad (8)$$
where $\sigma$ is a positive real number bounded to $(0, 1]$, $ed(i)$ is a feature vector from the external over-complete dictionary $ED$, and $\|\cdot\|_2$ is the Euclidean distance, which gives the dissimilarity between the two considered elements. The graph $G$ is complete and undirected, with order $n$ equal to the size of the external over-complete dictionary $ED$.
[Figure 1 (diagram): Design phase — external data → over-complete dictionary → similarity graph → compressed dictionary (CD); Operational phase — probe image represented by the gallery and the compressed dictionary with coefficients.]
Figure 1: Schematic presentation of the Graph Compressed Dictionary Learning (GCDL) method.
Therefore $w(i, j) = w(j, i)$ for all $(i, j) \in E$, and the graph $G$ can be represented by a symmetric matrix $G$ of order $n$. Let $k \ll n$ be a constant representing the number of atoms, i.e., the order of the new compressed graph $C$. The ratio $k/n$ is regarded as the graph compression rate.
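As a small illustration, the similarity graph of Eq. (8) can be computed from the columns of $ED$ as follows (a minimal NumPy sketch; the column-wise layout of ED and the default value of sigma are assumptions):

```python
import numpy as np

def similarity_graph(ED, sigma=0.5):
    """Build the weighted adjacency matrix of Eq. (8).

    ED    : m x n external over-complete dictionary, one sample per column
    sigma : kernel bandwidth in (0, 1]
    """
    # squared Euclidean distances between all pairs of columns
    sq = np.sum(ED**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * ED.T @ ED
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / sigma**2)
```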
A many-to-one mapping $\psi : [n] \to [k]$ is needed between the vertices of the original graph and those of the compressed graph, and the compressed graph must be determined so as to reduce the order of the graph from $n$ to $k$. To estimate the mapping function and the compressed graph, a least-squares approximation is applied to the following minimizer, relaxing the binary mapping to a real-valued left-stochastic matrix $X$:
$$\min f(X, C) = \|G - X^T C X\|_2^2 \quad \text{s.t.} \quad X \in S, \; C \in \mathbb{R}^{k \times k}, \qquad (9)$$
where
$$f(X, C) = \sum_{\substack{i, j \in \{1, \ldots, N\} \\ k, h \in \{1, \ldots, K\}}} \delta_{(k,i) \neq (h,j)} \, X_{ki} X_{hj} \, (G_{ij} - C_{kh})^2 \;+\; \sum_{\substack{i \in \{1, \ldots, N\} \\ k \in \{1, \ldots, K\}}} X_{ki} \, (G_{ii} - C_{kk})^2.$$
The optimization can be addressed as an EM-style method that alternates between updates of the variable $C$ and updates of the variable $X$. The minimization converges to a stationary point, as the objective function decreases in every iteration.
Update rule for C. The update rule for the unconstrained matrix $C$, $U_C(X)$, is obtained by setting the first-order partial derivative of $f$ with respect to $C$ to zero (see the proof of the theorem in (Nourbakhsh et al., 2015) for details).
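For intuition only: under the plain Frobenius-norm relaxation of Eq. (9) (ignoring the diagonal correction terms of $f$, and assuming $XX^T$ is invertible), setting the derivative to zero yields a closed form; the exact rule used by GCDL is the one derived in (Nourbakhsh et al., 2015):
$$\frac{\partial}{\partial C}\left\|G - X^T C X\right\|_F^2 = -2\, X (G - X^T C X) X^T = 0 \;\Longrightarrow\; C = (X X^T)^{-1} X G X^T (X X^T)^{-1}.$$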
Update rule for X. $U_X(X, C)$ is a multiplicative update rule for $X$, in a similar manner to the ones suggested in (Lee and Seung, 2001) for Non-negative Matrix Factorization (NMF).
We say that $X$ is a Karush-Kuhn-Tucker (KKT) point of the following optimization if it satisfies the first-order necessary conditions for local optimality (details are given in (Nourbakhsh et al., 2015)):
$$\min f_C(X) = f(X, C) \quad \text{s.t.} \quad X \in Y_Z \qquad (10)$$
where $C \in \mathbb{R}^{k \times k}$, $Z \in S$ and $Y_Z = \{X \in S : (Z_{ki} = 0) \Rightarrow (X_{ki} = 0)\}$.
Algorithm 1 summarizes the GCDL approach, which works as follows. After calculating the weighted graph from the over-complete dictionary (as explained in Figure 1), the minimization starts from a random $X \in S$ and repeatedly alternates between updating $C$ and $X$ with their respective update rules, $U_C(X)$ and $U_X(X, C)$, until convergence. The stopping criterion is met when the distance between the $X$ of two consecutive iterations falls below a given threshold, or when the maximum number of iterations is reached. This procedure may converge to a local minimum, but it guarantees
a strict decrease of the objective until a KKT point is reached. Finally, a discrete solution is obtained by projecting onto the set of binary left-stochastic matrices, setting to 1 the element with the highest value in each column of $X$ and setting the rest to 0.
Although the above compression method was not designed specifically for clustering, it converges to good clusterings that generate a compact representation of the input data. From this perspective, the mapping $X$ encodes the clustering result, and the representative vertices of each cluster, determined from the mapping $X$, produce the dictionary atoms. The complexity of a matrix-vector multiplication is thereby reduced from $O(n^2)$ to $O(k^2 + n)$.
Algorithm 1: Graph Compressed Dictionary Learning.
Input: over-complete dictionary from external data, $ED$
Output: graph-compressed dictionary, $D_{GCDL}$
  Graph $G$ ← calculate similarity matrix from $ED$ (Eq. (8))
  $X$ ← draw a random matrix from $S$, the set of left-stochastic matrices
  while stopping criterion is not met do
    $C \leftarrow U_C(X)$   /* Update C */
    $X \leftarrow U_X(X, C)$   /* Update X */
  end while
  Project $X$ to a binary left-stochastic matrix
  $D_{GCDL}$ ← select the representative of each cluster based on the mapping matrix $X$
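The sketch below is a minimal NumPy rendering of Algorithm 1, under stated assumptions: it reuses the similarity_graph helper sketched after Eq. (8); $U_C(X)$ is implemented with the closed-form least-squares update of the Frobenius relaxation noted earlier; a projected-gradient step with column renormalization stands in for the exact multiplicative rule $U_X(X, C)$ of (Nourbakhsh et al., 2015); and representatives are taken as the cluster members closest to their cluster mean, one plausible reading of the selection step:

```python
import numpy as np

def gcdl(ED, k, sigma=0.5, lr=1e-3, max_iter=300, tol=1e-8, seed=0):
    """Sketch of GCDL: compress the similarity graph of ED to k atoms."""
    rng = np.random.default_rng(seed)
    G = similarity_graph(ED, sigma)          # Eq. (8), sketched earlier
    n = G.shape[0]
    X = rng.random((k, n))
    X /= X.sum(axis=0, keepdims=True)        # random left-stochastic init
    prev = np.inf
    for _ in range(max_iter):
        # U_C(X): least-squares C for the relaxed objective (9)
        P = np.linalg.pinv(X @ X.T)
        C = P @ X @ G @ X.T @ P
        # surrogate U_X(X, C): gradient step on ||G - X^T C X||_F^2,
        # then projection back to an (approximately) left-stochastic X
        grad = -4.0 * C @ X @ (G - X.T @ C @ X)
        X = np.clip(X - lr * grad, 1e-12, None)
        X /= X.sum(axis=0, keepdims=True)
        obj = np.linalg.norm(G - X.T @ C @ X) ** 2
        if prev - obj < tol:                 # stopping criterion
            break
        prev = obj
    labels = X.argmax(axis=0)                # project to a binary mapping
    atoms = []
    for c in range(k):                       # one representative per cluster
        members = np.where(labels == c)[0]
        if members.size == 0:
            continue
        mean = ED[:, members].mean(axis=1, keepdims=True)
        dist = np.linalg.norm(ED[:, members] - mean, axis=0)
        atoms.append(ED[:, members[dist.argmin()]])
    return np.column_stack(atoms)            # compressed dictionary D_GCDL
```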
5 EXPERIMENTAL RESULTS
In this section, the performance of the proposed GCDL method is compared to several state-of-the-art SRC¹ and DL methods: SRC (Wright et al., 2009), RSC (Yang et al., 2011b), ESRC (Deng et al., 2012), and RADL (Wei and Wang, 2015). The baseline methods are categorized as methods without an external dictionary (SRC and RSC), with an external dictionary (such as ESRC and RADL$_{wo}$), and with both dictionary learning and classification (such as SVDL and RADL$_{wDL}$). In addition, the time required to construct the compressed dictionary is also compared.
The results were obtained with images from the CAS-PEAL database (Gao et al., 2008), a large-scale Chinese face database containing variations in pose, expression, accessories, lighting and background. It contains facial captures from 1040 individuals (595 males and 445 females). For this experiment, we follow the protocol discussed in (Wei and Wang, 2015).
¹ http://mml.citi.sinica.edu.tw/publications.html
One hundred subjects from the neutral category were selected as gallery images $D$, and their corresponding distorted images from the accessory category were selected for testing. The accessory category contains 3 images with hats and 3 images with sunglasses, so 600 images were collected for testing in total. Sixty subjects from the remaining individuals, with 6 instances each from the accessory category, were chosen as external data to build the dictionary $ED$. The pixel-based feature vector is obtained by downsampling the original grey-scale face images to 50 × 40 pixels.
1) In the first experiment, the performance of GCDL is shown by varying the number of atoms (the compression rate) with respect to the other methods. The 60 individuals are used as external data, with 6 instances each. The size of the external data is increased in each experiment by randomly selecting from 1 to 15 of the 60 individuals for ESRC and RADL$_{wo}$. In the same manner, the external over-complete dictionary of 360 elements is compressed so as to have the same number of atoms in each experiment.
Table 1 shows the accuracy of ESRC and RADL$_{wo}$ classification with different sizes of external data and compressed data. The average accuracy is shown over 10 experiments; because the standard deviation is always negligible, it is not reported in Table 1. We notice that, by increasing the number of atoms, the performance generally increases. The overall performance with GCDL is generally higher than that of the methods without external dictionary learning. The results also show that at a low compression rate (equivalent to a high number of atoms), accuracy is lower compared to the other methods. Since GCDL is based on clustering, the amount of information added to enhance SRC declines as the number of atoms grows. In other words, as the number of atoms approaches the total number of clusters, GCDL tends to partition the data randomly, without using the structural information of the data.
2) In the second experiment, the performance of GCDL is compared to methods that combine dictionary learning and classification, such as RADL$_{wDL}$ and SVDL, and then to K-SVD, MOD and SMRS used for dictionary learning followed by RADL$_{wo}$ for classification. Results for ESRC and RADL$_{wo}$ are provided by applying the whole external data without dictionary learning. Finally, results are also provided for SRC and RSC as baselines, since these methods do not require an external dictionary. The execution times of these methods are also compared. All code is implemented in MATLAB, on a computer with a 3.40 GHz CPU and 8 GB of RAM.
Table 2 shows that the average execution time for constructing the compressed dictionary with GCDL is much lower than that of the other methods.
Table 1: Average classification accuracy of GCDL and reference methods on CAS-PEAL data.

Methods with             Number of Atoms
External Dictionary   6        12       18       30       60       90
ESRC                  74.50%   74.50%   75.17%   75.83%   77.33%   78.33%
ESRC+GCDL             74.90%   75.57%   76.08%   76.35%   77.62%   77.85%
RADL_wo               83.17%   83.83%   84.33%   84.00%   85.90%   86.83%
RADL_wo+GCDL          83.95%   84.87%   85.53%   85.15%   85.97%   86.73%
Table 2: Average accuracy and reconstruction time of GCDL and reference methods on CAS-PEAL data.

Method      Number of Atoms   Dictionary Size   Accuracy (%)   Time (Sec)
Methods with dictionary learning and classification:
SVDL        19                2000×360          82.33          1607.87
RADL_wDL    18                2000×360          85.67          924.62
Methods with dictionary learning and RADL_wo as classification:
K-SVD       18                2000×360          84.96          115.38
MOD         18                2000×360          85.17          71.759
SMRS        17                2000×360          83.83          3.28
Method with compression and RADL_wo as classification:
GCDL        18                2000×360          86.00          0.87
Applying the whole raw dictionary:
ESRC        360               2000×360          78.50          169.69
RADL_wo     360               2000×360          87.00          267.53
Methods without external dictionary:
SRC         —                 —                 72.67          —
RSC         —                 —                 82.33          —
In addition, our method does not rely on parameter tuning, unlike K-SVD, MOD and SMRS, and GCDL provides a comparable level of accuracy.
6 CONCLUSIONS
In this paper, the Graph-Compressed Dictionary Learning (GCDL) method, based on matrix factorization, is proposed to construct a compact representation of over-complete external data. GCDL exploits the structural information of the external dictionary to build a compressed dictionary, providing a trade-off between time complexity and accuracy. Experiments conducted at a high compression rate yield better accuracy than reference methods, showing that GCDL is more robust to intra-class variation than the dictionary learning methods commonly used in the literature. As a result, the proposed algorithm can manage occluded face images, as well as illumination and expression variations. Moreover, GCDL handles even one or few gallery images per individual. The results on the CAS-PEAL dataset show that GCDL has better time efficiency for the construction of a compact dictionary. It can be employed to accelerate many SRC approaches, and the complexity of a matrix-vector multiplication can be significantly reduced.
REFERENCES
Aharon, M., Elad, M., and Bruckstein, A. (2006). The k-
svd: An algorithm for designing overcomplete dictio-
naries for sparse representation. Trans. Signal Pro-
cessing, 54(11):4311–4322.
Bashbaghi, S., Granger, E., Sabourin, R., and Bilodeau,
G. (2014). Watch-list screening using ensembles
based on multiple face representations. In Inter-
national Conference on Pattern Recognition, pages
4489–4494.
Choi, Y. and Szpankowski, W. (2012). Compression of
graphical structures: Fundamental limits, algorithms,
and experiments. IEEE Trans. on Information Theory,
58(2):620–638.
Deng, W., Hu, J., and Guo, J. (2012). Extended src: Un-
dersampled face recognition via intraclass variant dic-
tionary. IEEE Trans. Pattern Analysis Machine Intel-
ligence, 34(9):1864–1870.
Dewan, M. A. A., Granger, E., Marcialis, G. L., Sabourin,
R., and Roli, F. (2016). Adaptive appearance model
tracking for still-to-video face recognition. Pattern
Recognition, 49:129–151.
Learning of Graph Compressed Dictionaries for Sparse Representation Classification
315
Donoho, D. L. and Tsaig, Y. (2008). Fast solution of l1-
norm minimization problems when the solution may
be sparse. Information Theory, IEEE Transactions on,
54(11):4789–4812.
Elhamifar, E., Sapiro, G., and Vidal, R. (2012). See all by
looking at a few: Sparse modeling for finding repre-
sentative objects. In IEEE Conference on Computer
Vision and Pattern Recognition, pages 1600–1607.
Engan, K., Aase, S. O., and Hakon Husoy, J. (1999).
Method of optimal directions for frame design. In In-
ternational Conference of Acoustics, Speech, and Sig-
nal Processing, pages 2443–2446.
Gao, W., Cao, B., Shan, S., Chen, X., Zhou, D., Zhang, X.,
and Zhao, D. (2008). The cas-peal large-scale chinese
face database and baseline evaluations. IEEE Trans.
System Man Cybernetics Part A, 38(1):149–161.
Lee, D. D. and Seung, H. S. (2001). Algorithms for non-
negative matrix factorization. In Advances in Neural
Information Processing Systems 13, pages 556–562.
Mairal, J., Bach, F., and Ponce, J. (2014). Sparse modeling
for image and vision processing. Foundations Trends
in Computer Graphics and Vision, 8(2-3):85–283.
Mokhayeri, F., Granger, E., and Bilodeau, G. (2015). Syn-
thetic face generation under various operational con-
ditions in video surveillance. In International Confer-
ence on Image Processing.
Navlakha, S., Rastogi, R., and Shrivastava, N. (2008).
Graph summarization with bounded error. In Inter-
national Conference on Management of Data (ACM),
pages 419–432.
Nourbakhsh, F. (2015). Algorithms for Graph Compression: Theory and Experiments. PhD thesis, Dipartimento di Scienze Ambientali, Informatica e Statistica, Università Ca' Foscari, Venice, Italy.
Nourbakhsh, F., Bulò, S. R., and Pelillo, M. (2015). A matrix factorization approach to graph compression with partial information. International Journal of Machine Learning & Cybernetics, 6(4):523–536.
Olshausen, B. A. and Field, D. J. (1996). Emergence of
simple-cell receptive field properties by learning a
sparse code for natural images. Nature, 381:607–609.
Ramírez, I., Sprechmann, P., and Sapiro, G. (2010). Classification and clustering via dictionary learning with structured incoherence and shared features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3501–3508.
Shafiee, S., Kamangar, F., Athitsos, V., and Huang, J.
(2013). The role of dictionary learning on sparse
representation-based classification. In International
Conference on PErvasive Technologies Related to As-
sistive Environments, PETRA ’13, pages 47:1–47:8.
Sperotto, A. and Pelillo, M. (2007). Szemerédi's regularity lemma and its applications to pairwise clustering and segmentation. In International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, volume 4679 of Lecture Notes in Computer Science, pages 13–27.
Su, Y., Shan, S., Chen, X., and Gao, W. (2010). Adap-
tive generic learning for face recognition from a sin-
gle sample per person. In International Conference
on Computer Vision and Pattern Recognition, pages
2699–2706.
Szemerédi, E. (1978). Regular partitions of graphs. In Problèmes combinatoires et théorie des graphes, pages 399–401.
Tan, X., Chen, S., Zhou, Z.-H., and Zhang, F. (2006). Face recognition from a single image per person: A survey. International Journal of Pattern Recognition, 39:1725–1745.
Tillmann, A. M. (2015). On the computational intractability of exact and approximate dictionary learning. IEEE Signal Processing Letters, 22(1):45–49.
Toivonen, H., Zhou, F., Hartikainen, A., and Hinkka, A.
(2011). Compression of weighted graphs. In Interna-
tional Conference on Knowledge Discovery and Data
Mining (ACM), pages 965–973.
Wei, C. and Wang, Y. F. (2015). Undersampled face recog-
nition via robust auxiliary dictionary learning. IEEE
Transactions on Image Processing, 24(6):1722–1734.
Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., and Ma,
Y. (2009). Robust face recognition via sparse repre-
sentation. IEEE Trans. Pattern Analysis Machine In-
telligence, 31(2):210–227.
Yang, A. Y., Ganesh, A., Zhou, Z., Sastry, S., and Ma, Y. (2010a). Fast $\ell_1$-minimization algorithms for robust face recognition: A review. In International Conference on Image Processing, pages 1849–1852.
Yang, M., Van Gool, L., and Zhang, L. (2013). Sparse variation
dictionary learning for face recognition with a single
training sample per person. In International Confer-
ence on Computer Vision, pages 689–696.
Yang, M., Zhang, L., Feng, X., and Zhang, D. (2011a).
Fisher discrimination dictionary learning for sparse
representation. In International Conference on Com-
puter Vision, pages 543–550.
Yang, M., Zhang, L., Yang, J., and Zhang, D. (2010b).
Metaface learning for sparse representation based face
recognition. In International Conference on Image
Processing, pages 1601–1604.
Yang, M., Zhang, L., Yang, J., and Zhang, D. (2011b).
Robust sparse coding for face recognition. In Inter-
national Conference on Computer Vision and Pattern
Recognition, pages 625–632.