Learning of Graph Compressed Dictionaries for Sparse
Representation Classification
Farshad Nourbakhsh and Eric Granger
Laboratoire d'Imagerie de Vision et d'Intelligence Artificielle,
École de Technologie Supérieure, Université du Québec, Montréal, Canada
Keywords:
Matrix Factorization, Graph Compression, Dictionary Learning, Sparse Representation Classification, Face
Recognition, Video Surveillance.
Abstract:
Despite the limited target data available to design face models in video surveillance applications, many faces
of non-target individuals may be captured over multiple cameras in operational environments to improve ro-
bustness to variations. This paper focuses on Sparse Representation Classification (SRC) techniques that are
suitable for the design of still-to-video FR systems based on under-sampled dictionaries. The limited refer-
ence data available during enrolment is complemented by an over-complete external dictionary that is formed
with an abundance of faces from non-target individuals. In this paper, the Graph-Compressed Dictionary
Learning (GCDL) technique is proposed to learn compact auxiliary dictionaries for SRC. GCDL is based on
matrix factorization, and maintains a high level of SRC accuracy with compressed dictionaries because it exploits structural information to represent intra-class variations. Graph compression based on matrix factorization has been shown to compress data efficiently, and can therefore rapidly construct compact dictionaries. The accuracy and efficiency of the proposed GCDL technique are assessed and compared to reference sparse coding and dictionary learning techniques using images from the CAS-PEAL database. GCDL is shown to provide fast matching and adaptation of compressed dictionaries to new reference faces from the video surveillance environment.
1 INTRODUCTION
In recent years, sparse modelling has become an important tool in the pattern recognition and computer vision communities. It has been successfully applied in many image/video processing tasks, such as face recognition, image denoising and super-resolution
(Mairal et al., 2014). The well-known Sparse Rep-
resentation Classification (SRC) techniques (Wright
et al., 2009) typically need a sufficient amount of rep-
resentative training data to construct over-complete
dictionaries that can provide a high level of perfor-
mance. In many real-world applications, the number
of reference images that are available for system de-
sign is limited. Classification systems designed with
few reference samples per class are less robust to
the intra-class variabilities encountered during oper-
ations.
In video surveillance, the amount of reference
stills and videos captured during enrolment to design
a face recognition (FR) system is typically limited, es-
pecially in watch-list screening applications (Dewan
et al., 2016). The still-to-video FR systems employed
for watch-list screening seek to match probe face
images captured using surveillance cameras against
the reference still images of each individual of inter-
est enrolled in the gallery. The appearance of faces captured under uncontrolled conditions using surveillance cameras varies due to changes in illumination, pose, expression, scale, blur, etc., and diverges from the gallery images.
To enhance the robustness of facial models to intra-class variability, the methods proposed for Single Sample Per Person (SSPP) problems (Tan et al., 2006) perform FR using multiple synthetically gen-
erated reference faces from an original reference im-
age (Mokhayeri et al., 2015), multiple face repre-
sentation (Bashbaghi et al., 2014), and external data
captured from individuals in the operational environ-
ment (Su et al., 2010). In video surveillance appli-
cations, the faces of many unknown non-target indi-
viduals may be captured in the operational environ-
ment, and over multiple cameras, to improve robust-
ness to intra-class variations. This paper focuses on
techniques based on an under-sampled dictionary of still
reference faces that are appropriate for the design of
still-to-video FR systems. The lack of target reference facial samples available to design face models is compensated by using an over-complete external dictionary formed from non-target individuals captured in the operational environment.
Some techniques in SRC have been developed to utilize such external data. For instance, Extended SRC (Deng et al., 2012) was one of the first methods to modify the SRC framework to benefit from external data with an under-sampled dictionary. RADL and SVDL (Wei and Wang, 2015; Yang et al., 2013) are more recent sparse modelling techniques that combine dictionary learning and classification to manage external data more efficiently.
One of the main drawbacks of using large external data is storage and computational time: both the time and memory complexity are proportional to the size of the dictionary, $|D|$. Dictionary Learning (DL) tech-
niques are a possible solution to build a compact rep-
resentation of the large external data (Mairal et al.,
2014). DL techniques used by researchers in SRC
in recent years can be categorised into structural, hi-
erarchical and topological techniques (Shafiee et al.,
2013). For instance, K-Means Singular Value Decomposition (K-SVD) (Aharon et al., 2006) is a common method for learning a dictionary from external data.
It is worth mentioning that although DL techniques provide a compact representation of the data, they are not necessarily time efficient. It has been shown that learning a dictionary based on sparse representation is NP-hard (Tillmann, 2015).
One way to reduce memory and time complexity and to increase accuracy is data compression (Choi and Szpankowski, 2012; Navlakha et al., 2008; Toivonen et al., 2011), as applied in big data applications (Nourbakhsh, 2015). Compressing data consists in changing its representation so that it requires fewer bits. Lossy or lossless compression is possible, depending on the reversibility of the encoding. Many graph compression methods have been proposed in the literature with different foci, e.g., information-theoretic approaches, partitioning into regular pairs, and summarization of data. A graph compression method that provides cluster-based compactness is suitable as a preprocessing step for SRC methods.
Nourbakhsh et al. (Nourbakhsh et al., 2015) have
proposed a graph compression method based on ma-
trix factorization that focuses on structural informa-
tion. Data is presented as a similarity matrix of $n$ data samples, and matrix factorization encodes the order-$n$ graph into a compressed graph of order $k \ll n$ so as to minimize the reconstruction error. It has been shown that this data compression method reduces the complexity of many algorithms from $O(n^2)$ to $O(k^2 + n)$ by replacing the original data with its corresponding factorization. Although this method was not developed for graph clustering, practical results suggest that it provides a potentially powerful preprocessing tool for DL and SRC.
In this paper, a new graph compression technique
based on matrix factorization is proposed to learn
compact auxiliary dictionaries for accurate SRC.
This technique, entitled Graph-Compressed Dictionary Learning (GCDL), maintains a high level of accuracy with compressed dictionaries be-
cause it exploits structural information to represent
intra-class variations. In addition to providing fast
matching, graph factorization compression has been
shown to efficiently compress data, and can therefore
rapidly construct compressed dictionaries compared
to many reference DL methods. It is therefore suitable
for adaptation of compressed dictionaries to newly-
acquired reference faces captured in changing video
surveillance environments.
2 SPARSE REPRESENTATION
CLASSIFICATION
In sparse modelling, probe samples are represented as a sparse linear combination of reference samples. The sparse representation of a signal can be formulated under the assumption that the samples from a single class lie on a linear subspace. Let $D = [D_1, D_2, \ldots, D_l]$ be an over-complete matrix with $l$ distinct classes. The size of the over-complete dictionary is $m \times n$, where the number of columns is greater than the number of rows, and usually $n \gg m$. The $i$th class of $D$ can be represented as $D_i = [d_{i,1}, d_{i,2}, \ldots, d_{i,n_i}]$, of size $m \times n_i$. A probe sample $y$ of size $m \times 1$ from class $D_i$ will lie in the linear span of the over-complete dictionary:
$$y = a_{i,1} d_{i,1} + a_{i,2} d_{i,2} + \ldots + a_{i,n_i} d_{i,n_i}, \qquad (1)$$
where $(a_{i,1}, \ldots, a_{i,n_i})$ are the coefficients of the linear representation of $y$ on the over-complete dictionary, defined as
$$y = [D_1, D_2, \ldots, D_l]\,x, \qquad (2)$$
where $x$ is a sparse coefficient vector. The entries of $x$ are all zero, except for those belonging to the $i$th class.
In real FR applications, because of factors such as noise, the linear system in Eq. (2) is underdetermined and has no unique solution. A sparse representation of the probe sample resolves this ambiguity. Under the sparse representation framework, the natural solution is to apply
$\ell_0$-norm minimization, where the $\ell_0$ norm counts the number of non-zero components of the vector $x$:
$$\hat{x} = \min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx. \qquad (3)$$
Eq. (3) is NP-hard and difficult to approximate. By combining compressed sensing (Donoho and Tsaig, 2008) and sparse representation theory, an approximate solution is obtained by replacing the $\ell_0$ norm in Eq. (3) with the $\ell_1$ norm:
$$\hat{x} = \min_x \|x\|_1 \quad \text{s.t.} \quad y = Dx. \qquad (4)$$
If the vector $x$ is sparse enough, then the solution of Eq. (4) is equivalent to that of Eq. (3). Eq. (4) can be solved with different optimization techniques, such as basis pursuit using linear programming.
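For concreteness, Eq. (4) can be cast as a linear program via the standard variable-splitting trick; the following is a minimal sketch with SciPy (a generic illustration, not the homotopy solver used later in this paper):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, y):
    """Solve Eq. (4), min ||x||_1 s.t. Dx = y, as a linear program.

    Split trick: minimize sum(u) over the stacked variable [x; u],
    with |x| <= u enforced by the inequalities x - u <= 0, -x - u <= 0.
    """
    m, n = D.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])   # objective: sum of u
    A_eq = np.hstack([D, np.zeros((m, n))])         # equality: Dx = y
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])     # |x| <= u
    b_ub = np.zeros(2 * n)
    bounds = [(None, None)] * n + [(0, None)] * n   # x free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:n]
```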
In this paper, different SRC techniques based on (Wei and Wang, 2015) are considered, so some of them are reviewed here. Eq. (5) is a generalized version of Eq. (4) that allows for a certain degree of noise; called LASSO (Yang et al., 2010a), it seeks the $x$ that minimizes the following objective function:
$$\min_x \|y - Dx\|_2^2 + \lambda \|x\|_1 \qquad (5)$$
where $\lambda > 0$ is a scalar regularization parameter that controls the trade-off between reconstruction error and sparsity. Once the sparse coefficient vector is obtained with Eq. (4) or (5), a probe image $y$ is assigned to a class by calculating the distance between the probe image and the image reconstructed from the sparse coefficients. The coefficients are non-zero for the elements of the correct class and almost zero for the other classes. The main idea is that, when the class is recognized correctly, the query image can be reconstructed linearly from the relevant bases of the dictionary $D$.
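As an illustration of this decision rule, the following is a minimal Python sketch (the scikit-learn Lasso solver, which uses its own scaling of the data-fit term, stands in for the solvers discussed above; class_index is a hypothetical array mapping dictionary columns to class labels):

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(y, D, class_index, lam=0.01):
    """Assign probe y to a class by the sparse reconstruction residual.

    y           : probe vector of size m
    D           : m x n dictionary with l2-normalized columns
    class_index : length-n array, class label of each column of D
    lam         : regularization parameter (lambda in Eq. (5))
    """
    # Solve Eq. (5): min_x ||y - Dx||_2^2 + lam * ||x||_1
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    x = lasso.fit(D, y).coef_

    # Keep only the coefficients of each class in turn, and pick the
    # class whose partial reconstruction is closest to the probe.
    residuals = {}
    for c in np.unique(class_index):
        x_c = np.where(class_index == c, x, 0.0)
        residuals[c] = np.linalg.norm(y - D @ x_c)
    return min(residuals, key=residuals.get)
```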
The SRC method needs a large amount of training data to build an over-complete dictionary, and the size of the training data has a direct effect on classification. In many real applications, like video surveillance, there is not enough reference target training data, so one solution is to exploit non-target external data. (Deng et al., 2012) extended the SRC method to under-sampled dictionaries. In their method, the under-sampled dictionary $D = [d_1, d_2, \ldots, d_l]$ is populated with one or a few samples per class. An external dictionary $ED$ built from non-target data covers known distortions, like illumination, and adds intra-class variation to the under-sampled dictionary $D$ as follows:
$$\min_x \left\| y - [[d_1, d_2, \ldots, d_l], ED] \begin{bmatrix} x_d \\ x_{ED} \end{bmatrix} \right\|_2^2 + \lambda \|x\|_1 \qquad (6)$$
To assign a probe image to the closest class, as in SRC, the distance is calculated between the query image and the image reconstructed from the non-zero gallery coefficients $x_d$ and the extra coefficients $x_{ED}$.
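A minimal sketch of this ESRC-style decision rule, under the same assumptions and the same hypothetical class_index convention as the previous snippet:

```python
import numpy as np
from sklearn.linear_model import Lasso

def esrc_classify(y, D, ED, class_index, lam=0.01):
    """ESRC-style rule: sparse code over [D, ED], then score each class
    with its gallery coefficients plus the shared external coefficients."""
    B = np.hstack([D, ED])              # concatenated dictionary [D, ED]
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    x = lasso.fit(B, y).coef_
    x_d, x_ed = x[:D.shape[1]], x[D.shape[1]:]

    residuals = {}
    for c in np.unique(class_index):
        x_c = np.where(class_index == c, x_d, 0.0)
        # the class reconstruction keeps the shared intra-class
        # variation term contributed by the external dictionary
        residuals[c] = np.linalg.norm(y - D @ x_c - ED @ x_ed)
    return min(residuals, key=residuals.get)
```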
Robust Sparse Coding (RSC), proposed in (Yang et al., 2011b), is a robust face classifier based on SRC. An extra weighting term $W$ assigns a weight to each pixel of the probe image, since pixels from outlier parts of the image are less informative than central pixels; for instance, the eyes and nose of a face carry more information than the hair. Wei and Wang (2015) have proposed a similar framework to integrate auxiliary dictionary learning and classification as follows:
$$\min_x \left\| W \left( y - [[d_1, d_2, \ldots, d_l], ED] \begin{bmatrix} x_d \\ x_{ED} \end{bmatrix} \right) \right\|_2^2 + \lambda \|x\|_1 \qquad (7)$$
where $ED$ is the external dictionary. They proposed two methods: RADL$_{wo}$ for classification only, and RADL$_{wDL}$ for joint dictionary learning and classification. In the RADL$_{wDL}$ method, $ED$ is computed by an optimization procedure over the over-complete external data, and each column of the learned dictionary $ED$ is called an atom.
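For a fixed weighting matrix $W$, the sparse-coding step of Eq. (7) reduces to a standard LASSO on re-weighted data, as the sketch below illustrates (RSC and RADL actually re-estimate $W$ iteratively; here B stands for the concatenated dictionary $[[d_1, \ldots, d_l], ED]$ and w for the diagonal of $W$):

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_sparse_code(y, B, w, lam=0.01):
    """One sparse-coding step of Eq. (7) with W = diag(w) held fixed."""
    Wy = w * y                      # apply the pixel weights to the probe
    WB = w[:, None] * B             # and to every dictionary column
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    return lasso.fit(WB, Wy).coef_
```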
3 DICTIONARY LEARNING
The performance of SRC methods is limited by the number of reference samples. For instance, the time complexity with respect to the number of samples is quadratic (Donoho and Tsaig, 2008). This challenge has been addressed in the literature by applying a compact representation or by reducing the number of reference samples. For instance, Wright et al. (2009) suggested a random selection of reference training samples to reduce the time complexity, although this impacts accuracy. A common solution in the literature is to reduce the time complexity of SRC by applying DL techniques, although most of these can only be applied off-line because of their own time complexity.
Olshausen and Field (1996) introduced dictionary learning to the pattern recognition community. They proposed an unsupervised method, driven by the structure of the data, that learns the bases (atoms) of the dictionary from training data. This differs from classical methods such as the discrete Fourier transform (DFT) and the various types of wavelets, which use fixed, standardized dictionaries. The DL problem can be viewed from different perspectives, such as matrix factorization, risk minimization and constrained variants.
DL methods have recently been applied as a preprocessing step for SRC. For example, Shafiee et al. (2013) investigated the impact of three different DL methods on SRC performance. They used Metaface dictionary learning (Yang et al., 2010b), Fisher Discriminative Dictionary Learning
(FDDL) (Yang et al., 2011a), and Sparse Modelling Representative Selection (SMRS) (Elhamifar et al., 2012) to obtain compact representations of the training data. They showed that the FDDL method provides higher recognition accuracy than the other methods, although SMRS requires less learning time.
K-Means Singular Value Decomposition (K-SVD) (Aharon et al., 2006) and the Method of Optimal Directions (MOD) (Engan et al., 1999) are two popular unsupervised DL techniques that have been widely used in the literature. These EM-style methods alternate between dictionary updates and sparse coding. The difference between the two lies in the dictionary update, where K-SVD updates the dictionary atom by atom, while MOD updates all the atoms simultaneously.
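As an illustration of the contrast, the sketch below shows MOD's simultaneous dictionary update, which is the textbook least-squares step (not code from the cited paper); Y holds the training signals and X the current sparse codes:

```python
import numpy as np

def mod_dictionary_update(Y, X):
    """One MOD step: D = Y X^T (X X^T)^+, the least-squares dictionary
    minimizing ||Y - D X||_F^2 for the fixed sparse codes X.

    Y : m x N matrix of training signals (one signal per column)
    X : k x N matrix of sparse codes under the current dictionary
    """
    D = Y @ X.T @ np.linalg.pinv(X @ X.T)
    # re-normalize atoms to unit l2 norm, as is conventional in DL
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1e-12)
```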
Ramírez et al. (2010) proposed a framework that combines sparse modelling and clustering. They introduce a set of dictionaries, one optimized for each cluster, so that the learned data is clustered as a union of low-dimensional subspaces. Most of the reported DL methods require an over-complete dictionary to generalize the intra-class variations well, except (Wei and Wang, 2015; Yang et al., 2013) to some extent. DL methods based on sparse representation provide a compact representation of the over-complete dictionary that reduces the time complexity of SRC. However, they are themselves nearly NP-hard to execute, so they are mostly used off-line.
4 GRAPH COMPRESSION FOR
SRC
Most DL methods for SRC require a considerable amount of construction time, which increases with the size of the reference data. One way to rapidly obtain a compact dictionary without losing the information in the over-complete dictionary is graph compression, which changes the representation of the data so that it requires less memory. Depending on the type of encoding, these methods produce a lossy or lossless compression. Data can be presented as a collection of feature vectors or as similarity/dissimilarity relations among the data samples, so the data can easily be converted to the adjacency matrix of a weighted graph. Compression can be addressed with information theory to compress graphical structures (Choi and Szpankowski, 2012), although this does not preserve a graph structure as the compressed representation. A second category of methods relies on the Szemerédi regularity lemma (Szemerédi, 1978), a well-known result in extremal graph theory which roughly states that a dense graph can be approximated by a bounded number of random bipartite graphs. An algorithmic version of this lemma has been used for accelerating pairwise clustering (Sperotto and Pelillo, 2007). Finally, a compression method can take into account the structural information of the data. For example, (Navlakha et al., 2008) propose a summarization algorithm for unweighted graphs, and (Toivonen et al., 2011) suggest a greedy procedure to determine a set of supernodes and superedges that approximate a weighted graph.
4.1 Graph-Compressed Dictionary
Learning (GCDL)
In this paper, a graph compression method is proposed for application to large external dictionaries. Although using a compact dictionary representation is not new, this method executes rapidly as a preprocessing step for SRC. Because SRC methods are NP-hard by nature, the homotopy method was selected for the sparse optimization in this paper due to its time efficiency (Yang et al., 2010a).
Figure 1 illustrates the proposed method. Assume that external data is collected a priori from non-target individuals over a network of surveillance cameras, i.e., that the system has a reference still face image and faces captured by surveillance cameras for several non-target persons. The first column of the external data represents the still image of its corresponding row. An over-complete dictionary is constructed by calculating the difference between each image and its corresponding still image. A similarity graph is then computed from the over-complete dictionary, where each image is a node and weighted edges represent the similarity between nodes. In the similarity graph block of Figure 1, the nodes marked with circles are the atoms of the compressed dictionary. Finally, each probe sample is represented as a sparse linear combination of the reference stills and the compressed dictionary.
The edge-weighted graph $G = (V, E, w)$ represents a set of $n$ vertices, where each vertex $V_i$ is connected to vertex $V_j$ by a weighted edge, and $E \subseteq V \times V$ is the set of edges. The weight $w(i, j)$ is obtained from the following formula:
$$w(i, j) = \exp\left( \frac{ -\| ed(i) - ed(j) \|_2^2 }{ \sigma^2 } \right) \qquad (8)$$
where $\sigma$ is a positive real number bounded to $(0, 1]$, $ed(i)$ is a feature vector from the external over-complete dictionary $ED$, and $\|\cdot\|_2$ is the Euclidean distance, which gives the dissimilarity between the two considered elements. The graph $G$ is complete and undirected, with order $n$ equal to the size of the external over-complete dictionary $ED$.
[Figure 1 (diagram): Design phase — external data → over-complete dictionary → similarity graph → compressed dictionary (CD); Operational phase — probe image represented by the gallery and the compressed dictionary with coefficients.]
Figure 1: Schematic presentation of the Graph Compressed Dictionary Learning (GCDL) method.
Therefore $w(i, j) = w(j, i)$ for all $(i, j) \in E$, and the graph $G$ can be represented by a symmetric matrix $G$ of order $n$. Let $k \ll n$ be a constant representing the number of atoms, i.e., the order of the new compressed graph $C$. The ratio $k/n$ is regarded as the graph compression rate.
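As a small illustration, the similarity graph of Eq. (8) can be computed from the columns of $ED$ as follows (a minimal NumPy sketch; the column-wise layout of ED and the default value of sigma are assumptions):

```python
import numpy as np

def similarity_graph(ED, sigma=0.5):
    """Build the weighted adjacency matrix of Eq. (8).

    ED    : m x n external over-complete dictionary, one sample per column
    sigma : kernel bandwidth in (0, 1]
    """
    # squared Euclidean distances between all pairs of columns
    sq = np.sum(ED**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * ED.T @ ED
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / sigma**2)
```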
A many-to-one mapping $\psi : [n] \to [k]$ is needed between the vertices of the original graph and those of the compressed graph, and the compressed graph must be determined so as to reduce the order of the graph from $n$ to $k$. To estimate the mapping function and the compressed graph, a least-squares approximation is applied to the following minimizer, relaxing the binary mapping to a real-valued left-stochastic matrix $X$:
$$\min f(X, C) = \|G - X^T C X\|_2^2 \quad \text{s.t.} \quad X \in S, \; C \in \mathbb{R}^{k \times k}, \qquad (9)$$
where
$$f(X, C) = \sum_{\substack{i, j \in \{1, \ldots, N\} \\ k, h \in \{1, \ldots, K\}}} \delta_{(k,i) \neq (h,j)} \, X_{ki} X_{hj} \, (G_{ij} - C_{kh})^2 \;+\; \sum_{\substack{i \in \{1, \ldots, N\} \\ k \in \{1, \ldots, K\}}} X_{ki} \, (G_{ii} - C_{kk})^2.$$
The optimization can be addressed as an EM-style method that alternates between updates of the variable $C$ and updates of the variable $X$. The minimization converges to a stationary point, as the objective function decreases in every iteration.
Update rule for C. The update rule for the unconstrained matrix $C$, $U_C(X)$, is obtained by setting the first-order partial derivative of $f$ with respect to $C$ to zero (see the proof of the theorem in (Nourbakhsh et al., 2015) for details).
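For intuition only: under the plain Frobenius-norm relaxation of Eq. (9) (ignoring the diagonal correction terms of $f$, and assuming $XX^T$ is invertible), setting the derivative to zero yields a closed form; the exact rule used by GCDL is the one derived in (Nourbakhsh et al., 2015):
$$\frac{\partial}{\partial C}\left\|G - X^T C X\right\|_F^2 = -2\, X (G - X^T C X) X^T = 0 \;\Longrightarrow\; C = (X X^T)^{-1} X G X^T (X X^T)^{-1}.$$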
Update rule for X. $U_X(X, C)$ is a multiplicative update rule for $X$, in a similar manner to the ones suggested in (Lee and Seung, 2001) for Non-negative Matrix Factorization (NMF).
We say that $X$ is a Karush-Kuhn-Tucker (KKT) point of the following optimization if it satisfies the first-order necessary conditions for local optimality (details are given in (Nourbakhsh et al., 2015)):
$$\min f_C(X) = f(X, C) \quad \text{s.t.} \quad X \in Y_Z \qquad (10)$$
where $C \in \mathbb{R}^{k \times k}$, $Z \in S$ and $Y_Z = \{X \in S : (Z_{ki} = 0) \Rightarrow (X_{ki} = 0)\}$.
Algorithm 1 summarizes the GCDL approach, which works as follows. After calculating the weighted graph from the over-complete dictionary (as explained in Figure 1), the minimization starts from a random $X \in S$ and repeatedly alternates between updating $C$ and $X$ with their respective update rules, $U_C(X)$ and $U_X(X, C)$, until convergence. The stopping criterion is met when the distance between the $X$ of two consecutive iterations falls below a given threshold, or when the maximum number of iterations is reached. This procedure may converge to a local minimum, but it guarantees
a strict decrease of the objective until a KKT point is reached. Finally, a discrete solution is obtained by projecting onto the set of binary left-stochastic matrices, setting to 1 the element with the highest value in each column of $X$ and setting the rest to 0.
Although the above compression method was not designed specifically for clustering, it converges to good clusterings that generate a compact representation of the input data. From this perspective, the mapping $X$ encodes the clustering result, and the representative vertices of each cluster, determined from the mapping $X$, produce the dictionary atoms. The complexity of a matrix-vector multiplication is thereby reduced from $O(n^2)$ to $O(k^2 + n)$.
Algorithm 1: Graph Compressed Dictionary Learning.
Input: over-complete dictionary from external data, $ED$
Output: graph-compressed dictionary, $D_{GCDL}$
  Graph $G$ ← calculate similarity matrix from $ED$ (Eq. (8))
  $X$ ← draw a random matrix from $S$, the set of left-stochastic matrices
  while stopping criterion is not met do
    $C \leftarrow U_C(X)$   /* Update C */
    $X \leftarrow U_X(X, C)$   /* Update X */
  end while
  Project $X$ to a binary left-stochastic matrix
  $D_{GCDL}$ ← select the representative of each cluster based on the mapping matrix $X$
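The sketch below is a minimal NumPy rendering of Algorithm 1, under stated assumptions: it reuses the similarity_graph helper sketched after Eq. (8); $U_C(X)$ is implemented with the closed-form least-squares update of the Frobenius relaxation noted earlier; a projected-gradient step with column renormalization stands in for the exact multiplicative rule $U_X(X, C)$ of (Nourbakhsh et al., 2015); and representatives are taken as the cluster members closest to their cluster mean, one plausible reading of the selection step:

```python
import numpy as np

def gcdl(ED, k, sigma=0.5, lr=1e-3, max_iter=300, tol=1e-8, seed=0):
    """Sketch of GCDL: compress the similarity graph of ED to k atoms."""
    rng = np.random.default_rng(seed)
    G = similarity_graph(ED, sigma)          # Eq. (8), sketched earlier
    n = G.shape[0]
    X = rng.random((k, n))
    X /= X.sum(axis=0, keepdims=True)        # random left-stochastic init
    prev = np.inf
    for _ in range(max_iter):
        # U_C(X): least-squares C for the relaxed objective (9)
        P = np.linalg.pinv(X @ X.T)
        C = P @ X @ G @ X.T @ P
        # surrogate U_X(X, C): gradient step on ||G - X^T C X||_F^2,
        # then projection back to an (approximately) left-stochastic X
        grad = -4.0 * C @ X @ (G - X.T @ C @ X)
        X = np.clip(X - lr * grad, 1e-12, None)
        X /= X.sum(axis=0, keepdims=True)
        obj = np.linalg.norm(G - X.T @ C @ X) ** 2
        if prev - obj < tol:                 # stopping criterion
            break
        prev = obj
    labels = X.argmax(axis=0)                # project to a binary mapping
    atoms = []
    for c in range(k):                       # one representative per cluster
        members = np.where(labels == c)[0]
        if members.size == 0:
            continue
        mean = ED[:, members].mean(axis=1, keepdims=True)
        dist = np.linalg.norm(ED[:, members] - mean, axis=0)
        atoms.append(ED[:, members[dist.argmin()]])
    return np.column_stack(atoms)            # compressed dictionary D_GCDL
```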
5 EXPERIMENTAL RESULTS
In this section, the performance of the proposed GCDL method is compared to several state-of-the-art SRC¹ and DL methods: SRC (Wright et al., 2009), RSC (Yang et al., 2011b), ESRC (Deng et al., 2012), and RADL (Wei and Wang, 2015). The baseline methods are categorized as methods without an external dictionary (SRC and RSC), with an external dictionary (such as ESRC and RADL$_{wo}$), and with both dictionary learning and classification (such as SVDL and RADL$_{wDL}$). In addition, the time required to construct the compressed dictionary is also compared.
The results were obtained with images from the CAS-PEAL database (Gao et al., 2008), a large-scale Chinese face database containing variations in pose, expression, accessories, lighting and background. It contains facial captures from 1040 individuals (595 males and 445 females). For this experiment, we follow the protocol discussed in (Wei and Wang, 2015).
¹ http://mml.citi.sinica.edu.tw/publications.html
One hundred subjects from the neutral category were selected as gallery images $D$, and their corresponding distorted images from the accessory category were selected for testing. The accessory category contains 3 images with hats and 3 images with sunglasses, so 600 images were collected for testing in total. Sixty subjects from the remaining individuals, with 6 instances each from the accessory category, were chosen as external data to build the dictionary $ED$. The pixel-based feature vector is obtained by downsampling the original grey-scale face images to 50 × 40 pixels.
1) In the first experiment, the performance of GCDL is shown by varying the number of atoms (the compression rate) with respect to the other methods. The 60 individuals are used as external data, with 6 instances each. The size of the external data is increased in each experiment by randomly selecting from 1 to 15 of the 60 individuals for ESRC and RADL$_{wo}$. In the same manner, the external over-complete dictionary of 360 elements is compressed so as to have the same number of atoms in each experiment.
Table 1 shows the accuracy of ESRC and RADL$_{wo}$ classification with different sizes of external data and compressed data. The average accuracy is shown over 10 experiments; because the standard deviation is always negligible, it is not reported in Table 1. We notice that, by increasing the number of atoms, the performance generally increases. The overall performance with GCDL is generally higher than that of the methods without external dictionary learning. The results also show that at a low compression rate (equivalent to a high number of atoms), accuracy is lower compared to the other methods. Since GCDL is based on clustering, the amount of information added to enhance SRC declines as the number of atoms grows. In other words, as the number of atoms approaches the total number of clusters, GCDL tends to partition the data randomly, without using the structural information of the data.
2) In the second experiment, the performance of GCDL is compared to methods that combine dictionary learning and classification, such as RADL$_{wDL}$ and SVDL, and then to K-SVD, MOD and SMRS used for dictionary learning followed by RADL$_{wo}$ for classification. Results for ESRC and RADL$_{wo}$ are provided by applying the whole external data without dictionary learning. Finally, results are also provided for SRC and RSC as baselines, since these methods do not require an external dictionary. The execution times of these methods are also compared. All code is implemented in MATLAB, on a computer with a 3.40 GHz CPU and 8 GB of RAM.
Table 2 shows that the average execution time for constructing the compressed dictionary with GCDL is much lower than that of the other methods.
Table 1: Average classification accuracy of GCDL and reference methods on CAS-PEAL data.

Methods with             Number of Atoms
External Dictionary   6        12       18       30       60       90
ESRC                  74.50%   74.50%   75.17%   75.83%   77.33%   78.33%
ESRC+GCDL             74.90%   75.57%   76.08%   76.35%   77.62%   77.85%
RADL_wo               83.17%   83.83%   84.33%   84.00%   85.90%   86.83%
RADL_wo+GCDL          83.95%   84.87%   85.53%   85.15%   85.97%   86.73%
Table 2: Average accuracy and reconstruction time of GCDL and reference methods on CAS-PEAL data.

Method      Number of Atoms   Dictionary Size   Accuracy (%)   Time (Sec)
Methods with dictionary learning and classification:
SVDL        19                2000×360          82.33          1607.87
RADL_wDL    18                2000×360          85.67          924.62
Methods with dictionary learning and RADL_wo as classification:
K-SVD       18                2000×360          84.96          115.38
MOD         18                2000×360          85.17          71.759
SMRS        17                2000×360          83.83          3.28
Method with compression and RADL_wo as classification:
GCDL        18                2000×360          86.00          0.87
Applying the whole raw dictionary:
ESRC        360               2000×360          78.50          169.69
RADL_wo     360               2000×360          87.00          267.53
Methods without external dictionary:
SRC         —                 —                 72.67          —
RSC         —                 —                 82.33          —
In addition, our method does not rely on parameter tuning, unlike K-SVD, MOD and SMRS, and GCDL provides a comparable level of accuracy.
6 CONCLUSIONS
In this paper, the Graph-Compressed Dictionary Learning (GCDL) method, based on matrix factorization, is proposed to construct a compact representation of over-complete external data. GCDL exploits the structural information of the external dictionary to build a compressed dictionary, providing a trade-off between time complexity and accuracy. Experiments conducted at a high compression rate yield better accuracy than reference methods, showing that GCDL is more robust to intra-class variation than the dictionary learning methods commonly used in the literature. As a result, the proposed algorithm can manage occluded face images, as well as illumination and expression variations. Moreover, GCDL handles even one or few gallery images per individual. The results on the CAS-PEAL dataset show that GCDL has better time efficiency for the construction of a compact dictionary. It can be employed to accelerate many SRC approaches, and the complexity of a matrix-vector multiplication can be significantly reduced.
REFERENCES
Aharon, M., Elad, M., and Bruckstein, A. (2006). The k-
svd: An algorithm for designing overcomplete dictio-
naries for sparse representation. Trans. Signal Pro-
cessing, 54(11):4311–4322.
Bashbaghi, S., Granger, E., Sabourin, R., and Bilodeau,
G. (2014). Watch-list screening using ensembles
based on multiple face representations. In Inter-
national Conference on Pattern Recognition, pages
4489–4494.
Choi, Y. and Szpankowski, W. (2012). Compression of
graphical structures: Fundamental limits, algorithms,
and experiments. IEEE Trans. on Information Theory,
58(2):620–638.
Deng, W., Hu, J., and Guo, J. (2012). Extended src: Un-
dersampled face recognition via intraclass variant dic-
tionary. IEEE Trans. Pattern Analysis Machine Intel-
ligence, 34(9):1864–1870.
Dewan, M. A. A., Granger, E., Marcialis, G. L., Sabourin,
R., and Roli, F. (2016). Adaptive appearance model
tracking for still-to-video face recognition. Pattern
Recognition, 49:129–151.
Learning of Graph Compressed Dictionaries for Sparse Representation Classification
315
Donoho, D. L. and Tsaig, Y. (2008). Fast solution of l1-
norm minimization problems when the solution may
be sparse. Information Theory, IEEE Transactions on,
54(11):4789–4812.
Elhamifar, E., Sapiro, G., and Vidal, R. (2012). See all by
looking at a few: Sparse modeling for finding repre-
sentative objects. In IEEE Conference on Computer
Vision and Pattern Recognition, pages 1600–1607.
Engan, K., Aase, S. O., and Hakon Husoy, J. (1999).
Method of optimal directions for frame design. In In-
ternational Conference of Acoustics, Speech, and Sig-
nal Processing, pages 2443–2446.
Gao, W., Cao, B., Shan, S., Chen, X., Zhou, D., Zhang, X.,
and Zhao, D. (2008). The cas-peal large-scale chinese
face database and baseline evaluations. IEEE Trans.
System Man Cybernetics Part A, 38(1):149–161.
Lee, D. D. and Seung, H. S. (2001). Algorithms for non-
negative matrix factorization. In Advances in Neural
Information Processing Systems 13, pages 556–562.
Mairal, J., Bach, F., and Ponce, J. (2014). Sparse modeling
for image and vision processing. Foundations Trends
in Computer Graphics and Vision, 8(2-3):85–283.
Mokhayeri, F., Granger, E., and Bilodeau, G. (2015). Syn-
thetic face generation under various operational con-
ditions in video surveillance. In International Confer-
ence on Image Processing.
Navlakha, S., Rastogi, R., and Shrivastava, N. (2008).
Graph summarization with bounded error. In Inter-
national Conference on Management of Data (ACM),
pages 419–432.
Nourbakhsh, F. (2015). Algorithms for Graph Compression: Theory and Experiments. PhD thesis, Dipartimento di Scienze Ambientali, Informatica e Statistica, Università Ca' Foscari, Venice, Italy.
Nourbakhsh, F., Bulò, S. R., and Pelillo, M. (2015). A matrix factorization approach to graph compression with partial information. International Journal of Machine Learning & Cybernetics, 6(4):523–536.
Olshausen, B. A. and Field, D. J. (1996). Emergence of
simple-cell receptive field properties by learning a
sparse code for natural images. Nature, 381:607–609.
Ramírez, I., Sprechmann, P., and Sapiro, G. (2010). Classification and clustering via dictionary learning with structured incoherence and shared features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3501–3508.
Shafiee, S., Kamangar, F., Athitsos, V., and Huang, J.
(2013). The role of dictionary learning on sparse
representation-based classification. In International
Conference on PErvasive Technologies Related to As-
sistive Environments, PETRA ’13, pages 47:1–47:8.
Sperotto, A. and Pelillo, M. (2007). Szemerédi's regularity lemma and its applications to pairwise clustering and segmentation. In International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, volume 4679 of Lecture Notes in Computer Science, pages 13–27.
Su, Y., Shan, S., Chen, X., and Gao, W. (2010). Adap-
tive generic learning for face recognition from a sin-
gle sample per person. In International Conference
on Computer Vision and Pattern Recognition, pages
2699–2706.
Szemerédi, E. (1978). Regular partitions of graphs. In Problèmes combinatoires et théorie des graphes, pages 399–401.
Tan, X., Chen, S., Zhou, Z.-H., and Zhang, F. (2006). Face recognition from a single image per person: A survey. International Journal of Pattern Recognition, 39:1725–1745.
Tillmann, A. M. (2015). On the computational intractability of exact and approximate dictionary learning. IEEE Signal Processing Letters, 22(1):45–49.
Toivonen, H., Zhou, F., Hartikainen, A., and Hinkka, A.
(2011). Compression of weighted graphs. In Interna-
tional Conference on Knowledge Discovery and Data
Mining (ACM), pages 965–973.
Wei, C. and Wang, Y. F. (2015). Undersampled face recog-
nition via robust auxiliary dictionary learning. IEEE
Transactions on Image Processing, 24(6):1722–1734.
Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., and Ma,
Y. (2009). Robust face recognition via sparse repre-
sentation. IEEE Trans. Pattern Analysis Machine In-
telligence, 31(2):210–227.
Yang, A. Y., Ganesh, A., Zhou, Z., Sastry, S., and Ma, Y. (2010a). Fast $\ell_1$-minimization algorithms for robust face recognition: A review. In International Conference on Image Processing, pages 1849–1852.
Yang, M., Van Gool, L., and Zhang, L. (2013). Sparse variation
dictionary learning for face recognition with a single
training sample per person. In International Confer-
ence on Computer Vision, pages 689–696.
Yang, M., Zhang, L., Feng, X., and Zhang, D. (2011a).
Fisher discrimination dictionary learning for sparse
representation. In International Conference on Com-
puter Vision, pages 543–550.
Yang, M., Zhang, L., Yang, J., and Zhang, D. (2010b).
Metaface learning for sparse representation based face
recognition. In International Conference on Image
Processing, pages 1601–1604.
Yang, M., Zhang, L., Yang, J., and Zhang, D. (2011b).
Robust sparse coding for face recognition. In Inter-
national Conference on Computer Vision and Pattern
Recognition, pages 625–632.