images per person. This is called the small sample
problem (SSP). Many research efforts have targeted the SSP
(Yan et al., 2014; Lu et al., 2013; Su et al., 2010), and
in this paper we also propose a new algorithm that deals
with few training examples for face identification.
Related Work. The first successful face recognition
algorithm, Eigenfaces (Turk and Pentland, 1991), was
based on the now well-known subspace method of principal
component analysis. Another widely used method is
Fisherfaces (Belhumeur et al., 1997), which uses linear
discriminant analysis. These methods can perform well if
a large amount of correctly aligned and normalized face
data is available. However, since they use pixel
intensities directly as input data, pose variations and
alignment errors can easily deteriorate their performance.
To cope with the noise caused by illumination and pose
variations, methods based on edge and local feature
extraction have been proposed. Among the best known of
these are Gabor filters (Jemaa and Khanfir, 2009), the
histogram of oriented gradients (HOG) (Dalal and Triggs,
2005), the scale-invariant feature transform (SIFT)
(Lowe, 2004), and local binary patterns (LBP) (Ahonen
et al., 2004). These methods have been shown to yield
better performance than Eigenfaces or Fisherfaces.
However, without additional preprocessing of the input
data and a sufficient number of training images, they
cannot handle pose differences or alignment errors very well.
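To illustrate the core idea behind such gradient-based local features, the following is a minimal NumPy sketch of a HOG-style descriptor: each image cell votes into an orientation histogram weighted by gradient magnitude. It is a simplified illustration only, not the full Dalal and Triggs pipeline (block normalization and overlapping blocks are omitted), and all function names are ours.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one image cell.

    Captures the core idea of HOG; the full Dalal-Triggs descriptor
    additionally normalizes histograms over overlapping blocks.
    """
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in standard HOG.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int),
                      n_bins - 1)
    hist = np.zeros(n_bins)
    # Each pixel votes for its orientation bin, weighted by magnitude.
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

def hog_like_descriptor(image, cell_size=8, n_bins=9):
    """Concatenate per-cell histograms into one feature vector."""
    h, w = image.shape
    feats = [cell_orientation_histogram(
                 image[r:r + cell_size, c:c + cell_size], n_bins)
             for r in range(0, h - cell_size + 1, cell_size)
             for c in range(0, w - cell_size + 1, cell_size)]
    return np.concatenate(feats)
```

Because the descriptor depends on gradient orientations rather than raw intensities, it is less sensitive to illumination changes than pixel-based methods.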
To cope with pose differences and alignment problems, the
bag of words (BOW) method (Csurka et al., 2004), which has
been successfully applied to different computer vision
problems (Shekhar and Jawahar, 2012; Montazer et al.,
2015), was proposed for the face recognition problem
(Li et al., 2010; Wu et al., 2012). In this method, input
images are treated non-holistically through many
sub-images. These sub-images are processed by a clustering
algorithm to create a codebook (the bag of words), and
this codebook is then used to extract feature vectors from
images, which are finally given to the classifier.
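The codebook construction and encoding steps above can be sketched as follows. This is a minimal NumPy-only illustration under our own naming, using plain K-means over local descriptors and a normalized histogram of nearest visual words as the image-level feature; concrete BOW systems differ in the local descriptor and clustering details.

```python
import numpy as np

def build_codebook(descriptors, k=16, n_iter=20, seed=0):
    """Plain K-means over local descriptors; each row of the result
    is one 'visual word' of the codebook."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(n_iter):
        # Assign each descriptor to its nearest center.
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_encode(descriptors, codebook):
    """Image feature: normalized histogram of nearest visual words."""
    dists = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    hist = np.bincount(dists.argmin(axis=1),
                       minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Because the encoding counts visual-word occurrences regardless of where sub-images were cropped, the resulting feature vector is comparatively tolerant to misalignment.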
Similarly to the BOW approach, in (Simonyan et al., 2013),
many sub-images processed by the SIFT descriptor are used
to train Gaussian mixture models to compute improved
Fisher vectors (Perronnin et al., 2010) for face
verification. The reported results are comparable with
those of state-of-the-art face verification papers.
As for classifiers used for face recognition, k-nearest
neighbours (K-NN), support vector machines (SVM) (Vapnik,
1998), and artificial neural networks (ANN) have been
shown to be successful. If classifier speed is important
and features from face images are selected robustly, then
K-NN can be a good choice. Since the K-NN classifier
requires no training, it is practical for fast face
recognition applications in which new people are
continuously added to the dataset. However, if accuracy is
more important than speed, then an SVM (Wei et al., 2011)
or an ANN can be preferable, even though they need
retraining when the dataset is augmented with new people
and images.
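The training-free property of K-NN can be made concrete with a short sketch (NumPy only, names ours): enrolling a new person amounts to appending feature vectors and labels to the gallery, with no model to refit.

```python
import numpy as np

def knn_predict(gallery, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest gallery
    vectors (Euclidean distance). Enrolling a new person only requires
    appending rows to `gallery` and entries to `labels`."""
    dists = np.linalg.norm(gallery - query, axis=1)
    nearest = np.argsort(dists)[:k]
    vals, counts = np.unique(labels[nearest], return_counts=True)
    return vals[counts.argmax()]
```

The price of skipping training is paid at query time: each prediction scans the whole gallery, so an SVM or ANN amortizes better when the gallery is large and static.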
Convolutional neural networks (CNNs), which act as
powerful feature extractors and classifiers, are currently
considered one of the state-of-the-art machine learning
algorithms. A CNN is a special kind of multi-layer
perceptron with many specialized layers for feature
extraction and classification. In a recent CNN-based face
verification study (Parkhi et al., 2015), a novel database
construction and a CNN architecture are presented. The
authors construct a face database of 2.6K subjects
comprising a total of 2.6M images from the Internet,
removing duplicate images with a state-of-the-art face
recognition application as well as a group of human
annotators. After the database construction, they optimize
a relatively simple new CNN that combines the most
effective features of recently proposed state-of-the-art
CNNs for face recognition.
The SVM also has several varieties. Although it was first
proposed as a linear classifier, non-linear models have
been developed to classify datasets that are not separable
by the standard linear SVM. Another popular SVM algorithm
is the L2-norm regularized SVM (L2-SVM) (Koshiba and Abe,
2003; Deng et al., 2012). It tackles the problem that
occurs when the feature vectors are very long (e.g., more
than 2,000 elements), which cannot be handled very
efficiently by the standard SVM.
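An L2-regularized linear SVM can be trained with a stochastic subgradient method whose per-update cost is linear in the feature length, which is why long feature vectors remain cheap. The sketch below is a minimal Pegasos-style illustration in NumPy under our own naming, not the specific L2-SVM formulations of the cited works.

```python
import numpy as np

def train_l2_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Stochastic subgradient descent (Pegasos-style) for the
    L2-regularized hinge-loss objective
        lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (x_i . w))).
    Labels y must be in {-1, +1}. Each update touches one sample,
    so the cost per step is linear in the feature length."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w)
            w *= (1.0 - eta * lam)         # shrink from the L2 penalty
            if margin < 1:                 # hinge subgradient is active
                w += eta * y[i] * X[i]
    return w

def svm_predict(w, X):
    """Predict labels in {-1, +1} from the sign of the score."""
    return np.sign(X @ w)
```

With feature vectors of a few thousand elements, each update is a handful of vector operations, which keeps training tractable where a kernelized solver would struggle.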
2 FACE RECOGNITION BY THE
HOG-BOW METHOD
Contributions. In this paper, as our main contribution, a
bag of words (BOW) algorithm is proposed that uses feature
vectors extracted with the histogram of oriented gradients
(HOG) to recognize faces under small sample per person
(SSPP) conditions. Although HOG and BOW are well-known
algorithms, to the best of our knowledge their combination
has not been evaluated for face recognition, especially in
the SSPP case.
In our method, a K-means clustering algorithm
is used to compute the visual codebook from fea-
ture vectors extracted by HOG from many randomly
cropped sub-images. Then this codebook is used to