test data set, respectively. Then, the Pearson product-moment correlation coefficient r between the reference image R and the test image T is calculated by

r = \frac{N \sum_{x,y} R(x,y)\,T(x,y) - \sum_{x,y} R(x,y) \sum_{x,y} T(x,y)}{\sqrt{N \sum_{x,y} R(x,y)^{2} - \bigl(\sum_{x,y} R(x,y)\bigr)^{2}}\;\sqrt{N \sum_{x,y} T(x,y)^{2} - \bigl(\sum_{x,y} T(x,y)\bigr)^{2}}}   (5)
where x and y denote the pixel position and N is the number of pixels. When the bit depth of R and T is 1 (i.e., binary images), this reduces to the phi coefficient (Guilford, 1941). However, calculating the Pearson product-moment correlation coefficient requires costly operations. We therefore propose a new correlation score, the XNOR+AND correlation score u, as follows:
u = \frac{1}{N} \sum_{x,y} \Bigl[ \bigl( R(x,y) \downarrow T(x,y) \bigr) + \bigl( R(x,y) \wedge T(x,y) \bigr) \Bigr]   (6)
where \downarrow and \wedge denote the XNOR and AND operations, respectively. Since the range of u is between 0 and 2, u is normalized by

\hat{u} = \frac{u}{2}   (7)
If the reference image is fixed, this normalization can be skipped for faster computation. Both the XNOR and AND operations express correlation between two binary images, but in different ways. The XNOR operation measures the overlap of both the face feature regions (1s) and the non-face feature regions (0s); the non-face feature regions also carry information about the overall face shape. The AND operation, in contrast, considers only the face feature regions. Used alone, the XNOR operation treats noise in the face feature regions and noise in the non-face feature regions equally. By combining the AND operation with the XNOR operation, noise in the non-face feature regions has less influence on the correlation score.
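For illustration, a minimal NumPy sketch (not the authors' implementation) of the scores in Eqs. (5)-(7) is given below. The function names xnor_and_score and pearson_score are ours, and both inputs are assumed to be pre-aligned binary crops of equal size.

import numpy as np

def xnor_and_score(ref_bin: np.ndarray, test_bin: np.ndarray) -> float:
    """Normalized XNOR+AND correlation score, Eqs. (6)-(7), in [0, 1]."""
    r = ref_bin.astype(bool)
    t = test_bin.astype(bool)
    xnor = ~(r ^ t)   # 1 where the two images agree (both 1s or both 0s)
    land = r & t      # 1 only where both images have face feature pixels
    u = (xnor.sum() + land.sum()) / r.size   # Eq. (6): per-pixel mean, range [0, 2]
    return u / 2.0                           # Eq. (7): normalize to [0, 1]

def pearson_score(ref_bin: np.ndarray, test_bin: np.ndarray) -> float:
    """Pearson correlation of Eq. (5); equals the phi coefficient for binary data."""
    return float(np.corrcoef(ref_bin.ravel().astype(float),
                             test_bin.ravel().astype(float))[0, 1])

The XNOR and AND terms are pure bit operations, so the score avoids the multiplications and square roots of Eq. (5); this is where the computational saving comes from.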
4.4 Head Pose Variation Compensation
Head pose estimation is one of the important issues in face recognition. 3D model-based methods, learning-based methods and active appearance models are frequently used for pose-invariant face recognition. However, these methods are not suitable here, since the reference set varies continuously and their long processing times are impractical on mobile devices.
We assume that small head rotations in yaw and pitch can be absorbed by the binary representation. In addition, a dilation operation is applied to reduce the small differences caused by pose variations. The proposed XNOR+AND score is therefore robust against these head pose variations.
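A hedged sketch of this dilation step is shown below, assuming SciPy's binary_dilation; the 3x3 structuring element and single iteration are illustrative choices, not settings reported here.

import numpy as np
from scipy.ndimage import binary_dilation

def dilate_features(binary_face: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Thicken the face feature regions (1s) so small yaw/pitch shifts still overlap."""
    structure = np.ones((3, 3), dtype=bool)   # assumed 3x3 neighborhood
    return binary_dilation(binary_face.astype(bool),
                           structure=structure, iterations=iterations)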
Figure 5: Cropped facial images taken from the original images of the Samsung Galaxy S3 (upper row), Sky Vega X (middle row) and Samsung NX10 (lower row): (a) frontal face, (b) -15 to -45 degrees yaw, (c) +15 to +45 degrees yaw, (d) -15 to -45 degrees pitch, (e) +15 to +45 degrees pitch.
5 EXPERIMENTS
All the reference images were manually cropped to 64x64 pixels. The test images were generally larger than the reference images. Recognition rates for grayscale (8-bit) images with and without the pre-processing procedure are shown below for comparison.
Our test database consisted of 135 indoor facial images taken with two mobile phones and a DSLR camera. There were nine subjects, each captured in five different head poses, including a frontal face. As shown in Fig. 5, 15 facial images were taken per person (five poses with each of the three cameras), some of which were blurred.
Table 1 shows a performance comparison. Scale pyramid image registration was applied to all the methods. The proposed method with the XNOR+AND similarity measure achieved the best overall performance (85.93%) and the best performance across all pose variations. With grayscale images converted from the RGB images without the pre-processing procedure, the overall accuracy was 61.48%. When the pre-processing procedure (Tan and Triggs, 2007) was used, the overall accuracy improved to 68.15%. When the Pearson correlation was used, the overall recognition rate was 83.7%.
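As a hedged illustration of this registration step, a scale pyramid search with the XNOR+AND score might look as follows; the scale factors, window step and use of OpenCV are assumptions made for this sketch, not settings reported in the paper.

import numpy as np
import cv2

def xnor_and_score(r: np.ndarray, t: np.ndarray) -> float:
    # Normalized XNOR+AND score of Eqs. (6)-(7), as in the earlier sketch.
    r, t = r.astype(bool), t.astype(bool)
    return ((~(r ^ t)).sum() + (r & t).sum()) / (2.0 * r.size)

def pyramid_match(ref_bin: np.ndarray, test_bin: np.ndarray,
                  scales=(1.0, 0.8, 0.64), step: int = 8) -> float:
    """Best score of the reference slid over the test image at several scales."""
    h, w = ref_bin.shape
    best = 0.0
    for s in scales:
        resized = cv2.resize(test_bin.astype(np.uint8), None, fx=s, fy=s,
                             interpolation=cv2.INTER_NEAREST) > 0
        H, W = resized.shape
        if H < h or W < w:
            continue  # skip scales where the test image is smaller than the reference
        for y in range(0, H - h + 1, step):
            for x in range(0, W - w + 1, step):
                best = max(best, xnor_and_score(ref_bin, resized[y:y + h, x:x + w]))
    return best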