ON THE EFFECT OF PERSPECTIVE DISTORTIONS
IN FACE RECOGNITION
Zahid Riaz and Michael Beetz
Intelligent Autonomous Systems (IAS), Department of Computer Science,
Technical University of Munich, Munich, Germany
Keywords:
Face Recognition, Active Appearance Models, Feature Extraction, Biometrics.
Abstract:
Face recognition is one of the most widely studied topics in the image processing and pattern recognition literature. Generally, the distance between the camera and the face is much larger than the face itself, so in practice researchers often ignore the effects of perspective distortion at the face boundary. However, these effects become more prominent when faces are viewed from different angles. In this paper, we study the effects of perspective distortion and obtain improved face recognition results under varying viewpoints. The approach fits a 3D model to the face images and creates a texture map by rectifying the texture of each triangle. We compare our results with active appearance models (AAM) on two standard face databases.
1 INTRODUCTION
Over the past few years, several face recognition sys-
tems have been introduced in the commercial market
and have been successfully applied under different
scenarios (Zhao et al., 2003). Despite these efforts,
the performance of face recognition systems is still below that of iris and automated fingerprint identification systems (AFIS) due to uniqueness, permanence and reliability issues (Jain et al., 2005). On the other hand, faces are a natural and non-intrusive means of interaction. Cognitive science describes the human face as a complex 3D object whose deformations can be controlled in a low-dimensional subspace (O'Toole, 2009)(Blanz and Vetter, 2003), showing several meaningful variations that make the development of a face recognition system challenging. These variations include changing viewpoints, varying lighting conditions and illumination, facial expressions, temporal deformations such as aging, image acquisition in different sessions of the day, occlusions and, not least, spoofing.
In this paper we study the face recognition problem under varying viewpoints and facial expressions using a sparse 3D face model. Since faces are captured in action, some facial areas are self-occluded and treated as missing information. Moreover, under slight out-of-plane rotations most facial areas are tilted and carry very little textural information due to their oblique orientation. In conventional face modeling approaches (Cootes et al., 1998), the distortions of these areas at the face boundary are ignored by assuming that the distance between the camera and the face is much larger than the face itself; under this assumption, perspective distortions can be neglected while still obtaining a high recognition rate. However, we study this problem in detail and obtain improved performance by considering the effect of perspective distortion on local facial areas. The contributions of this paper are twofold: (1) we develop an unconstrained face recognition system that is robust against varying facial poses and slight facial expressions; (2) we study the effect of the generally ignored perspective distortions on the facial surface and recommend texture rectification. For this purpose, we use a generic 3D wireframe face model called Candide-III (Ahlberg, 2001). The model consists of 184 triangular patches representing different areas on the surface of a 3D face. These sparse flat triangles can deform according to the facial action coding system (FACS) (Ekman and Friesen, 1978), action units and MPEG-4 facial animation units (Li and Jain, 2005). Each triangle defines a texture that is stored in a standard texture map using the camera rotation and translation. This rectified texture increases the recognition rate compared to conventional 2D active appearance models (AAM) (Edwards et al.,
1998) on standard face databases. Model-based approaches to human faces gained popularity in the last decade due to their compact and detailed representation (Blanz and Vetter, 2003)(Abate et al., 2007)(Park et al., 2004)(Cootes et al., 1998). In the face modeling literature, useful face models include point distribution models (PDM), 3D morphable models, photorealistic models, deformable models and wireframe models (Abate et al., 2007).
The remainder of the paper is divided into three main sections. Section 2 discusses face modeling in detail. In Section 3 we describe the experiments performed with our approach and compare it against conventional AAM. Finally, Section 4 concludes our work and outlines future extensions of this approach.
2 HUMAN FACE MODELING
We study structural and textural parameterization of
the face model separately in this section.
2.1 Structural Modeling and Model Fitting
Our proposed algorithm is initialized by applying a face detector to the given image. We use the Viola and Jones face detector (Viola and Jones, 2004). If a face is found, the system proceeds to face model fitting. Structural features are obtained after fitting the model to the face image. For model fitting, local objective functions are calculated using Haar-like features. An objective function is a cost function, given here by Equation (1). A fitting algorithm searches for the optimal parameters that minimize the value of the objective function. For a given image I, if E(I, c_i(p)) represents the magnitude of the edge at point c_i(p), where p is the set of parameters describing the model, then the objective function is given by:

f(I, p) = \frac{1}{n} \sum_{i=1}^{n} f_i(I, c_i(p)) = \frac{1}{n} \sum_{i=1}^{n} \bigl( 1 - E(I, c_i(p)) \bigr) \qquad (1)

where n = 113 is the number of vertices c_i describing the face model. This approach is less error-prone because of the better quality of the annotated images provided to the system for training. It is also less laborious, because manual objective function design is replaced with automated learning. For details we refer to (Wimmer et al., 2008).
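As an illustration of how Equation (1) might be evaluated, consider the following minimal sketch. Here edge_magnitude is a hypothetical helper standing in for the learned local objective response (the paper learns such responses from Haar-like features); it is not the authors' implementation.

import numpy as np

def fitting_cost(image, vertices, edge_magnitude):
    """Sketch of Equation (1): average cost over the n = 113 model vertices.

    vertices       : projected model points c_i(p), as (x, y) pairs
    edge_magnitude : hypothetical helper returning a normalized edge
                     strength in [0, 1] at an image point
    """
    costs = [1.0 - edge_magnitude(image, (x, y)) for (x, y) in vertices]
    return float(np.mean(costs))  # small when vertices lie on strong edges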
The geometry of the model is controlled by a set
of action units and animation units. Any shape s can
be written as the sum of the mean shape \bar{s} and a set of action units and shape units:

s(\alpha, \sigma) = \bar{s} + \phi_a \alpha + \phi_s \sigma \qquad (2)

where \phi_a is the matrix of action unit vectors and \phi_s is the matrix of shape vectors, while \alpha denotes the action unit parameters and \sigma the shape parameters (Li and Jain, 2005). Model deformation is governed by the principles of the facial action coding system (FACS) (Ekman and Friesen, 1978). The scaling, rotation and translation of the model are described by

s(\alpha, \sigma, \pi) = m R s(\alpha, \sigma) + t \qquad (3)

where R is the rotation matrix, t the translation vector and m the scaling factor; \pi contains the six pose parameters plus the scaling factor. By changing the model parameters, it is possible to generate global rotations and translations. We extract 85 parameters to control the structural deformation.
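A minimal numpy sketch of Equations (2) and (3) follows; the array shapes and function name are our own assumptions for illustration, not taken from the paper.

import numpy as np

def synthesize_shape(s_mean, phi_a, phi_s, alpha, sigma, m, R, t):
    """Sketch of Equations (2)-(3).

    s_mean : (3n,) stacked mean shape, n vertices with (x, y, z) each
    phi_a  : (3n, k_a) action unit vectors; phi_s : (3n, k_s) shape vectors
    alpha  : (k_a,) action unit parameters; sigma : (k_s,) shape parameters
    m, R, t: scale, (3, 3) rotation and (3,) translation from the pose pi
    """
    s = s_mean + phi_a @ alpha + phi_s @ sigma   # Equation (2)
    pts = s.reshape(-1, 3)                       # one row per vertex
    return m * (pts @ R.T) + t                   # Equation (3), row-vector form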
2.2 View Invariant Texture Extraction
Figure 1: Detailed texture extraction approach described in Section 2.2.

The robustness of the textural parameters depends upon the quality of the input texture image. We consider a perspective transformation because affine warping of a rendered triangle is not invariant to 3D rigid transformations. Affine warping works reasonably well if the triangle is not tilted with respect to the camera coordinate frame. However, most of the triangles near the face boundary are tilted, and the texture in these triangles is heavily distorted. To solve this problem we first apply a perspective transformation. Since the 3D position of each triangle vertex as well as the camera parameters are known, we determine the homogeneous mapping between the image plane and the texture coordinates using a homography H:

H = K \cdot [\, r_1 \;\; r_2 \;\; Rt \,] \qquad (4)

where K, R and t denote the camera matrix, the rotation matrix and the translation vector respectively, and r_1 and r_2 are the first two columns of the rotation matrix. H maps a 2D point of the texture image to the corresponding 2D point of the rendered image of the triangle. The projection q of a general 3D point p in homogeneous coordinates is

q = K \cdot [\, R \;\; Rt \,] \cdot p \qquad (5)

Each 3D homogeneous point lying on the plane z = 0, i.e. p = (x, y, 0, 1)^T, leads to the above equation. If p' is the corresponding homogeneous 2D point in texture coordinates, then

q = K \cdot [\, r_1 \;\; r_2 \;\; r_3 \;\; Rt \,] \cdot (x, y, 0, 1)^T = H \cdot p'
where R and t are the unknowns to be calculated. We use the upper triangle of each block of a rectangular texture map (see Figure 3) to store the texture patches. In order to fit an arbitrary triangle to this upper triangle, we use an affine transformation A. The final homogeneous transformation M is given by

M = A \cdot K \cdot [\, R \;\; Rt \,] = A \cdot H \qquad (6)
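A minimal sketch of this rectification step, assuming OpenCV for the warp, a per-triangle plane with z = 0 in the triangle's local frame, and an illustrative corner layout for the texture block; the function and argument names are our own, not the authors' implementation.

import numpy as np
import cv2  # OpenCV, used only for the affine estimate and the warp

def rectify_triangle(image, K, R, t, plane_corners, block_size=16):
    """Map one face triangle from the input image into the upper
    triangle of a d x d texture block (Equations (4) and (6)).

    plane_corners : (3, 2) corners of the triangle in its z = 0 plane
    K, R, t       : camera matrix, rotation matrix, translation vector
    """
    # Equation (4): homography from the z = 0 plane to the image.
    H = K @ np.column_stack((R[:, 0], R[:, 1], R @ t))
    # Affine map A sending the plane corners to the block's upper triangle.
    d = block_size
    dst = np.float32([[0, 0], [d - 1, 0], [0, d - 1]])
    A = cv2.getAffineTransform(np.float32(plane_corners), dst)
    A3 = np.vstack([A, [0.0, 0.0, 1.0]])       # lift 2x3 affine to 3x3
    # Compose block -> plane -> image, then sample the image per block texel.
    block_to_image = H @ np.linalg.inv(A3)
    return cv2.warpPerspective(image, block_to_image, (d, d),
                               flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)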
The extracted texture vector g is parameterized using principal component analysis (PCA), with mean vector g_m and matrix of eigenvectors P_g, to obtain the parameter vector b_g (Li and Jain, 2005):

g = g_m + P_g b_g \qquad (7)
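Equation (7) is a standard linear synthesis model; a small sketch of projecting to and from the parameter space, assuming the columns of P_g are orthonormal PCA eigenvectors:

import numpy as np

def texture_params(g, g_m, P_g):
    """b_g = P_g^T (g - g_m): inverts Equation (7) when the columns of
    P_g are orthonormal."""
    return P_g.T @ (g - g_m)

def texture_from_params(b_g, g_m, P_g):
    """Equation (7): g = g_m + P_g b_g (approximate if b_g is truncated)."""
    return g_m + P_g @ b_g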
2.3 Optimal Texture Representation
Each triangular patch carries meaningful texture, which is stored in a square block of the texture map; a single block thus represents one triangular patch.
lar patch. We experiment with three different sizes
of the texture blocks and choose an optimal size for
our experimentation. The three block sizes are 2^3 × 2^3, 2^4 × 2^4 and 2^5 × 2^5. We calculate an energy function from the texture maps of individual persons and observe the energy spectrum of the images in our database for each triangular patch. If N is the total number of images and p_i is a texel value (equal to a single pixel value) in the texture map, then we define the energy function as:

E_j = \frac{1}{N} \sum_{i=1}^{N} (p_i - \bar{p}_j)^2 \qquad (8)

where \bar{p}_j is the mean value of the pixels in the j-th block, j = 1, ..., M, and M = 184 is the number of blocks in a texture map. In addition to Equation (8), we compute a variance energy using PCA for each block and observe the energy spectrum.

Figure 2: Energy spectrum of two randomly selected subjects from the PIE database. Energy values for each patch are computed and compared for three different texture sizes.

The variation within
a given block is similar for the two kinds of energy function, apart from slight differences in the energy values. Figure 2 shows the energy values for two subjects randomly chosen from our experiments; the behavior of the textural components is similar across the different
texture sizes. The size of the raw feature vector extracted directly from the texture map grows rapidly with the texture block size. If d × d is the size of a block, then each block contributes d(d+1)/2 values to the raw feature vector; this count depends upon how the texture is stored in the texture map, as shown in Figure 3: we store each triangular patch from the face surface in the upper triangle of its texture block. The resulting raw feature vector lengths for d = 2^3, d = 2^4 and d = 2^5 are 6624, 25024 and 97152 respectively. Larger blocks would further inflate the raw vector without any improvement in the texture energy, so we do not consider them. The overall recognition rate produced by the different texture sizes over eight randomly selected subjects, with 2145 images from the PIE database, is shown in Figure 4. The results are obtained using decision trees and Bayesian networks for classification; the classification procedure is detailed in Section 3. Trading off performance against feature vector size, we fix the texture block size to 16 × 16 in our experiments.
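A sketch of the per-block energy of Equation (8), assuming the rectified blocks are stacked in one array; masking out the unused lower triangle and averaging over both images and texels is our reading of the text, not a verified implementation detail.

import numpy as np

def block_energies(texture_maps):
    """Sketch of Equation (8) over N texture maps.

    texture_maps : (N, M, d, d) array, M = 184 blocks of size d x d; only
                   the upper triangle of each block carries texture, giving
                   d(d+1)/2 texels per block (136 * 184 = 25024 for d = 16).
    """
    N, M, d, _ = texture_maps.shape
    iu = np.triu_indices(d)                            # upper-triangle texels
    texels = texture_maps[:, :, iu[0], iu[1]]          # (N, M, d(d+1)/2)
    mean_j = texels.mean(axis=(0, 2), keepdims=True)   # \bar{p}_j per block
    return ((texels - mean_j) ** 2).mean(axis=(0, 2))  # E_j for j = 1..M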
3 EXPERIMENTATION
In order to study perspective effects on face im-
ages, we experiment mainly on the PIE database (Terence et al., 2002) for face recognition and verify our findings on FG-NET (FG-NET, 2011) for age estimation.

Figure 3: Texture from each triangular patch is stored as the upper triangle of its texture block in the texture map. A raw feature vector is obtained by concatenating the pixel values from each block.

Figure 4: Comparison over eight random subjects from the database with three different sizes of texture blocks. The recognition rate improves slightly as the texture size increases, but at the cost of a large increase in the length of the raw feature vector. We settle on a texture block size of 16 × 16.
The PIE database was captured in two sessions: 1) October 2000 to November 2000 and 2) November 2000 to December 2000. The database contains pose, illumination and facial expression variations. We experiment on all subjects from the second session, which consists of 15 subjects with 9162 images. Since our algorithm starts with the Viola and Jones face detector (Viola and Jones, 2004), we consider only those images for which the detector gives a positive result. This filtering yields 3578 images with successfully detected faces; images with extreme pose variation (profile views), dark capture conditions and strong illumination effects are discarded in this step. The remaining images contain frontal faces, half-profile faces in both directions, and faces looking upwards and downwards. Texture is extracted from each image using the method described in Section 2.2, giving 3578 vectors of length 25024. For dimensionality reduction, we use 40% of the data, randomly selected from these raw features, to learn a PCA-based subspace. For all texture sizes, we retain 97% of the covariance by choosing among the eigenvalues. The remaining data is projected onto this subspace to obtain the parameters that serve as feature vectors. The parameter vector size is almost equal for the three texture sizes: 188, 185 and 186 for 8 × 8, 16 × 16 and 32 × 32 respectively.
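The paper does not name a PCA implementation; a scikit-learn sketch of this split-and-project step, with random placeholder data standing in for the real 3578 × 25024 texture vectors, might look like:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((500, 4096)).astype(np.float32)  # placeholder; real data is 3578 x 25024

# Learn the subspace on a random 40% split, retaining 97% of the covariance.
idx = rng.permutation(len(X))
n_train = int(0.4 * len(X))
pca = PCA(n_components=0.97).fit(X[idx[:n_train]])

# Project the remaining data to obtain the parameter (feature) vectors.
features = pca.transform(X[idx[n_train:]])
print(features.shape)  # the paper reports ~185-dim vectors for 16 x 16 blocks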
For classification we apply a decision tree; other classifiers can also be applied depending upon the application (Bayesian networks (BN) were also used with comparable results during experimentation, see Figure 4). We choose the J48 decision tree with 10-fold cross validation, which uses a pruning strategy called subtree raising and classifies recursively until every leaf is pure. The decision tree parameters are: confidence factor C = 0.25, a minimum of two instances per leaf, and the C4.5 approach to reduced-error pruning (Witten and Frank, 2005). The face recognition rate under varying poses and facial expressions is given in Table 1. To further verify the effect of perspective distortions, we study age classification on all subjects of the FG-NET database, which consists of 1002 images of 62 subjects with ages ranging from 0 to 69 years. We divide the database into seven groups of 10-year bands. The results are shown in Table 1.

Table 1: Comparison of the traditional AAM approach and rectified texture. Results are shown for textural parameters alone and for combined structural and textural parameters.

Database   2D Texture    Rectified Textural   AAM      3D Structural +
           Parameters    Parameters                    Textural Parameters
PIE        63.02%        69.64%               79.93%   84.15%
FG-NET     51.35%        54.09%               51.15%   55.39%
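The paper uses Weka's J48; as a rough analogue (scikit-learn's tree is CART rather than C4.5, so the settings below only approximate the stated confidence factor and pruning), the 10-fold protocol could be sketched as:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
features = rng.random((500, 185))        # placeholder parameter vectors
labels = rng.integers(0, 15, size=500)   # placeholder subject identities

# min_samples_leaf=2 mirrors J48's two instances per leaf; ccp_alpha adds
# cost-complexity pruning in place of C4.5's confidence-factor pruning.
clf = DecisionTreeClassifier(min_samples_leaf=2, ccp_alpha=1e-3, random_state=0)
scores = cross_val_score(clf, features, labels, cv=10)
print(f"mean recognition rate: {scores.mean():.2%}")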
4 CONCLUSIONS AND FUTURE WORK
In this paper we study the fact that the performance of a face recognition system can be improved by considering the perspective effect on a face image, an issue that is generally given little attention by the research community. A good recognition rate can be achieved even when this effect is ignored; however, taking it into account yields further improvements. Our approach gives equal weight to each triangular patch on the face surface, but different weights could be applied using context and prior knowledge of facial deformations. We propose texture rectification based on a sparse 3D model and achieve better recognition with it. The triangular patches representing the face surface are flat; curved triangles could further improve the results. Finally, since the proposed solution involves 3D modeling, we also recommend using it for light modeling to obtain illumination invariance.
REFERENCES
Abate, A., Nappi, M., Riccio, D., and Sabatino, G. (2007). 2D and 3D face recognition: A survey. Pattern Recognition Letters, 28:1885–1906.
Ahlberg, J. (2001). An experiment on 3D face model adaptation using the active appearance algorithm. Image Coding Group, Dept. of Electrical Engineering, Linköping University.
Blanz, V. and Vetter, T. (2003). Face recognition based
on fitting a 3d morphable model. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
25(9):1063–1074.
Cootes, T., Edwards, G., and Taylor, C. (1998). Active ap-
pearance models. Proceedings of European Confer-
ence on Computer Vision, 2:484–498.
Edwards, G., Cootes, T., and Taylor, C. J. (1998). Face
recognition using active appearance models. In 5th
European Conference on Computer Vision-Volume II,
ECCV ’98, pages 581–595, London, UK. Springer-
Verlag.
Ekman, P. and Friesen, W. (1978). The facial action cod-
ing system: A technique for the measurement of facial
movement. Consulting Psychologists Press.
FG-NET (2011). Fg-net aging database.
http://www.fgnet.rsunit.com/.
Jain, A. K., Bolle, R. M., and Pankanti, S. (2005). Bio-
metrics: Personal Identification in Networked Society.
Springer.
Li, S. Z. and Jain, A. K. (2005). Handbook of Face Recog-
nition. Springer.
O’Toole, A. J. (2009). Cognitive and computational ap-
proaches to face recognition. The University of Texas
at Dallas.
Park, I. K., Zhang, H., Vezhnevets, V., and Choh, H. (2004). Image-based photorealistic 3-D face modeling. In FGR, pages 49–56.
Terence, S., Baker, S., and Bsat, M. (2002). The CMU pose, illumination, and expression (PIE) database. In Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), page 53, Washington, DC, USA. IEEE Computer Society.
Viola, P. and Jones, M. J. (2004). Robust real-time face
detection. International Journal of Computer Vision,
57(2):137–154.
Wimmer, M., Stulp, F., Pietzsch, S., and Radig, B. (2008). Learning local objective functions for robust face model fitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(8):1357–1370.
Witten, I. H. and Frank, E. (2005). Data Mining: Practi-
cal machine learning tools and techniques. Morgan
Kaufmann, San Francisco.
Zhao, W., Chellappa, R., Phillips, P. J., and Rosenfeld, A. (2003). Face recognition: A literature survey. ACM Computing Surveys, 35(4):399–458.