A Statistical Quadtree Decomposition to Improve Face Analysis

Vagner Amaral, Gilson A. Giraldi and Carlos E. Thomaz

Department of Electrical Engineering, Centro Universitario da FEI, Av. Humberto de Alencar Castelo Branco 3972, Sao Bernardo do Campo, Sao Paulo, Brazil

Department of Computer Science, LNCC, Av. Getulio Vargas 333, Petropolis, Rio de Janeiro, Brazil

Keywords:

Spatial Mapping, Task Driven, Face Recognition, Local Binary Pattern, Quadtree.

Abstract:

Feature extraction is one of the most important steps in face analysis applications, and this subject has always received attention in the computer vision and pattern recognition areas due to its applicability and wide scope. However, defining the correct spatial relevance of physiognomical features remains a great challenge. A statistical spatial mapping technique that highlights the most discriminating facial features using task-driven information from data mining has recently been proposed, with promising results. Such a priori information has been employed as a spatial weighted map on Local Binary Pattern (LBP) descriptors, which are classified by a nearest neighbour classifier based on the Chi-Square distance. Intending to reduce the dimensionality of LBP descriptors and improve the classification rates, in this paper we propose and implement two quadtree image decomposition algorithms for task-related spatial map segmentation. The first relies only on the split step (top-down) of distinct regions, and the second performs the split step followed by a merge step (bottom-up) to combine similar adjacent regions. We carried out experiments with two distinct face databases, and our preliminary results show that the top-down approach achieved classification results similar to those of the standard segmentation while using fewer regions.

1 INTRODUCTION

Feature extraction is one of the most important steps in face analysis applications, and this subject has always received attention in the computer vision and pattern recognition areas due to its applicability and wide scope. However, defining the correct spatial relevance of physiognomical features remains a great challenge (Blais et al., 2012). In the last few years, a method called Local Binary Pattern (LBP) has been successfully used in face analysis research (Shan et al., 2009; Pietikäinen et al., 2011; Shan, 2012; Torrisi et al., 2015; Santarcangelo et al., 2015). Nevertheless, many studies on this method have ignored the contribution provided by contextual information (Shan et al., 2005; Shan et al., 2009; Shan, 2012). Thus, a recently proposed approach (Amaral et al., 2013) has used a statistical technique, employing task-driven information from data mining, to highlight the most discriminant facial features and provide a spatial weighted map to LBP. This approach has enabled subsequent studies to explore the relevance of physiognomical features according to the task under investigation (Amaral et al., 2014; Amaral et al., 2015). At first, these studies employed a uniform grid, which consists of a non-overlapping square segmentation of the face images, to extract the feature descriptors. Later, they also investigated a non-uniform segmentation and concluded that it would be an interesting way to describe facial features with few regions.

In this context, the aim of this paper is to improve the segmentation of the task-driven statistical spatial maps, in order to reduce the dimensionality of LBP spatial feature descriptors and improve the classification accuracy in face analysis applications. We expect to show that the adaptive decomposition emphasises the facial features according to their contextual relevance, providing more consistent spatial segments, that is, segments that contain more pixels with similar values. More specifically, in this study we investigate the spatial segmentation of facial features for gender and facial expression classification.

This paper is organised as follows. Next, in section 2, we review LBP. In section 3 we present the statistical spatial mapping. Then, section 4 describes the quadtree image segmentation techniques and their application in this study. The experiments and results are described in sections 5 and 6, respectively. Finally, in section 7, we conclude the paper, discussing its main contribution and future work.

Amaral, V., Giraldi, G. and Thomaz, C.

A Statistical Quadtree Decomposition to Improve Face Analysis.

DOI: 10.5220/0005823903750380

In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2016), pages 375-380

ISBN: 978-989-758-173-1


2 LOCAL BINARY PATTERN

Initially implemented as a texture operator (Ojala et al., 1996), LBP has been widely employed in face image processing due to its low computational complexity and invariance to monotonic gray-level changes (Ahonen et al., 2006). In short, the original approach labels each pixel of an image by thresholding its neighbourhood against the center pixel value, which encodes the local structure around that pixel:

$$b_{ij} = \begin{cases} 0, & v_{ij} < v_c \\ 1, & v_{ij} \geq v_c \end{cases} \qquad (1)$$

where $v_c$ is the center pixel value and $v_{ij}$ is the pixel value at position $(i, j)$, with $1 \leq i, j \leq N$, where $N$ is the neighbourhood size. The $b_{ij}$ values are concatenated as a binary number and converted to decimal to label the central pixel. Figure 1 shows an image before and after this process.

Figure 1: LBP initial step: a) Original image; b) LBP pixel labeled image.

The output image is divided into regions $R_j$, $j = 1, 2, \ldots, N$, usually arranged in a regular grid. The texture descriptors are extracted from each region $R_j$ as histograms of LBP labels, which are concatenated into a single feature vector. In the classification process, the distinct relevance of physiognomical features is often emphasized (Zhao et al., 2003). Therefore, specific weights $w_j$ are defined for each region $R_j$. In this work, for example, we used a weighted Chi-Square distance (Ahonen et al., 2006):

$$\chi^2_w(x, y) = \sum_{i,j} w_j \frac{(x_{i,j} - y_{i,j})^2}{x_{i,j} + y_{i,j}}, \qquad (2)$$

where $x$ and $y$ are the feature vectors to be compared, $x_{i,j}$ is the $i$-th histogram bin of the $j$-th region, and $w_j$ is the pre-defined weight of that region.
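The following Python/NumPy sketch illustrates Equations 1 and 2 end to end: basic 3x3 LBP labelling, one histogram per grid region $R_j$, and the weighted Chi-Square distance. It is a minimal illustration under our own assumptions, not the implementation used in this work: the function names (`lbp_3x3`, `grid_histograms`, `weighted_chi_square`) are hypothetical, the clockwise bit order starting at the top-left neighbour is one common convention, border pixels are simply discarded, and bins that are empty in both histograms (0/0) are taken to contribute zero.

```python
import numpy as np

# (row, col) offsets of the 8 neighbours in a 3x3 window, read clockwise
# from the top-left corner. This bit order is an assumption; any fixed
# order yields an equivalent descriptor.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(img):
    """Basic LBP labels (Equation 1) for the interior pixels of a grayscale image.

    A neighbour with value >= the centre contributes a 1 bit; the first
    neighbour in OFFSETS is the most significant bit. Border pixels are
    discarded, so an H x W image yields an (H-2) x (W-2) label image.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros_like(centre)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= centre).astype(np.int32) << (7 - bit)
    return codes


def grid_histograms(codes, rows, cols, n_bins=256):
    """Label histogram of each cell of a rows x cols grid (regions R_j),
    in row-major order. Returns an array of shape (rows * cols, n_bins)."""
    h, w = codes.shape
    hists = []
    for r in range(rows):
        for c in range(cols):
            cell = codes[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            hists.append(np.bincount(cell.ravel(), minlength=n_bins))
    return np.array(hists, dtype=float)


def weighted_chi_square(x, y, w):
    """Weighted Chi-Square distance of Equation 2.

    x, y: (n_regions, n_bins) histograms; w: (n_regions,) region weights w_j.
    Bins that are empty in both histograms (0/0) contribute zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = (x - y) ** 2
    den = x + y
    per_bin = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.sum(np.asarray(w, dtype=float)[:, None] * per_bin))
```

For two aligned face images `a` and `b`, `weighted_chi_square(grid_histograms(lbp_3x3(a), 4, 4), grid_histograms(lbp_3x3(b), 4, 4), w)` with a length-16 weight vector `w` compares them on a 4x4 grid; the smaller the value, the more similar the textures.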

2.1 Uniform Patterns

In this work, we have implemented a useful extension of the original LBP operator called uniform patterns (Ojala et al., 2002; Ahonen et al., 2006), which reduces the length of the feature vector and is the basis of a simple rotation-invariant descriptor. This approach is motivated by the fact that some binary patterns occur more frequently in texture images than others. A pixel neighbourhood is called uniform if its binary pattern contains at most two bitwise transitions (0 to 1 or 1 to 0) when the pattern is traversed circularly. Using this extension, the length of the feature vector histograms for a 3x3 kernel is reduced from 256 values to 59 values: 58 bins for the uniform patterns and 1 bin that represents all non-uniform patterns.
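As an illustration of the 256-to-59 reduction, the sketch below (hypothetical helper names, 8-bit codes from a 3x3 neighbourhood assumed) counts circular transitions and builds a look-up table that gives each uniform pattern its own bin and sends every non-uniform pattern to one shared bin.

```python
import numpy as np


def transitions(code, bits=8):
    """Number of 0->1 / 1->0 changes in a bits-long pattern read circularly."""
    # Rotate right by one position; XOR then marks every pair of neighbouring
    # bits (including the wrap-around pair) that differ.
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    return bin(code ^ rotated).count("1")


def uniform_lut(bits=8):
    """Look-up table from each of the 2**bits LBP codes to a histogram bin.

    Uniform codes (at most 2 transitions) get consecutive bins 0, 1, ...;
    all non-uniform codes share one extra, final bin.
    """
    lut = np.empty(2 ** bits, dtype=np.int32)
    next_bin = 0
    non_uniform = []
    for code in range(2 ** bits):
        if transitions(code, bits) <= 2:
            lut[code] = next_bin
            next_bin += 1
        else:
            non_uniform.append(code)
    lut[non_uniform] = next_bin  # the single shared non-uniform bin
    return lut
```

Indexing the table with a label image, e.g. `uniform_lut()[codes]`, yields bin indices from 0 to 58, so the per-region histograms can then be built with 59 bins instead of 256.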

3 STATISTICAL SPATIAL MAP

The possibility of emphasising some physiognomical features over others, provided by LBP, allows us to improve the classification step in face image analysis. Thus, a recent work proposed a method that highlights the most relevant facial regions according to the task, employing the statistical significance extracted from the pixel intensities of the samples (Amaral et al., 2013). This approach consists of computing Student's t-test between two distinct face image sample groups to generate a statistical spatial map, as follows:

$$T = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad (3)$$

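where $\bar{X}_1$ and $\bar{X}_2$ are the sample means of the face image groups $X_1$ and $X_2$, $n_1$ is the total number of samples in group $X_1$, $n_2$ is the total number of samples in group $X_2$, and $S_p$ is the pooled standard deviation, given by: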

$$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}, \qquad (4)$$

where $S_1^2$ and $S_2^2$ are the variances of the groups $X_1$ and $X_2$, respectively.
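Equations 3 and 4 are applied independently at every pixel, which vectorises naturally. The sketch below is our own illustration, assuming each group is stored as a stack of aligned, normalised images of shape (n, H, W), that $S_1^2$ and $S_2^2$ are unbiased sample variances, and that pixels with zero pooled variance receive $T = 0$; the name `t_map` is hypothetical.

```python
import numpy as np


def t_map(group1, group2):
    """Pixel-wise two-sample t statistic with pooled variance (Eqs. 3 and 4).

    group1: (n1, H, W) images of group X1; group2: (n2, H, W) images of X2.
    Returns an (H, W) statistical spatial map T.
    """
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = g1.shape[0], g2.shape[0]
    mean1, mean2 = g1.mean(axis=0), g2.mean(axis=0)
    # Unbiased sample variances S1^2 and S2^2 (ddof=1)
    var1, var2 = g1.var(axis=0, ddof=1), g2.var(axis=0, ddof=1)
    sp = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))  # Eq. 4
    se = sp * np.sqrt(1.0 / n1 + 1.0 / n2)
    # Pixels with zero pooled variance get T = 0 instead of a division by zero
    return np.divide(mean1 - mean2, se, out=np.zeros_like(se), where=se > 0)
```

The sign of $T$ tells which group is brighter at a pixel; its magnitude is what the spatial weighting uses.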

In the uniform segmentation procedure, the map is divided into a regular grid composed of rectangular regions. Then, for each region $j$, we calculate the absolute mean value of $T$ and use it as the weight $w_j$ in the Chi-Square distance (Equation 2) to compare two feature vectors $x$ and $y$. However, other approaches that generate non-regular regions have also been analyzed, aiming to optimize the use of the statistical spatial map (Amaral et al., 2014).
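The grid weighting can be sketched as follows. Because "absolute mean value of $T$" admits more than one reading, this hypothetical `grid_weights` helper assumes the mean of $|T|$ over each cell. The cells are visited in row-major order, which must match the order in which the regional histograms are concatenated into the feature vector.

```python
import numpy as np


def grid_weights(t_map, rows, cols):
    """One weight w_j per cell of a rows x cols grid laid over the T map.

    Assumption: w_j is the mean of |T| over the cell. Cells are visited in
    row-major order (left to right, top to bottom).
    """
    abs_t = np.abs(np.asarray(t_map, dtype=float))
    h, w = abs_t.shape
    weights = []
    for r in range(rows):
        for c in range(cols):
            cell = abs_t[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            weights.append(cell.mean())
    return np.array(weights)
```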

4 QUADTREE DECOMPOSITION

Quadtree decomposition has been widely used in dig-

ital image analysis to deﬁne regions of interest for


further processing, such as image segmentation, feature selection, object detection and sample annotation (Conde-Marquez et al., 2011). This technique builds a hierarchical data structure whose nodes are recursively subdivided into four parts until a predefined split condition is no longer satisfied. The process is top-down: it begins with a single node that represents the entire image and ends with a structure of variable block sizes, where smaller blocks describe fine details more accurately and larger blocks group similar regions so that they are represented with as little information as possible (Samet, 1984; Muhsin et al., 2014). In this work we have used a simple pixel-value homogeneity criterion to decide whether a block is partitioned, defined as follows:

$$h = \begin{cases} \text{True}, & \max(r) - \min(r) \geq t \\ \text{False}, & \max(r) - \min(r) < t \end{cases} \qquad (5)$$

where $r$ is the analysed region and $t$ is the threshold used as the split criterion for block homogeneity: when $h$ is True, the block is divided into four quadrants. Figure 2 illustrates this process.

Figure 2: Quadtree segmentation: a) Image sample; b) First split over the entire image; c) Second split applied only to the blocks that satisfy the segmentation criterion.

To establish a parsimonious feature vector representation, we set the minimum block size to 8x8 pixels, i.e. 64 values, because the local texture descriptor histogram of the uniform LBP method contains 59 bins (Ojala et al., 2002); a smaller block, such as 4x4 (16 pixels), would contain fewer pixels than histogram bins.
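A recursive sketch of the top-down decomposition with the split test of Equation 5 and the 8x8 minimum block size is given below. It assumes (our assumption) that the map has been rescaled to [0, 1], so that thresholds between 0 and 1 apply directly, and that a block is split only when its four children would still be at least 8 pixels high and wide; `quadtree_split` is a hypothetical name.

```python
import numpy as np


def quadtree_split(m, t, min_size=8):
    """Top-down quadtree decomposition with the split test of Equation 5.

    m: 2-D map (assumed rescaled to [0, 1]); t: split threshold.
    A block is divided into four quadrants while max - min >= t and each
    child would still be at least min_size pixels high and wide.
    Returns the leaf blocks as (row, col, height, width) tuples.
    """
    m = np.asarray(m, dtype=float)
    leaves = []

    def split(r, c, h, w):
        block = m[r:r + h, c:c + w]
        h_split = block.max() - block.min() >= t  # Equation 5
        if h_split and h // 2 >= min_size and w // 2 >= min_size:
            h2, w2 = h // 2, w // 2
            split(r, c, h2, w2)                    # top-left
            split(r, c + w2, h2, w - w2)           # top-right
            split(r + h2, c, h - h2, w2)           # bottom-left
            split(r + h2, c + w2, h - h2, w - w2)  # bottom-right
        else:
            leaves.append((r, c, h, w))

    split(0, 0, *m.shape)
    return leaves
```

On a 32x32 map that is zero except for one pixel equal to 1, `quadtree_split(m, 0.5)` refines only the quadrant containing that pixel, down to 8x8 blocks, and keeps the other three 16x16 quadrants whole, giving 7 leaves.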

4.1 Bottom-up Strategy

Quadtree segmentation produces only square blocks, so a homogeneous area may be divided into several blocks. In order to overcome this limitation and to reduce the dimensionality of the data structure, we perform a second processing step based on a bottom-up strategy (Usó, 2003; Fu et al., 2013; Scholefield and Dragotti, 2014), which merges adjacent quadtree regions that satisfy a join condition, providing non-square regions as shown in Figure 3.

Figure 3: Segmentation differences: a) Image sample; b) Default quadtree segmentation with symmetric splits, composed of 13 square blocks; c) Bottom-up strategy with merge step, composed of only 2 non-square blocks.

Our bottom-up implementation receives a pre-segmented quadtree as input and searches the tree for the best pair of adjacent blocks that satisfies the join condition. It merges that pair and repeats this procedure iteratively until the quadtree no longer contains a pair of blocks that complies with the merge criterion, defined as follows:

$$h = \begin{cases} \text{True}, & \max(r_1, r_2) - \min(r_1, r_2) \leq t \\ \text{False}, & \max(r_1, r_2) - \min(r_1, r_2) > t \end{cases} \qquad (6)$$

where $r_1$ and $r_2$ are the adjacent regions, whose pixels are considered together when computing the maximum and minimum, and $t$ is the threshold used as the homogeneity condition to join them.
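The merge step can be sketched as a greedy loop over a label image. The text does not specify how the "best" pair is chosen, so this hypothetical `merge_regions` assumes it is the adjacent pair with the smallest joint range (max minus min over both regions), evaluates Equation 6 on that range, and treats regions as adjacent when they touch horizontally or vertically.

```python
import numpy as np


def merge_regions(m, leaves, t):
    """Greedy bottom-up merging of quadtree leaves (merge criterion of Eq. 6).

    m: 2-D map; leaves: (row, col, height, width) blocks covering m;
    t: join threshold. Two adjacent regions are joined when max - min over
    the pixels of both regions is <= t. The 'best' pair is assumed to be the
    one with the smallest joint range. Returns a label image in which each
    distinct value identifies one final region.
    """
    m = np.asarray(m, dtype=float)
    labels = np.empty(m.shape, dtype=int)
    for k, (r, c, h, w) in enumerate(leaves):
        labels[r:r + h, c:c + w] = k
    # The min/max of a union are the min/max of its parts, so two numbers
    # per region are enough to evaluate Eq. 6 for any pair of regions.
    lo = {k: m[labels == k].min() for k in range(len(leaves))}
    hi = {k: m[labels == k].max() for k in range(len(leaves))}
    while True:
        pairs = set()
        for a, b in ((labels[:, :-1], labels[:, 1:]),   # horizontal neighbours
                     (labels[:-1, :], labels[1:, :])):  # vertical neighbours
            touching = a != b
            pairs.update(zip(np.minimum(a, b)[touching].tolist(),
                             np.maximum(a, b)[touching].tolist()))
        best = None
        for i, j in sorted(pairs):
            joint_range = max(hi[i], hi[j]) - min(lo[i], lo[j])
            if joint_range <= t and (best is None or joint_range < best[0]):
                best = (joint_range, i, j)
        if best is None:
            return labels
        _, i, j = best
        labels[labels == j] = i  # region j is absorbed by region i
        lo[i], hi[i] = min(lo[i], lo[j]), max(hi[i], hi[j])
        del lo[j], hi[j]
```

For a 16x16 map whose right half is 1 and left half is 0, split into four 8x8 leaves, a threshold of 0.5 merges the blocks into two non-square 16x8 regions, mirroring the kind of result shown in Figure 3.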

5 EXPERIMENTS

In this section we describe the automatic gender and facial expression classification experiments carried out to compare the top-down and bottom-up quadtree approaches in different face analysis tasks.

5.1 Databases and Setup

To perform the proposed experiments we used two publicly available face databases that meet the necessary requirements. The first is the FEI Face Database (Thomaz and Giraldi, 2010), used for training and testing the proposed approaches. The second is the Grayscale FERET database (Phillips et al., 2000), used to validate the best results obtained with the FEI Face Database. In this study we employ only frontal face images of both genders, with two samples for each subject, one with a neutral facial expression and the other with a smiling facial expression, giving a total of 400 images from each database. All the images were normalized to reduce sample variability by the following steps: rotation, to align both pupils with the horizontal axis; resizing, to adjust the interpupillary distance; cropping to specified dimensions; conversion to grayscale, with values between 0 and 255; and, finally, histogram equalization of the pixels. Figure 4 shows the dimensions used in this adjustment.


Figure 4: Dimensions used in the normalisation process.

5.2 Procedure

Initially, we generate the statistical spatial maps for gender and facial expression classification using the FEI Face Database images. Figure 5 shows these statistical significance maps.

Figure 5: Statistical spatial maps: a) Gender (male and female); b) Facial expression (neutral and smile).

The statistical spatial maps were segmented with each proposed quadtree approach using 19 distinct threshold values between 0 and 1, at intervals of 0.05 (from 0.05 to 0.95). This process provides 28 maps, containing the facial regions from which the texture descriptor histograms are extracted and the spatial weights are calculated. Then, we arranged the samples in four classification groups for two classification tasks: gender (male vs. female) and facial expression (neutral vs. smiling). Next, we performed the classification procedure with the specific task-driven maps. Each sample was compared with all other samples, except those from the same subject, to identify its nearest neighbour according to the Chi-Square distance (Equation 2), which was used as the classification criterion.
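The leave-subject-out nearest-neighbour protocol can be sketched as below. This is an illustration under our own assumptions, not the authors' code: each sample is stored as a (regions x bins) histogram array, subjects are identified by an integer ID, and the name `classify_nn` is hypothetical. Setting the distance to infinity for every sample of the query's subject excludes both the query itself and the subject's other image.

```python
import numpy as np


def classify_nn(features, labels, subjects, weights):
    """Leave-subject-out 1-NN with the weighted Chi-Square distance (Eq. 2).

    features: (n, n_regions, n_bins) histograms; labels, subjects: length n;
    weights: (n_regions,) spatial weights w_j.
    Returns the predicted labels and the classification accuracy.
    """
    f = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    subjects = np.asarray(subjects)
    w = np.asarray(weights, dtype=float)[None, :, None]
    preds = []
    for q in range(len(f)):
        num = (f - f[q]) ** 2
        den = f + f[q]
        per_bin = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        dist = (w * per_bin).sum(axis=(1, 2))
        dist[subjects == subjects[q]] = np.inf  # skip the query's own subject
        preds.append(labels[np.argmin(dist)])
    preds = np.array(preds)
    return preds, float(np.mean(preds == labels))
```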

To compare with earlier studies (Amaral et al., 2013; Amaral and Thomaz, 2013), we performed the same experiment using 4 uniform grid layouts for each task (2x2, 4x4, 8x8 and 16x16), with and without spatial weights. To validate the results obtained with the FEI Face Database, we reproduced the experiments with Grayscale FERET, employing only the maps that achieved the best classification rates.

6 CLASSIFICATION RESULTS

In this section, we present the classification accuracy for gender and facial expression using a simple quadtree strategy and a quadtree with a subsequent merge step. These approaches were evaluated with 19 thresholds, and the results are shown in Figures 6 and 7.

Figure 6: Gender classification rates and their corresponding number of regions for both approaches.

Figure 7: Expression classification rates and their corresponding number of regions for both approaches.

The spatial maps that provided the best classification rates are illustrated in Figures 8 and 9, which correspond to the thresholds of 0.90 and 0.75 shown in Figures 6 and 7, respectively.


Figure 8: Spatial maps for gender analysis. The top images show the segmentation maps and the bottom images show their corresponding spatial weights: a) Quadtree with the best accuracy (37 regions); b) Uniform grid with the best accuracy (256 regions).

Figure 9: Spatial maps for facial expression analysis. The top images show the segmentation maps and the bottom images show their corresponding spatial weights: a) Quadtree with the best accuracy (46 regions); b) Uniform grid with the best accuracy (64 regions).

The best classification rates achieved in the experiments with the above maps and the FEI Face Database are shown in Table 1. Table 2 presents the results obtained with the FERET database, used to validate the proposed spatial mapping approach.

Table 1: Classification rates for FEI Face Database (Size is the number of regions).

Map              Gender           Expression
                 Rate     Size    Rate     Size
QT Top-Down      93.5%    37      95.0%    46
Weighted Grid    91.2%    256     95.2%    64
Default Grid     88.5%    256     96.0%    256

Table 2: Classification rates for FERET Face Database (Size is the number of regions).

Map              Gender           Expression
                 Rate     Size    Rate     Size
QT Top-Down      77.2%    37      85.7%    46
Weighted Grid    79.2%    256     86.7%    64
Default Grid     76.5%    256     84.7%    256

As shown in Tables 1 and 2, the top-down quadtree approach proposed in this study achieves classification results similar to those of the static grids while using far fewer regions. The differences in classification rates between the two databases are probably due to differences in the quality of the original samples: the higher resolution of the FEI face images provided more texture detail to the LBP feature descriptors.

7 CONCLUSION

In this paper we have compared two quadtree image segmentation methods to find the most important physiognomical regions according to statistical spatial maps generated for specific binary classification tasks. One of them performs the top-down quadtree decomposition, with only a recursive split function, and the other executes a merge step (bottom-up) after the split step. The experiments were limited to binary classification tasks because Student's t-test compares only two groups.

Analyzing the preliminary results, we can see that the top-down quadtree segmentation obtained classification results similar to those of the uniform grid layout while using far fewer regions, based on the same task-driven statistical spatial maps. In our opinion, this difference is very relevant in terms of storage and computational time, mainly in real-time applications and on mobile platforms, because a segmentation with fewer regions implies fewer histograms to store and compare during the classification process.
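To make the storage argument concrete, the descriptor length per image is simply the number of regions times the 59 uniform LBP bins. The short computation below uses the region counts reported in Tables 1 and 2 and Figures 8 and 9; `descriptor_length` is a hypothetical helper.

```python
N_BINS = 59  # uniform LBP histogram length for a 3x3 neighbourhood (Section 2.1)


def descriptor_length(n_regions, n_bins=N_BINS):
    """Histogram values stored per face image: one n_bins histogram per region."""
    return n_regions * n_bins


# Region counts reported for the best maps (Tables 1 and 2, Figures 8 and 9)
layouts = {
    "top-down quadtree, gender": 37,
    "top-down quadtree, expression": 46,
    "8x8 weighted grid, expression": 64,
    "16x16 grid": 256,
}
for name, n_regions in layouts.items():
    print(f"{name}: {n_regions} regions -> {descriptor_length(n_regions)} values")
```

For example, the 37-region top-down map for gender needs 37 × 59 = 2,183 histogram values per face, against 256 × 59 = 15,104 for the 16x16 grid, roughly a sevenfold reduction in what must be stored and compared.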

As future work, we intend to explore the proposed quadtree decomposition for other facial classification problems, e.g. disgust vs. anger or fear vs. surprise, and for face recognition as well. We also believe that other prior information about physiognomical relevance, such as the human perception maps provided by eye-tracking devices, could be employed as a decomposition criterion.


ACKNOWLEDGEMENTS

The authors would like to thank FAPESP (2012/22377-6), CNPq (309532/2014-0) and the Royal Society (NA140147) for their financial support.

REFERENCES

Ahonen, T., Hadid, A., and Pietikäinen, M. (2006). Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 28:2037–2041.

Amaral, V., Giraldi, G. A., and Thomaz, C. E. (2013). LBP estatístico aplicado ao reconhecimento de expressões faciais [Statistical LBP applied to facial expression recognition]. In Proceedings of the X Encontro Nacional de Inteligência Artificial e Computacional, ENIAC'13, Fortaleza, Ceara, Brasil.

Amaral, V., Giraldi, G. A., and Thomaz, C. E. (2014). Segmentação espacial não uniforme aplicada ao reconhecimento de gênero e expressões faciais [Non-uniform spatial segmentation applied to gender and facial expression recognition]. In Proceedings of the XI Encontro Nacional de Inteligência Artificial e Computacional, ENIAC'14, São Carlos, São Paulo, Brazil.

Amaral, V., Giraldi, G. A., and Thomaz, C. E. (2015). Statistical and cognitive spatial mapping applied to face analysis. In Proceedings of the 28th SIBGRAPI, Conference on Graphics, Patterns and Images - Workshop of Works in Progress, Salvador, Bahia, Brazil.

Amaral, V. and Thomaz, C. E. (2013). Um estudo sobre o detalhamento espacial de descritores locais aplicados ao reconhecimento de gênero e expressões faciais [A study on the spatial detail of local descriptors applied to gender and facial expression recognition]. In Anais do 3 Simpósio de Pesquisa do Grande ABC, São Bernardo do Campo, São Paulo, Brazil.

Blais, C., Roy, C., Fiset, D., Arguin, M., and Gosselin, F. (2012). The eyes are not the window to basic emotions. Neuropsychologia, 50(12):2830–2838.

Conde-Marquez, G. R., Escalante, H. J., and Sucar, E. (2011). Simplified quadtree image segmentation for image annotation. In Sucar, E. and Escalante, H. J., editors, Proceedings of the 2010 Automatic Image Annotation and Retrieval Workshop (2010), volume 719, pages 24–34. CEUR-Workshop Proceedings.

Fu, G., Zhao, H., Li, C., and Shi, L. (2013). Segmentation for high-resolution optical remote sensing imagery using improved quadtree and region adjacency graph technique. Remote Sensing, 5(7):3259.

Muhsin, Z. F., Rehman, A., Altameem, A., Saba, T., and Uddin, M. (2014). Improved quadtree image segmentation approach to region information. The Imaging Science Journal, 62(1):56–62.

Ojala, T., Pietikäinen, M., and Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59.

Ojala, T., Pietikäinen, M., and Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987.

Phillips, P. J., Moon, H., Rizvi, S. A., and Rauss, P. (2000). The FERET evaluation methodology for face-recognition algorithms. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 22, pages 1090–1104, Washington, DC, USA. IEEE Computer Society.

Pietikäinen, M., Zhao, G., Hadid, A., and Ahonen, T. (2011). Computer Vision Using Local Binary Patterns. Number 40 in Computational Imaging and Vision. Springer.

Samet, H. (1984). The quadtree and related hierarchical

data structures. ACM Comput. Surv., 16(2):187–260.

Santarcangelo, V., Farinella, G. M., and Battiato, S. (2015). Gender recognition: Methods, datasets and results. In IEEE International Conference on Multimedia & Expo Workshops (ICMEW 2015), pages 1–6.

Scholefield, A. and Dragotti, P. L. (2014). Quadtree structured image approximation for denoising and interpolation. IEEE Transactions on Image Processing, 23:1226–1239.

Shan, C. (2012). Learning local binary patterns for gender classification on real-world face images. Pattern Recognition Letters, 33(4):431–437.

Shan, C., Gong, S., and McOwan, P. W. (2005). Robust facial expression recognition using local binary patterns. In ICIP 2005. IEEE International Conference on Image Processing, 2005. IEEE.

Shan, C., Gong, S., and McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6):803–816.

Thomaz, C. E. and Giraldi, G. A. (2010). A new ranking method for principal components analysis and its application to face image analysis. Image and Vision Computing, 28:902–913.

Torrisi, A., Farinella, G. M., Puglisi, G., and Battiato, S. (2015). Selecting discriminative CLBP patterns for age estimation. In IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pages 1–6.

Usó, A. M. (2003). A quadtree-based unsupervised segmentation algorithm for fruit visual inspection. In López, F. J. P., Campilho, A. C., de la Blanca, N. P., and Sanfeliu, A., editors, IbPRIA, volume 2652 of Lecture Notes in Computer Science, pages 510–517. Springer.

Zhao, W., Chellappa, R., Phillips, P. J., and Rosenfeld, A. (2003). Face recognition: a literature survey. ACM Computing Surveys, 35(4):399–458.
