SEMI-SUPERVISED ESTIMATION OF PERCEIVED AGE FROM FACE IMAGES

Kazuya Ueki

VALWAY Technology Center, NEC Soft, Ltd., Tokyo, Japan

Masashi Sugiyama

Department of Computer Science, Tokyo Institute of Technology and JST PRESTO, Tokyo, Japan

Yasuyuki Ihara

VALWAY Technology Center, NEC Soft, Ltd., Tokyo, Japan

Keywords:

Perceived age estimation, Active sample selection, Weighted regression, Semi-supervised learning, Manifold

regularization, Human age perception.

Abstract:

We address the problem of perceived age estimation from face images and propose a new semi-supervised age

prediction method that involves two novel aspects. The ﬁrst novelty is an efﬁcient active learning strategy for

reducing the cost of labeling face samples. Given a large number of unlabeled face samples, we reveal the

cluster structure of the data and propose to label cluster representative samples for covering as many clusters

as possible. This simple sampling strategy allows us to boost the performance of a manifold-based semi-supervised learning method with only a relatively small number of labeled samples. The second contribution is to take the heterogeneous characteristics of human age perception into account. It is rare to misjudge the age of a 5-year-old child as 15 years old, but the age of a 35-year-old person is often misjudged as 45 years old. Thus, the magnitude of the error differs depending on the subject's age. We carried out a large-scale questionnaire survey to quantify human age perception characteristics and propose to encode the quantified characteristics by weighted regression. Consequently, our proposed method is expressed in the

form of weighted least-squares with a manifold regularizer, which is scalable to massive datasets. Through

real-world age estimation experiments, we demonstrate the usefulness of the proposed method.

1 INTRODUCTION

Demographic analysis in public places such as shop-

ping malls and stations is attracting a great deal of

attention these days since it is useful for designing

effective marketing strategies. Such demographic in-

formation is often collected manually, e.g., at conve-

nient stores, sales clerks input customers’ attributes

such as age and gender to a point-of-sale (POS) sys-

tem. However, such manual data collection requires

a lot of human labor and automating this process is

highly desired.

In this paper, we address the problem of age es-

timation from face images using machine learning

techniques. Most of the existing studies on age esti-

mation try to predict real age (Kwon and Lobo, 1999;

Horng et al., 2001; Lanitis et al., 2002; Geng et al.,

2006; Fu et al., 2007) and several databases are avail-

able publicly (e.g., (Phillips et al., 2005; Ricanek and

Tesafaye, 2006)). However, the problem of estimat-

ing subjects’ real age is highly ill-posed since the cor-

respondence between appearance and real age is not

clear even for humans.

When designing marketing strategies, analyzing perceived age is often preferable to analyzing real age. However, little attention has been paid to perceived age analysis so far; in this paper, we therefore propose a new method of perceived age estimation from face images. The perceived age of a subject is defined as the mean of the ages estimated by a large number of people.

Thus the problem of perceived age estimation can be naturally formulated as a regression problem, which is aimed at estimating the conditional mean of outputs (age) given inputs (face images).

Ueki K., Sugiyama M. and Ihara Y. (2010). SEMI-SUPERVISED ESTIMATION OF PERCEIVED AGE FROM FACE IMAGES. In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 319-324. DOI: 10.5220/0002817503190324. SciTePress.

Face images often contain complex variability due

to diversity of individual characteristics, angles, light-

ing conditions, etc. Thus a large number of labeled

face samples are usually needed to obtain good pre-

diction performance. However, labeling face samples

requires much time and effort and it is desirable to

reduce the number of labeled samples without perfor-

mance degradation. In this paper, we ﬁrst propose

an active learning strategy for reducing the sampling

cost. We focus on a semi-supervised setup where a large number of unlabeled face samples are available. Our active learning idea is to apply a clustering technique to reveal the cluster structure of the

face data and to label cluster representative samples

for covering as many clusters as possible. This simple

sampling strategy allows us to boost the performance

of a manifold-based semi-supervised learning method

(Sindhwani et al., 2006) only with a relatively small

number of labeled samples.

In order to further improve the estimation accuracy, we next propose to take the heterogeneous characteristics of human age perception into account: the age of a 5-year-old child is rarely misjudged as 15 years old, but the age of a 35-year-old person is often misjudged as 45 years old. Thus, the deviation of the estimation error differs depending on the subject's age (which is referred to as heteroscedastic noise). We carried out a large-scale questionnaire survey in order to quantify human age perception characteristics, and propose to take account of the quantified characteristics by weighted regression, which is shown to be able to cope with heteroscedastic noise.

Combining the above two ideas, we propose a

kernel-based semi-supervised perceived age estima-

tion method which is expressed in the form of kernel

weighted least-squares with a manifold regularizer;

thanks to its simple formulation, the proposed method

is scalable to large-scale datasets. Through real-world

age estimation experiments, we demonstrate the use-

fulness of the proposed method.

2 SEMI-SUPERVISED ALGORITHM FOR PERCEIVED AGE ESTIMATION

In this section, we describe the proposed procedure

for perceived age estimation.

2.1 Clustering-based Active Learning Strategy

First, we explain our active learning strategy for re-

ducing the cost of labeling face samples.

Face samples exhibit wide variation in individual characteristics, angles, lighting conditions, etc. They often possess cluster structure, and face

samples in each cluster tend to have similar ages (Fu

et al., 2007; Guo et al., 2008; Ueki et al., 2008).

Based on these observations, we propose to label the

face images which are closest to cluster centroids.

For revealing the cluster structure, we apply the

k-means clustering method to a large number of unla-

beled samples. Since clustering of high-dimensional

data is often unreliable, we ﬁrst apply principal com-

ponent analysis (PCA) to the face images for dimen-

sion reduction and then apply the k-means clustering

algorithm. The proposed active learning strategy is

summarized as follows.

1. For a set of $n$-dimensional unlabeled face image samples $\{X_i\}_{i=1}^{t}$, we compute $\{x_i\}_{i=1}^{t}$ of $m$ ($\ll n$) dimensions by the PCA projection.

2. Using the k-means clustering algorithm, we compute the $l$ ($\ll t$) cluster centroids $\{m_j\}_{j=1}^{l}$.

3. We choose $\{x_{i_j}\}_{j=1}^{l}$ as samples to be labeled, where $i_j = \operatorname{argmin}_{i} \|x_i - m_j\|$ and $\|\cdot\|$ denotes the Euclidean norm.

This is a simple procedure, but highly effective as

demonstrated later.
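The three steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: PCA is computed by a singular value decomposition, k-means is a plain Lloyd iteration with random initialization, and the function names (`sq_dists`, `pca_project`, `kmeans_centroids`, `select_samples_to_label`) are hypothetical.

```python
import numpy as np

def sq_dists(a, b):
    """Squared Euclidean distances between all rows of a and all rows of b."""
    d2 = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

def pca_project(X, m):
    """Step 1: project the rows of X onto the top-m principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal axes
    return Xc @ Vt[:m].T

def kmeans_centroids(x, l, n_iter=50, seed=0):
    """Step 2: plain Lloyd iterations started from l randomly chosen samples."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=l, replace=False)].copy()
    for _ in range(n_iter):
        assign = sq_dists(x, centroids).argmin(axis=1)
        for j in range(l):
            members = x[assign == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def select_samples_to_label(X, m, l, seed=0):
    """Step 3: for each centroid, the index of the closest projected sample."""
    x = pca_project(X, m)
    centroids = kmeans_centroids(x, l, seed=seed)
    return sq_dists(x, centroids).argmin(axis=0)
```

Two centroids can in principle share the same nearest sample, in which case fewer than $l$ distinct samples are returned; a practical implementation would need to handle such ties.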

To simplify the notation, we permute the order of the samples $\{x_i\}_{i=1}^{t}$ so that the first $l$ samples $\{x_i\}_{i=1}^{l}$ are labeled and the remaining $u$ ($= t - l$) samples $\{x_i\}_{i=l+1}^{l+u}$ are unlabeled; this is always possible without loss of generality. Let $\{y_i\}_{i=1}^{l}$ be the labels for $\{x_i\}_{i=1}^{l}$.

2.2 Semi-supervised Age Regression with Manifold Regularization

As explained above, face images possess cluster

structure and face samples in each cluster tend to have

similar ages. Here we utilize this cluster structure

by employing a method of semi-supervised regression

with manifold regularization (Sindhwani et al., 2006).

This subsection is devoted to reviewing the manifold

regularization method.

VISAPP 2010 - International Conference on Computer Vision Theory and Applications


For age regression, we use the following kernel model:

$$f(x;\alpha) = \sum_{i=1}^{l+u} \alpha_i k(x, x_i), \qquad (1)$$

where $\alpha = (\alpha_1, \ldots, \alpha_{l+u})^\top$ are parameters to be learned, $^\top$ denotes the transpose, and $k(x, x')$ is a reproducing kernel function; we use the Gaussian kernel:

$$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right),$$

where $\sigma^2$ is the Gaussian variance. We included $(l+u)$ kernels in the kernel regression model (1), but $u$ may be very large in age prediction. In practice, we only use $c$ ($< u$) elements randomly chosen from the set $\{k(x, x_i)\}_{i=l+1}^{l+u}$ for reducing the computational cost; thus the total number of basis functions is $b = l + c$. However, we stick to using Eq.(1) below in order to keep the explanation simple.
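As a rough illustration of the kernel model and of the reduced basis with $b = l + c$ functions, the following NumPy sketch may help. It assumes that the samples are stored row-wise with the $l$ labeled samples first; the helper names (`gaussian_kernel`, `choose_centers`, `predict`) are hypothetical and not taken from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all rows a of A and b of B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def choose_centers(n_samples, l, c, seed=0):
    """Indices of the kernel centers: all l labeled samples plus
    c unlabeled samples drawn at random without replacement."""
    rng = np.random.default_rng(seed)
    extra = rng.choice(np.arange(l, n_samples), size=c, replace=False)
    return np.concatenate([np.arange(l), extra])

def predict(x_new, centers, alpha, sigma):
    """f(x; alpha) = sum_i alpha_i k(x, x_i), summed over the chosen centers."""
    return gaussian_kernel(x_new, centers, sigma) @ alpha
```

With `idx = choose_centers(len(x), l, c)`, the matrix `gaussian_kernel(x, x[idx], sigma)` has one row per sample and $b = l + c$ columns.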

The basic assumption behind manifold regulariza-

tion is that the target function we want to learn is

‘smooth’ within clusters. In order to let our learned

function satisfy this property, a manifold regularizer

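is included in the training criterion: the parameter $\alpha$ is learned so that the following criterion is minimized.

$$\frac{1}{l}\sum_{i=1}^{l} \left(y_i - f(x_i;\alpha)\right)^2 + \lambda\|\alpha\|^2 + \frac{\mu}{4(l+u)^2}\sum_{i,j=1}^{l+u} W_{i,j}\left(f(x_i;\alpha) - f(x_j;\alpha)\right)^2, \qquad (2)$$

where $\lambda$ and $\mu$ are non-negative constants. $W_{i,j}$ represents the similarity between $x_i$ and $x_j$, which is defined by

$$W_{i,j} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\gamma^2}\right) \qquad (3)$$

if $x_i$ is an $h$-nearest neighbor of $x_j$ or vice versa; otherwise, $W_{i,j} = 0$.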

The ﬁrst term in Eq.(2) is the goodness-of-ﬁt term

and the second term is the ordinary regularizer for

avoiding overﬁtting (Hoerl and Kennard, 1970). The

third term is the manifold regularizer; the weight $W_{i,j}$ takes large values when $x_i$ and $x_j$ belong to the same

cluster, so the manifold regularizer works for keep-

ing the outputs of the function f(x) within the cluster

close to each other. Consequently, we can obtain a

function that is smooth inside clusters.
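The similarity weights and the manifold penalty can be sketched as follows. This is a dense, brute-force illustration that is only practical for small sample sizes (a sparse neighbor graph would be needed for tens of thousands of samples); the function names are hypothetical, and the "or vice versa" rule is implemented by symmetrizing the nearest-neighbor mask.

```python
import numpy as np

def similarity_matrix(x, h, gamma):
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 gamma^2)) if x_i is among the h
    nearest neighbors of x_j or vice versa, and 0 otherwise."""
    t = len(x)
    sq = (x**2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    np.fill_diagonal(d2, np.inf)            # a sample is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :h]      # h nearest neighbors of each sample
    mask = np.zeros((t, t), dtype=bool)
    mask[np.repeat(np.arange(t), h), nn.ravel()] = True
    mask |= mask.T                          # "or vice versa": symmetrize
    W = np.zeros((t, t))
    W[mask] = np.exp(-d2[mask] / (2.0 * gamma**2))
    return W

def manifold_penalty(W, f):
    """sum_{i,j} W[i, j] * (f_i - f_j)^2 for function values f on all samples."""
    diff = f[:, None] - f[None, :]
    return float((W * diff**2).sum())
```

The penalty is zero for a constant function and grows when neighboring samples receive very different predicted ages, which is the smoothness property described above.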

Figure 1: The relation between subjects' true age $y^*$ (horizontal axis) and the standard deviation of perceived age (vertical axis).

2.3 Incorporating Age Perception Characteristics

Next, we extend the above manifold regularization

method so that human age perception characteristics

can be incorporated.

First, we quantify human age perception charac-

teristics through a large-scale questionnaire survey.

We asked each of 72 volunteers to give age labels $y$ to approximately 500 face images. The 'true' age of a subject is defined as the average perceived age (rounded off to the nearest integer), and denoted by $y^*$. This is because our purpose is not to predict the subjects' real age, but their perceived age. We summarize the standard deviation of the perceived age as a function of the true age $y^*$ in Figure 1.

The standard deviation is approximately 2 (years) when the true age $y^*$ is less than 15. The standard deviation increases and goes beyond 6 as the true age $y^*$ increases from 15 to 35. Then the standard deviation decreases to around 5 as the true age $y^*$ increases from 35 to 70. This graph shows that the perceived age deviation tends to be small in younger age brackets and large in older age groups. This agrees well with our intuition considering the human growth process.
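A curve of this kind could be computed from the raw survey ratings roughly as follows. The sketch assumes a hypothetical rating matrix with one row per volunteer and one column per face image, and it pools all ratings of the images that share the same true age before taking the standard deviation; the paper does not state the exact pooling, so this choice is an assumption.

```python
import numpy as np

def perception_std_by_age(ratings):
    """ratings: (n_raters, n_images) array of perceived-age labels.

    Returns the true age y* of every image (mean rating rounded to the nearest
    integer), the sorted distinct true ages, and, for each distinct true age,
    the standard deviation of all ratings given to images with that true age."""
    y_star = np.floor(ratings.mean(axis=0) + 0.5).astype(int)
    ages = np.unique(y_star)
    stds = np.array([ratings[:, y_star == a].std() for a in ages])
    return y_star, ages, stds
```

For age values that occur for only a few images the estimate is noisy, so some smoothing or binning over neighboring ages may be needed in practice.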

Now let us incorporate the above survey result

into the perceived age estimation framework in Sec-

tion 2.2. When the standard deviation is small (large),

making an error is regarded as more (less) critical.

This idea follows a similar line to the Mahalanobis

distance (Duda et al., 2001), so it would be reasonable

to incorporate the above survey result into the frame-

work of weighted regression analysis. More precisely,

weighting the goodness-of-ﬁt term in Eq.(2) accord-

ing to the inverse error variance optimally adjusts to


the characteristics of human perception. Thus, our

proposed training criterion is given as

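$$\frac{1}{l}\sum_{i=1}^{l} \frac{\left(y_i - f(x_i;\alpha)\right)^2}{w(y_i)^2} + \lambda\|\alpha\|^2 + \frac{\mu}{4(l+u)^2}\sum_{i,j=1}^{l+u} W_{i,j}\left(f(x_i;\alpha) - f(x_j;\alpha)\right)^2, \qquad (4)$$

where $w(y)$ is the value given in Figure 1.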

An important advantage of the above training method is that the solution can be obtained analytically by

$$\alpha = \left(K^\top D K + l\lambda I_{l+u} + \frac{l\mu}{2(l+u)^2} K^\top L K\right)^{-1} K^\top D y, \qquad (5)$$

where $K$ is the $(l+u) \times (l+u)$ kernel Gram matrix whose $(i,j)$-th element is defined by

$$K_{i,j} = k(x_i, x_j).$$

$D$ is the $(l+u) \times (l+u)$ diagonal weight matrix with diagonal elements defined by

$$\frac{1}{w(y_1)^2}, \ldots, \frac{1}{w(y_l)^2}, 0, \ldots, 0.$$

$L$ is the $(l+u) \times (l+u)$ Laplacian matrix whose $(i,j)$-th entry is defined by

$$L_{i,j} = \delta_{i,j}\sum_{j'=1}^{l+u} W_{i,j'} - W_{i,j},$$

where $\delta_{i,j}$ is the Kronecker delta. $I_{l+u}$ denotes the $(l+u) \times (l+u)$ identity matrix. $y$ is the $(l+u)$-dimensional label vector defined by

$$y = (y_1, \ldots, y_l, 0, \ldots, 0)^\top.$$
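A minimal NumPy sketch of the analytic solution is given below. It solves the linear system obtained by setting the gradient of the weighted criterion to zero, using the identity $\sum_{i,j} W_{i,j}(f_i - f_j)^2 = 2 f^\top L f$ for symmetric $W$. The function names and argument layout are hypothetical; `K` may be square or, with a reduced basis, rectangular, and `w_l` holds the values $w(y_i)$ of the labeled samples.

```python
import numpy as np

def fit_weighted_manifold_ls(K, W, y_l, w_l, lam, mu):
    """K: (l+u, b) kernel matrix, W: (l+u, l+u) symmetric similarity matrix,
    y_l: (l,) labels of the first l samples, w_l: (l,) values w(y_i)."""
    n, b = K.shape                       # n = l + u
    l = len(y_l)
    d = np.zeros(n)
    d[:l] = 1.0 / w_l**2                 # diagonal of D (zero for unlabeled samples)
    y = np.zeros(n)
    y[:l] = y_l                          # zero-padded label vector
    L = np.diag(W.sum(axis=1)) - W       # graph Laplacian
    KtD = K.T * d                        # K^T D without forming D explicitly
    A = KtD @ K + l * lam * np.eye(b) + (l * mu / (2.0 * n**2)) * (K.T @ L @ K)
    return np.linalg.solve(A, KtD @ y)

def training_criterion(alpha, K, W, y_l, w_l, lam, mu):
    """Weighted least-squares fit + ridge penalty + manifold penalty."""
    n = K.shape[0]
    l = len(y_l)
    f = K @ alpha
    fit = np.mean((y_l - f[:l])**2 / w_l**2)
    diff = f[:, None] - f[None, :]
    return fit + lam * (alpha @ alpha) + mu / (4.0 * n**2) * (W * diff**2).sum()
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical design choice of this sketch; `training_criterion` is included only so that the returned `alpha` can be checked to be a minimizer.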

If u is very large (which would be the case in age

prediction), computing the inverse of the (l+u)×(l+

u) matrix in Eq.(5) is not tractable. To cope with this

problem, reducing the number of kernels from (l + u)

to a smaller number b would be a realistic option, as

explained in Section 2.2. Then the matrix K becomes

an $(l+u) \times b$ rectangular matrix and the identity matrix in Eq.(5) becomes $I_b$. Thus the size of the matrix we need to invert becomes $b \times b$, which would be

tractable when b is kept moderate. We may further

reduce the computational cost by numerically com-

puting the solution by a stochastic gradient-descent

method.

Figure 2: Examples of face images.

2.4 Evaluation Criteria

Conventionally, the performance of an age prediction function $f(x)$ for test samples $\{(\tilde{x}_i, \tilde{y}_i^*)\}_{i=1}^{m}$ was evaluated by the mean absolute error (MAE) (Lanitis et al., 2002; Lanitis et al., 2004; Geng et al., 2006; Ueki et al., 2008):

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m} \left|\tilde{y}_i^* - f(\tilde{x}_i)\right|.$$

However, as explained in Section 2.3, this does not properly reflect human intuition. Here we propose to use the weighted criterion also for performance evaluation in the experiments. More specifically, we evaluate the prediction performance by the weighted mean square error (WMSE):

$$\mathrm{WMSE} = \frac{1}{m}\sum_{i=1}^{m} \frac{\left(\tilde{y}_i^* - f(\tilde{x}_i)\right)^2}{w(\tilde{y}_i^*)^2}.$$

The smaller the value of WMSE is, the better the age

prediction function is.
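Both criteria are straightforward to compute; the sketch below is illustrative only. The function `w_example` is a hypothetical stand-in for the curve in Figure 1: it linearly interpolates the approximate values quoted in Section 2.3 (about 2 below age 15, about 6 at age 35, about 5 at age 70) and is not the survey data.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ages."""
    return float(np.mean(np.abs(y_true - y_pred)))

def wmse(y_true, y_pred, w):
    """Weighted mean square error; w maps true ages to perception std w(y)."""
    return float(np.mean((y_true - y_pred) ** 2 / w(y_true) ** 2))

def w_example(y):
    """Hypothetical piecewise-linear approximation of Figure 1 (assumption)."""
    return np.interp(y, [0.0, 15.0, 35.0, 70.0], [2.0, 2.0, 6.0, 5.0])

y_true = np.array([10.0, 35.0])
y_pred = np.array([12.0, 41.0])
print(mae(y_true, y_pred))               # 4.0
print(wmse(y_true, y_pred, w_example))   # 1.0
```

In this toy example a 2-year error at age 10 and a 6-year error at age 35 contribute equally to WMSE, whereas MAE treats the second error as three times worse.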

3 EMPIRICAL EVALUATION

In this section, we apply the proposed age prediction

method to in-house face-age datasets and experimen-

tally evaluate its performance.


3.1 Data Acquisition and Experimental Setup

Age prediction systems are often used in public places

such as shopping malls or stations. In order to

make our experiments realistic, we collected face im-

age samples from video sequences taken by ceiling-

mounted surveillance cameras with depression angle

5–10 degrees. The recording method, image reso-

lution, and the image size are diverse depending on

the recording conditions—for example, some subjects

were illuminated by dominant light sources, walking

naturally, seated on a stool, and keeping their heads

still. The subjects’ facial expressions are typically

subtle, switching between neutral and smiling. We

used a face detector for localizing the two eye-centers,

and then rescaled the image to 64× 64 pixels. Exam-

ples of face images are shown in Figure 2. Female

faces whose age ranges from 1 to 70 were used in our

experiments.

As pre-processing, we extracted 100-dimensional

features from the 64 × 64 face images using a neural network feature extractor proposed in (Tivive and Bouzerdoum, 2006; Tivive and Bouzerdoum, 2006).

In total, we have 28500 face samples in our database.

Among them, u = 27000 are treated as unlabeled sam-

ples and the remaining m = 1500 are used as test

samples. From the 27000 unlabeled samples, we

choose l = 200 samples to be labeled by active learning. The Gaussian variance $\sigma^2$ and the regularization

parameters λ and µ were determined so that WMSE

for the test data is minimized (i.e., they are opti-

mally tuned). For manifold regularization, we ﬁxed

the nearest neighbor number and the decay rate of

the similarity to h = 5 and γ = 1.0, respectively (see

Eq.(3)).

3.2 Results

We applied the k-means clustering algorithm to

27000 unlabeled samples in the 4-dimensional or 10-

dimensional PCA subspace and extracted 200 clus-

ters. We chose 200 samples that are closest to the 200

cluster centroids and labeled them; then we trained

a regressor using the weighted manifold method pro-

posed in Section 2.3 with the 200 labeled samples and

5000 unlabeled samples randomly chosen from the

pool of 26800 (= 27000− 200) unlabeled samples.

We compared the above method with a random sampling strategy. Figure 3 summarizes the WMSE obtained

by each method; in the comparison, we also included

supervised regression where unlabeled samples were

not used (i.e., µ = 0).

Figure 3 shows that the proposed active learning method gave smaller WMSE than the random sampling strategy; the use of unlabeled samples for learning also improved the performance. Thus the proposed active learning method combined with manifold-based semi-supervised learning is shown to be effective for improving the age prediction performance.

Figure 3: Comparison of WMSE (with/without clustering and supervised/semi-supervised learning).

Figure 4: WMSE for each age-group.

In order to more closely understand the effect of

age weighting, we investigated the prediction error

for each age bracket. Figure 4 shows age-bracket-

wise WMSE when the weighted learning method (see

Eq.(4)) or the non-weighted learning method (see

Eq.(2)) is used. The ﬁgure shows that the error in

young age groups (less than 20 years old) is signiﬁ-

cantly reduced by the use of weights, which is highly

important in practical human evaluation (as explained

in Section 2.3). On the other hand, the prediction er-

ror for middle/older age groups is slightly increased,

but a small increase of the error in these age brackets

is shown to be less signiﬁcant from our questionnaire

survey. Therefore, the experimental result indicates

that our approach qualitatively improves the age pre-

diction accuracy.


4 SUMMARY AND CONCLUSIONS

Perceived age estimation is highly useful in various

real-world applications such as developing efﬁcient

marketing strategies. In this paper, we proposed a

novel method for perceived age estimation from face

images by combining two ideas. The ﬁrst idea was

an efﬁcient active learning strategy for reducing the

cost of labeling face samples. Experiments showed

that our active learning strategy together with man-

ifold regularization can improve the performance of

perceived age estimation even with a relatively small

number of labeled face samples. The second idea was

to take account of heterogeneous characteristics of

human age perception in the form of weighted regres-

sion. Experimental results showed that our weighted

regression method can properly handle heteroscedas-

tic noise and thus the prediction performance is qual-

itatively improved.

We have used characteristics of human age per-

ception as weights—error in younger age brackets is

more serious than that in older age groups. On the

other hand, our framework can accommodate arbi-

trary weights, which opens up new interesting re-

search possibilities. Higher weights lead to better

prediction in the corresponding age brackets, so we

can improve the prediction accuracy of arbitrary age

groups (but the price we have to pay for this is a

performance decrease in other age brackets). This

property could be useful, for example, in cigarette and alcohol retail, where accuracy around 20 years old needs to be enhanced but accuracy in other age brackets is not so important. Another possible usage

of our weighted regression framework is to combine

learned functions obtained from several different age

weights. This could further improve the age predic-

tion performance, which we would like to pursue in

our future work.

REFERENCES

Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern

Classiﬁcation. Wiley, New York.

Fu, Y., Xu, Y., and Huang, T. S. (2007). Estimating human

age by manifold analysis of face pictures and regres-

sion on aging features. In Proceedings of the IEEE

Multimedia and Expo. 1383–1386.

Geng, X., Zhou, Z., Zhang, Y., Li, G., and Dai, H. (2006).

Learning from facial aging patterns for automatic age

estimation. In Proceedings of the 14th ACM Interna-

tional Conference on Multimedia, 307–316.

Guo, G., Fu, Y., Dyer, C. R., and Huang, T. S. (2008).

Image-based human age estimation by manifold learn-

ing and locally adjusted robust regression. In IEEE

Transactions on Image Processing, 17(7), 1178–1188.

Hoerl, A. E. and Kennard, R. W. (1970). Ridge regres-

sion: biased estimation for nonorthogonal problems.

In Technometrics, 12:55–67, 1970.

Horng, W. B., Lee, C. P., and Chen, C. W. (2001). Clas-

siﬁcation of age groups based on facial features.

In Tamkang Journal of Science and Engineering,

4(3):183–192.

Kwon, Y. H. and Lobo, N. V. (1999). Age classiﬁcation

from facial images. In Computer Vision and Image

Understanding, 74(1):1–21.

Lanitis, A., Draganova, C., and Christodoulou, C. (2004).

Comparing different classiﬁers for automatic age esti-

mation. In IEEE Transactions on Systems, Man, and

Cybernetics Part B, 34(1):621–628.

Lanitis, A., Taylor, C. J., and Cootes, T. F. (2002). To-

ward automatic simulation of aging effects on face im-

ages. In IEEE Transactions on Pattern Analysis and

Machine Intelligence, 24(4):442–455.

Phillips, P. J., Flynn, P. J., Scruggs, W. T., Bowyer, K. W.,

Chang, J., Hoffman, K., Marques, J., Min, J., and

Worek, W. J. (2005). Overview of the face recognition

grand challenge. In Proceedings of the IEEE Com-

puter Society Conference on Computer Vision and

Pattern Recognition (CVPR 2005). 947–954.

Ricanek, K. J. and Tesafaye, T. (2006). Morph: A longitu-

dinal image database of normal adult age-progression.

In Proceedings of the IEEE 7th International Con-

ference on Automatic Face and Gesture Recognition

(FGR ’06), 341–345.

Sindhwani, V., Belkin, M., and Niyogi, P. (2006). The ge-

ometric basis of semi-supervised learning. In Semi-

Supervised Learning, The MIT Press, 2006.

Tivive, F. H. C. and Bouzerdoum, A. (2006). A shunting in-

hibitory convolutional neural network for gender clas-

siﬁcation. In Proceedings of the 18th International

Conference on Pattern Recognition (ICPR2006),

4:421–424.

Tivive, F. H. C. and Bouzerdoum, A. (2006). A gender

recognition system using shunting inhibitory convolu-

tional neural networks. In Proceedings of the Interna-

tional Joint Conference on Neural Networks (IJCNN

’06), 5336–5341.

Ueki, K., Miya, M., Ogawa, T., and Kobayashi, T. (2008).

Class distance weighted locality preserving projection

for automatic age estimation. In Proceedings of the

2nd IEEE International Conference on Biometrics:

Theory, Applications and Systems (BTAS 08), 1–5.
