SEMI-SUPERVISED ESTIMATION OF PERCEIVED AGE FROM
FACE IMAGES
Kazuya Ueki
VALWAY Technology Center, NEC Soft, Ltd., Tokyo, Japan
Masashi Sugiyama
Department of Computer Science, Tokyo Institute of Technology and JST PRESTO, Tokyo, Japan
Yasuyuki Ihara
VALWAY Technology Center, NEC Soft, Ltd., Tokyo, Japan
Keywords:
Perceived age estimation, Active sample selection, Weighted regression, Semi-supervised learning, Manifold
regularization, Human age perception.
Abstract:
We address the problem of perceived age estimation from face images and propose a new semi-supervised age
prediction method that involves two novel aspects. The first novelty is an efficient active learning strategy for
reducing the cost of labeling face samples. Given a large number of unlabeled face samples, we reveal the
cluster structure of the data and propose to label cluster representative samples for covering as many clusters
as possible. This simple sampling strategy allows us to boost the performance of a manifold-based semi-
supervised learning method only with a relatively small number of labeled samples. The second contribution
is to take the heterogeneous characteristics of human age perception into account. It is rare to misjudge
the age of a 5-year-old child as 15 years old, but the age of a 35-year-old person is often misjudged as
45 years old. Thus, the magnitude of the error differs depending on the subject’s age. We carried out a
large-scale questionnaire survey for quantifying human age perception characteristics and propose to encode the
quantified characteristics by weighted regression. Consequently, our proposed method is expressed in the
form of weighted least-squares with a manifold regularizer, which is scalable to massive datasets. Through
real-world age estimation experiments, we demonstrate the usefulness of the proposed method.
1 INTRODUCTION
Demographic analysis in public places such as shop-
ping malls and stations is attracting a great deal of
attention these days since it is useful for designing
effective marketing strategies. Such demographic information
is often collected manually; e.g., at convenience
stores, sales clerks input customers’ attributes
such as age and gender into a point-of-sale (POS) system.
However, such manual data collection requires
a lot of human labor and automating this process is
highly desired.
In this paper, we address the problem of age es-
timation from face images using machine learning
techniques. Most of the existing studies on age esti-
mation try to predict real age (Kwon and Lobo, 1999;
Horng et al., 2001; Lanitis et al., 2002; Geng et al.,
2006; Fu et al., 2007) and several databases are avail-
able publicly (e.g., (Phillips et al., 2005; Ricanek and
Tesafaye, 2006)). However, the problem of estimat-
ing subjects’ real age is highly ill-posed since the cor-
respondence between appearance and real age is not
clear even for humans.
When designing marketing strategies, analyzing
perceived age is often preferable to analyzing real age.
However, little attention has been paid to perceived
age analysis so far; in this paper, we therefore propose
a new method of perceived age estimation from
face images. The perceived age of a subject is defined as
the mean age estimated by a large number of people.
Thus the problem of perceived age estimation can be
naturally formulated as a regression problem, which
Ueki K., Sugiyama M. and Ihara Y. (2010).
SEMI-SUPERVISED ESTIMATION OF PERCEIVED AGE FROM FACE IMAGES.
In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 319-324
DOI: 10.5220/0002817503190324
Copyright © SciTePress
is aimed at estimating the conditional mean of out-
puts (age) given inputs (face images).
Face images often contain complex variability due
to diversity of individual characteristics, angles, light-
ing conditions, etc. Thus a large number of labeled
face samples are usually needed to obtain good pre-
diction performance. However, labeling face samples
requires much time and effort and it is desirable to
reduce the number of labeled samples without perfor-
mance degradation. In this paper, we first propose
an active learning strategy for reducing the sampling
cost. We focus on a semi-supervised setup where a
large number of unlabeled face samples are available.
Our active learning idea is to apply a clustering
technique to reveal the cluster structure of the
face data and to label cluster representative samples
for covering as many clusters as possible. This simple
sampling strategy allows us to boost the performance
of a manifold-based semi-supervised learning method
(Sindhwani et al., 2006) only with a relatively small
number of labeled samples.
In order to further improve the estimation accuracy,
we next propose to take the heterogeneous
characteristics of human age perception into
account: the age of a 5-year-old child is rarely
misjudged as 15 years old, but the age of a
35-year-old person is often misjudged as 45 years old.
Thus, the deviation of the estimation error differs
depending on the subject’s age (which is referred to as
heteroscedastic noise). We carried out a large-scale
questionnaire survey in order to quantify human age
perception characteristics, and propose to take
account of the quantified characteristics by weighted
regression, which is shown to be able to cope with
heteroscedastic noise.
Combining the above two ideas, we propose a
kernel-based semi-supervised perceived age estima-
tion method which is expressed in the form of kernel
weighted least-squares with a manifold regularizer;
thanks to its simple formulation, the proposed method
is scalable to large-scale datasets. Through real-world
age estimation experiments, we demonstrate the use-
fulness of the proposed method.
2 SEMI-SUPERVISED
ALGORITHM FOR
PERCEIVED AGE ESTIMATION
In this section, we describe the proposed procedure
for perceived age estimation.
2.1 Clustering-based Active Learning
Strategy
First, we explain our active learning strategy for re-
ducing the cost of labeling face samples.
Face samples exhibit diverse variation in individual
characteristics, angles, lighting conditions,
etc. They often possess cluster structure, and face
samples in each cluster tend to have similar ages (Fu
et al., 2007; Guo et al., 2008; Ueki et al., 2008).
Based on these observations, we propose to label the
face images which are closest to cluster centroids.
For revealing the cluster structure, we apply the
k-means clustering method to a large number of unla-
beled samples. Since clustering of high-dimensional
data is often unreliable, we first apply principal com-
ponent analysis (PCA) to the face images for dimen-
sion reduction and then apply the k-means clustering
algorithm. The proposed active learning strategy is
summarized as follows.
1. For a set of $n$-dimensional unlabeled face image
samples $\{X_i\}_{i=1}^{t}$, we compute $\{x_i\}_{i=1}^{t}$ of $m$ ($\ll n$)
dimensions by the PCA projection.
2. Using the k-means clustering algorithm, we compute
the $l$ ($\ll t$) cluster centroids $\{m_j\}_{j=1}^{l}$.
3. We choose $\{x_{\hat{i}_j}\}_{j=1}^{l}$ as samples to be labeled,
where
$$\hat{i}_j = \mathop{\mathrm{argmin}}_{i} \|x_i - m_j\|$$
and $\|\cdot\|$ denotes the Euclidean norm.
This is a simple procedure, but highly effective as
demonstrated later.
For making the notation simple, we permute the
order of samples $\{x_i\}_{i=1}^{t}$ so that the first $l$ samples
$\{x_i\}_{i=1}^{l}$ are labeled and the remaining $u$ ($= t - l$) samples
$\{x_i\}_{i=l+1}^{l+u}$ are unlabeled; this is always possible
without loss of generality. Let $\{y_i\}_{i=1}^{l}$ be the labels
for $\{x_i\}_{i=1}^{l}$.
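The three steps above, followed by the permutation, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the plain Lloyd-style k-means, the random initialization, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np

def pca_project(X, m):
    """Project the rows of X onto the top-m principal components (step 1)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

def kmeans(x, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd iterations (assumed variant); returns the centroids (step 2)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distance between every sample and every centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(n_clusters):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def select_representatives(x, centroids):
    """Index of the sample closest to each centroid (step 3)."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=0)

# Toy data (hypothetical): 90 samples of dimension 20 around three centers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(30, 20)) for c in (0.0, 3.0, 6.0)])

x = pca_project(X, m=4)
centroids = kmeans(x, n_clusters=3)
# np.unique guards against two centroids sharing the same nearest sample.
labeled_idx = np.unique(select_representatives(x, centroids))

# Permute so that the samples to be labeled come first.
rest = np.setdiff1d(np.arange(len(x)), labeled_idx)
order = np.concatenate([labeled_idx, rest])
x_perm = x[order]
```

After the permutation, the first `len(labeled_idx)` rows of `x_perm` are the samples sent for labeling and the remaining rows form the unlabeled pool.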
2.2 Semi-supervised Age Regression
with Manifold Regularization
As explained above, face images possess cluster
structure and face samples in each cluster tend to have
similar ages. Here we utilize this cluster structure
by employing a method of semi-supervised regression
with manifold regularization (Sindhwani et al., 2006).
This subsection is devoted to reviewing the manifold
regularization method.
VISAPP 2010 - International Conference on Computer Vision Theory and Applications
For age regression, we use the following kernel
model:
$$f(x;\alpha) = \sum_{i=1}^{l+u} \alpha_i k(x, x_i), \tag{1}$$
where $\alpha = (\alpha_1, \ldots, \alpha_{l+u})^\top$ are parameters to be
learned, $^\top$ denotes the transpose, and $k(x, x')$ is a
reproducing kernel function; we use the Gaussian kernel:
$$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right),$$
where $\sigma^2$ is the Gaussian variance. We included
$(l + u)$ kernels in the kernel regression model (1),
but $u$ may be very large in age prediction. In practice,
we only use $c$ ($< u$) elements randomly chosen
from the set $\{k(x, x_i)\}_{i=l+1}^{l+u}$ for reducing the
computational cost; thus the total number of basis functions is
$b = l + c$. However, we stick to using Eq.(1) below in
order to keep the explanation simple.
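As an illustration of the reduced kernel model, the sketch below builds the $(l + u) \times b$ design matrix from all labeled samples plus $c$ randomly chosen unlabeled ones. The helper names `gaussian_kernel` and `choose_basis`, the toy sizes, and the random data are assumptions made for the example only.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all rows of A and B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def choose_basis(n_samples, l, c, seed=0):
    """Indices of all l labeled samples plus c randomly chosen unlabeled ones."""
    rng = np.random.default_rng(seed)
    extra = l + rng.choice(n_samples - l, size=c, replace=False)
    return np.concatenate([np.arange(l), extra])

# Toy data (hypothetical): the first l rows play the role of labeled samples.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 4))
l, c = 5, 10

basis_idx = choose_basis(len(x), l, c)
K = gaussian_kernel(x, x[basis_idx], sigma=1.0)  # shape (l + u, b) with b = l + c

alpha = rng.normal(size=l + c)   # arbitrary parameters, only to evaluate the model
f_values = K @ alpha             # f(x_i; alpha) at every sample
```

Each column of `K` corresponds to one retained basis function, so evaluating the model at all samples is a single matrix-vector product.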
The basic assumption behind manifold regulariza-
tion is that the target function we want to learn is
‘smooth’ within clusters. In order to let our learned
function satisfy this property, a manifold regularizer
is included in the training criterion—the parameter α
is learned so that the following criterion is minimized.
$$\frac{1}{l}\sum_{i=1}^{l} \left(y_i - f(x_i;\alpha)\right)^2 + \lambda\|\alpha\|^2
+ \frac{\mu}{4(l+u)^2}\sum_{i,j=1}^{l+u} W_{i,j}\left(f(x_i;\alpha) - f(x_j;\alpha)\right)^2, \tag{2}$$
where $\lambda$ and $\mu$ are non-negative constants. $W_{i,j}$ represents
the similarity between $x_i$ and $x_j$, which is defined by
$$W_{i,j} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\gamma^2}\right) \tag{3}$$
if $x_i$ is an $h$-nearest neighbor of $x_j$ or vice versa; otherwise,
$W_{i,j} = 0$.
The first term in Eq.(2) is the goodness-of-fit term
and the second term is the ordinary regularizer for
avoiding overfitting (Hoerl and Kennard, 1970). The
third term is the manifold regularizer; the weight $W_{i,j}$
takes large values when $x_i$ and $x_j$ belong to the same
cluster, so the manifold regularizer works for keeping
the outputs of the function $f(x)$ within the cluster
close to each other. Consequently, we can obtain a
function that is smooth inside clusters.
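The similarity matrix of Eq.(3) and the manifold regularizer in Eq.(2) can be sketched as below. This is an illustrative NumPy version under assumptions: the function names are hypothetical, a sample is assumed not to count as its own neighbor (the text does not say), and the dense distance computation is only suitable for small toy data.

```python
import numpy as np

def knn_similarity(x, h, gamma):
    """Symmetric h-nearest-neighbor similarity matrix W in the spirit of Eq.(3)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)            # assumption: exclude self-neighbors
    nn = np.argsort(d2, axis=1)[:, :h]      # indices of the h nearest neighbors
    mask = np.zeros(d2.shape, dtype=bool)
    mask[np.repeat(np.arange(len(x)), h), nn.ravel()] = True
    mask |= mask.T                          # "or vice versa"
    W = np.zeros(d2.shape)
    W[mask] = np.exp(-d2[mask] / (2.0 * gamma**2))
    return W

def manifold_penalty(W, f, mu):
    """Third term of Eq.(2) for function values f at all l + u samples."""
    n = len(f)
    diff2 = (f[:, None] - f[None, :]) ** 2
    return mu / (4.0 * n**2) * (W * diff2).sum()

# Toy data (hypothetical): two well-separated clusters of 20 samples each.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
W = knn_similarity(x, h=5, gamma=1.0)

f_smooth = np.repeat([10.0, 40.0], 20)       # constant within each cluster
f_rough = rng.normal(25.0, 10.0, size=40)    # varies freely within clusters

penalty_smooth = manifold_penalty(W, f_smooth, mu=1.0)
penalty_rough = manifold_penalty(W, f_rough, mu=1.0)
```

In this toy setting no neighbor pair crosses the two clusters, so a function that is constant inside each cluster incurs zero penalty, while a function that varies within clusters is penalized.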
Figure 1: The relation between subjects’ true age $y^*$
(horizontal axis) and the standard deviation of perceived age
(vertical axis).
2.3 Incorporating Age Perception
Characteristics
Next, we extend the above manifold regularization
method so that human age perception characteristics
can be incorporated.
First, we quantify human age perception characteristics
through a large-scale questionnaire survey.
We asked each of 72 volunteers to give age labels $y$
to approximately 500 face images. The true age
of subjects is defined as the average perceived age
(rounded off to the nearest integer), and denoted by
$y^*$. This is because our purpose is not to predict the
subjects’ real age, but their perceived age. We summarize
the standard deviation of the perceived age as
a function of the true age $y^*$ in Figure 1.
The standard deviation is approximately 2 (years)
when the true age $y^*$ is less than 15. It increases
beyond 6 as the true age $y^*$ increases from 15 to 35,
and then decreases to around 5 as $y^*$
increases from 35 to 70. This graph shows that the
perceived age deviation tends to be small in younger age
brackets and large in older age groups. This agrees
well with our intuition considering the human
growth process.
Now let us incorporate the above survey result
into the perceived age estimation framework in Sec-
tion 2.2. When the standard deviation is small (large),
making an error is regarded as more (less) critical.
This idea follows a similar line to the Mahalanobis
distance (Duda et al., 2001), so it would be reasonable
to incorporate the above survey result into the frame-
work of weighted regression analysis. More precisely,
weighting the goodness-of-fit term in Eq.(2) accord-
ing to the inverse error variance optimally adjusts to
the characteristics of human perception. Thus, our
proposed training criterion is given as
$$\frac{1}{l}\sum_{i=1}^{l} \frac{\left(y_i - f(x_i;\alpha)\right)^2}{w(y_i)^2} + \lambda\|\alpha\|^2
+ \frac{\mu}{4(l+u)^2}\sum_{i,j=1}^{l+u} W_{i,j}\left(f(x_i;\alpha) - f(x_j;\alpha)\right)^2, \tag{4}$$
where $w(y)$ is the value given in Figure 1.
An important advantage of the above training
method is that the solution can be obtained analytically by
$$\hat{\alpha} = \left(K^\top D K + l\lambda I_{l+u} + \frac{l\mu}{(l+u)^2} K^\top L K\right)^{-1} K^\top D y, \tag{5}$$
where $K$ is the $(l + u) \times (l + u)$ kernel Gram matrix
whose $(i, j)$-th element is defined by
$$K_{i,j} = k(x_i, x_j).$$
$D$ is the $(l + u) \times (l + u)$ diagonal weight matrix with
diagonal elements defined by
$$\frac{1}{w(y_1)^2}, \ldots, \frac{1}{w(y_l)^2}, 0, \ldots, 0.$$
$L$ is the $(l + u) \times (l + u)$ Laplacian matrix whose $(i, j)$-th
entry is defined by
$$L_{i,j} = \delta_{i,j}\sum_{j'=1}^{l+u} W_{i,j'} - W_{i,j},$$
where $\delta_{i,j}$ is the Kronecker delta. $I_{l+u}$ denotes the
$(l + u) \times (l + u)$ identity matrix. $y$ is the $(l + u)$-dimensional
label vector defined by
$$y = (y_1, \ldots, y_l, 0, \ldots, 0)^\top.$$
If $u$ is very large (which would be the case in age
prediction), computing the inverse of the $(l + u) \times (l + u)$
matrix in Eq.(5) is not tractable. To cope with this
problem, reducing the number of kernels from $(l + u)$
to a smaller number $b$ would be a realistic option, as
explained in Section 2.2. Then the matrix $K$ becomes
an $(l + u) \times b$ rectangular matrix and the identity matrix
in Eq.(5) becomes $I_b$. Thus the size of the matrix
we need to invert becomes $b \times b$, which would be
tractable when $b$ is kept moderate. We may further
reduce the computational cost by numerically computing
the solution by a stochastic gradient-descent
method.
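The analytic solution of Eq.(5) with a rectangular $(l + u) \times b$ matrix $K$ can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names, the toy data, the dense toy similarity matrix (without nearest-neighbor sparsification), and the constant weights are assumptions. The Laplacian coefficient is taken as printed in Eq.(5); differentiating Eq.(4) directly, using $\sum_{i,j} W_{i,j}(f_i - f_j)^2 = 2 f^\top L f$ for symmetric $W$, appears to give half that coefficient, which would only amount to a rescaling of $\mu$.

```python
import numpy as np

def graph_laplacian(W):
    """L = diag(row sums of W) - W, matching the entrywise definition of L."""
    return np.diag(W.sum(axis=1)) - W

def fit_weighted_manifold(K, y_labeled, w_labeled, W, lam, mu):
    """Solve Eq.(5) for an (l + u) x b matrix K whose first l rows are labeled."""
    n, b = K.shape
    l = len(y_labeled)
    d = np.zeros(n)
    d[:l] = 1.0 / w_labeled**2      # diagonal of D; zero for unlabeled rows
    y = np.zeros(n)
    y[:l] = y_labeled               # label vector padded with zeros
    L = graph_laplacian(W)
    A = (K.T @ (d[:, None] * K)
         + l * lam * np.eye(b)
         + (l * mu / n**2) * (K.T @ L @ K))
    # Solving the b x b linear system avoids forming an explicit inverse.
    return np.linalg.solve(A, K.T @ (d * y))

# Toy problem (hypothetical sizes and data).
rng = np.random.default_rng(0)
n, l, b = 30, 6, 10
x = rng.normal(size=(n, 3))
d2_full = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-d2_full[:, :b] / 2.0)   # Gaussian kernel, sigma = 1, first b samples as basis
W = np.exp(-d2_full / 2.0)          # dense toy similarity, gamma = 1
np.fill_diagonal(W, 0.0)

y_labeled = rng.uniform(1.0, 70.0, size=l)
w_labeled = np.full(l, 2.0)         # placeholder weights, not the surveyed values

alpha = fit_weighted_manifold(K, y_labeled, w_labeled, W, lam=0.1, mu=1.0)
alpha0 = fit_weighted_manifold(K, y_labeled, w_labeled, W, lam=0.1, mu=0.0)
```

With `mu=0.0` the procedure reduces to weighted kernel ridge regression on the labeled samples, which gives a convenient sanity check for an implementation.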
Figure 2: Examples of face images.
2.4 Evaluation Criteria
Conventionally, the performance of an age prediction
function $f(x)$ for test samples $\{(\tilde{x}_i, \tilde{y}_i)\}_{i=1}^{m}$ was
evaluated by the mean absolute error (MAE) (Lanitis et al.,
2002; Lanitis et al., 2004; Geng et al., 2006; Ueki
et al., 2008):
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m} \left|\tilde{y}_i - f(\tilde{x}_i)\right|.$$
However, as explained in Section 2.3, this does not
properly reflect human intuition. Here we propose to
use the weighted criterion also for performance evalu-
ation in the experiments. More specifically, we evalu-
ate the prediction performance by the weighted mean
square error (WMSE):
$$\mathrm{WMSE} = \frac{1}{m}\sum_{i=1}^{m} \frac{\left(\tilde{y}_i - f(\tilde{x}_i)\right)^2}{w(\tilde{y}_i)^2}.$$
The smaller the value of WMSE is, the better the age
prediction function is.
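The two criteria can be compared on a small worked example. In the sketch below, `w_hypothetical` is only a rough piecewise-linear stand-in based on the verbal description in Section 2.3 (about 2 below age 15, about 6 at 35, about 5 at 70); it is an assumption and does not reproduce the surveyed values of Figure 1.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def wmse(y_true, y_pred, w):
    """Weighted mean square error with weights 1 / w(y_true)^2."""
    return np.mean((y_true - y_pred) ** 2 / w(y_true) ** 2)

def w_hypothetical(y):
    """Rough stand-in for Figure 1 (NOT the surveyed standard deviations)."""
    return np.interp(y, [1.0, 15.0, 35.0, 70.0], [2.0, 2.0, 6.0, 5.0])

# Two test subjects, both predicted 3 years too old.
y_true = np.array([5.0, 35.0])
y_pred = np.array([8.0, 38.0])

mae_value = mae(y_true, y_pred)                       # 3.0
per_sample = (y_true - y_pred) ** 2 / w_hypothetical(y_true) ** 2
wmse_value = wmse(y_true, y_pred, w_hypothetical)     # (9/4 + 9/36) / 2 = 1.25
```

Both errors contribute equally to MAE, whereas under WMSE the 3-year error for the 5-year-old (9/4) counts nine times as much as the same error for the 35-year-old (9/36).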
3 EMPIRICAL EVALUATION
In this section, we apply the proposed age prediction
method to in-house face-age datasets and experimen-
tally evaluate its performance.
3.1 Data Acquisition and Experimental
Setup
Age prediction systems are often used in public places
such as shopping malls or stations. In order to
make our experiments realistic, we collected face im-
age samples from video sequences taken by ceiling-
mounted surveillance cameras with depression angle
5–10 degrees. The recording method, image reso-
lution, and the image size are diverse depending on
the recording conditions—for example, some subjects
were illuminated by dominant light sources, walking
naturally, seated on a stool, and keeping their heads
still. The subjects’ facial expressions are typically
subtle, switching between neutral and smiling. We
used a face detector for localizing the two eye-centers,
and then rescaled the image to 64 × 64 pixels. Examples
of face images are shown in Figure 2. Female
faces whose age ranges from 1 to 70 were used in our
experiments.
As pre-processing, we extracted 100-dimensional
features from the 64 × 64 face images using a neu-
ral network feature extractor proposed in (Tivive and
Bouzerdoumi, 2006; Tivive and Bouzerdoum, 2006).
In total, we have 28500 face samples in our database.
Among them, u = 27000 are treated as unlabeled sam-
ples and the remaining m = 1500 are used as test
samples. From the 27000 unlabeled samples, we
choose $l = 200$ samples to be labeled by active learning.
The Gaussian variance $\sigma^2$ and the regularization
parameters $\lambda$ and $\mu$ were determined so that WMSE
for the test data is minimized (i.e., they are optimally
tuned). For manifold regularization, we fixed
the nearest neighbor number and the decay rate of
the similarity to $h = 5$ and $\gamma = 1.0$, respectively (see
Eq.(3)).
3.2 Results
We applied the k-means clustering algorithm to
27000 unlabeled samples in the 4-dimensional or 10-
dimensional PCA subspace and extracted 200 clus-
ters. We chose 200 samples that are closest to the 200
cluster centroids and labeled them; then we trained
a regressor using the weighted manifold method pro-
posed in Section 2.3 with the 200 labeled samples and
5000 unlabeled samples randomly chosen from the
pool of 26800 (= 27000 − 200) unlabeled samples.
We compared the above method with a random
sampling strategy. Figure 3 summarizes WMSE obtained
by each method; in the comparison, we also included
supervised regression where unlabeled samples were
not used (i.e., µ = 0).
Figure 3 shows that the proposed active learning
method gave smaller WMSE than the random
sampling strategy; the use of unlabeled samples
for learning also improved the performance. Thus
the proposed active learning method combined with
manifold-based semi-supervised learning is shown to
be effective for improving the age prediction
performance.
Figure 3: Comparison of WMSE (with/without clustering
and supervised/semi-supervised learning).
Figure 4: WMSE for each age-group.
In order to more closely understand the effect of
age weighting, we investigated the prediction error
for each age bracket. Figure 4 shows age-bracket-
wise WMSE when the weighted learning method (see
Eq.(4)) or the non-weighted learning method (see
Eq.(2)) is used. The figure shows that the error in
young age groups (less than 20 years old) is signifi-
cantly reduced by the use of weights, which is highly
important in practical human evaluation (as explained
in Section 2.3). On the other hand, the prediction er-
ror for middle/older age groups is slightly increased,
but a small increase of the error in these age brackets
is shown to be less significant from our questionnaire
survey. Therefore, the experimental result indicates
that our approach qualitatively improves the age pre-
diction accuracy.
4 SUMMARY AND
CONCLUSIONS
Perceived age estimation is highly useful in various
real-world applications such as developing efficient
marketing strategies. In this paper, we proposed a
novel method for perceived age estimation from face
images by combining two ideas. The first idea was
an efficient active learning strategy for reducing the
cost of labeling face samples. Experiments showed
that our active learning strategy together with man-
ifold regularization can improve the performance of
perceived age estimation even with a relatively small
number of labeled face samples. The second idea was
to take account of heterogeneous characteristics of
human age perception in the form of weighted regres-
sion. Experimental results showed that our weighted
regression method can properly handle heteroscedas-
tic noise and thus the prediction performance is qual-
itatively improved.
We have used characteristics of human age per-
ception as weights—error in younger age brackets is
more serious than that in older age groups. On the
other hand, our framework can accommodate arbi-
trary weights, which opens up new interesting re-
search possibilities. Higher weights lead to better
prediction in the corresponding age brackets, so we
can improve the prediction accuracy of arbitrary age
groups (but the price we have to pay for this is a
performance decrease in other age brackets). This
property could be useful, for example, in cigarette
and alcohol retail, where accuracy around 20 years
old needs to be enhanced but accuracy in other age
brackets is not so important. Another possible usage
of our weighted regression framework is to combine
learned functions obtained from several different age
weights. This could further improve the age predic-
tion performance, which we would like to pursue in
our future work.
REFERENCES
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern
Classification. Wiley, New York.
Fu, Y., Xu, Y., and Huang, T. S. (2007). Estimating human
age by manifold analysis of face pictures and regres-
sion on aging features. In Proceedings of the IEEE
Multimedia and Expo. 1383–1386.
Geng, X., Zhou, Z., Zhang, Y., Li, G., and Dai, H. (2006).
Learning from facial aging patterns for automatic age
estimation. In Proceedings of the 14th ACM Interna-
tional Conference on Multimedia, 307–316.
Guo, G., Fu, Y., Dyer, C. R., and Huang, T. S. (2008).
Image-based human age estimation by manifold learn-
ing and locally adjusted robust regression. In IEEE
Transactions on Image Processing, 17(7), 1178–1188.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regres-
sion: biased estimation for nonorthogonal problems.
In Technometrics, 12:55–67, 1970.
Horng, W. B., Lee, C. P., and Chen, C. W. (2001). Clas-
sification of age groups based on facial features.
In Tamkang Journal of Science and Engineering,
4(3):183–192.
Kwon, Y. H. and Lobo, N. V. (1999). Age classification
from facial images. In Computer Vision and Image
Understanding, 74(1):1–21.
Lanitis, A., Draganova, C., and Christodoulou, C. (2004).
Comparing different classifiers for automatic age esti-
mation. In IEEE Transactions on Systems, Man, and
Cybernetics Part B, 34(1):621–628.
Lanitis, A., Taylor, C. J., and Cootes, T. F. (2002). To-
ward automatic simulation of aging effects on face im-
ages. In IEEE Transactions on Pattern Analysis and
Machine Intelligence, 24(4):442–455.
Phillips, P. J., Flynn, P. J., Scruggs, W. T., Bowyer, K. W.,
Chang, J., Hoffman, K., Marques, J., Min, J., and
Worek, W. J. (2005). Overview of the face recognition
grand challenge. In Proceedings of the IEEE Com-
puter Society Conference on Computer Vision and
Pattern Recognition (CVPR 2005). 947–954.
Ricanek, K. J. and Tesafaye, T. (2006). Morph: A longitu-
dinal image database of normal adult age-progression.
In Proceedings of the IEEE 7th International Con-
ference on Automatic Face and Gesture Recognition
(FGR ’06), 341–345.
Sindhwani, V., Belkin, M., and Niyogi, P. (2006). The ge-
ometric basis of semi-supervised learning. In Semi-
Supervised Learning, The MIT Press, 2006.
Tivive, F. H. C. and Bouzerdoum, A. (2006). A shunting in-
hibitory convolutional neural network for gender clas-
sification. In Proceedings of the 18th International
Conference on Pattern Recognition (ICPR2006),
4:421–424.
Tivive, F. H. C. and Bouzerdoumi, A. (2006). A gender
recognition system using shunting inhibitory convolu-
tional neural networks. In Proceedings of the Interna-
tional Joint Conference on Neural Networks (IJCNN
’06), 5336–5341.
Ueki, K., Miya, M., Ogawa, T., and Kobayashi, T. (2008).
Class distance weighted locality preserving projection
for automatic age estimation. In Proceedings of the
2nd IEEE International Conference on Biometrics:
Theory, Applications and Systems (BTAS 08), 1–5.