Low Level Features for Quality Assessment of Facial Images
Arnaud Lienhard, Patricia Ladret and Alice Caplier
GIPSA-Lab, Grenoble Images Parole Signal Automatique, Grenoble, France
Keywords:
Aesthetic Quality, Automatic Scoring, Portraits.
Abstract:
An automated system that provides feedback about aesthetic quality of facial pictures could be of great interest
for editing or selecting photos. Although image aesthetic quality assessment is a challenging task that requires
understanding of subjective notions, the proposed work shows that facial image quality can be estimated by
using low-level features only. This paper provides a method that can predict aesthetic quality scores of facial
images. 15 features that depict technical aspects of images such as contrast, sharpness or colorfulness are
computed on different image regions (face, eyes, mouth) and a machine learning algorithm is used to perform
classification and scoring. Relevant features and facial image areas are selected by a feature ranking technique,
increasing both classification and regression performance. Results are compared with recent works, and it is
shown that the proposed low-level feature set matches or improves on the best state-of-the-art results.
1 INTRODUCTION
Social psychological studies have shown that people
form impressions from facial appearance very quickly
(Willis and Todorov, 2006) and this makes facial pic-
ture selection crucial. With the widespread use of dig-
ital cameras and photo sharing applications, select-
ing the best picture of a particular person for a given
application is a time-consuming challenge. Thus, a system that automatically provides feedback about image aesthetic quality would be an interesting and useful tool. Browsing images automatically sorted by aesthetic score, editing images to enhance their visual appeal, or selecting one particular image from an entire collection would all become simpler for home users. The features used for automated computation have to be adapted to the considered application: profile pictures on social networks are different from pictures presented for professional purposes (resumes, business cards). In this work, only the general
aesthetic quality of facial images is considered, with-
out taking facial expressions or beauty into account.
1.1 Previous Work
Various attempts have been made to solve automatic
aesthetic assessment in images. Different approaches
exist: (Marchesotti and Perronnin, 2012) explore fea-
tures at the pixel level whereas (Li et al., 2010) estimate high-level attributes (smiles, eye closure) that cannot directly be obtained by extracting visual data, due to the semantic gap between the information contained in pixels and human interpretation. Most recent works perform region of interest (ROI) extraction to enhance their prediction results, since the locations, shapes or color composition of different objects may change the global aesthetic quality of an image (Datta et al.,
2006). ROI may be detected using sharpness estima-
tion (Luo and Tang, 2008), saliency maps (Wong and
Low, 2009; Tong et al., 2010) or object detection (Vi-
ola and Jones, 2001).
The main approach for evaluating portraits aes-
thetic quality is characterized by computing a set of
features in the subject and background regions. Often,
features such as contrast, sharpness or color distribu-
tion are computed in addition to features that describe
subject-background relationship (Jiang et al., 2010;
Tang et al., 2013). Recent features that describe high-
level aspects of images have been developed: facial
expression, age and gender of the subject, hair and
skin colors, presence of a beard, etc. (Dhar et al., 2011).
To the best of our knowledge, little research has been done on pictures containing a single frontal face (Males et al., 2013). Moreover, there are no publicly available datasets containing facial images and their aesthetic ratings, which makes comparison with previous work difficult. In previous work (Lienhard et al., 2014), we developed a method that precisely segments the image (hair, shoulders, skin, background) and computes features in each region. The
main result of this previous work is that the facial area alone is almost sufficient to efficiently describe the global aesthetics of the picture. The proposed method defines
new image regions (eyes and mouth areas) and com-
putes additional features that enhance the aesthetic
prediction performance.
1.2 Objectives
Aesthetic evaluation depends on image content, and
evaluating a landscape is different from judging a por-
trait, where the viewer focuses on the subject face.
That is why finding faces and studying particular regions in the facial area (eyes, mouth) are important for making a precise evaluation of portraiture aesthetics.
This article presents a method that achieves aes-
thetic quality assessment of facial images. 15 fea-
tures are measured on the entire image and 3 regions:
face, eyes and mouth. Eyes and mouth have already
been considered for facial expression evaluation (Li
et al., 2010) and information related to these regions
is included in models that extract low-level features in
the entire image (Marchesotti and Perronnin, 2012).
However, computing global statistics such as contrast,
colorfulness or sharpness has not been done yet in
these particular areas. This article demonstrates that
adding relevant information related to these restricted
regions (eyes, mouth) produces equal or better perfor-
mance than any other recent work in this domain. The
feature set is optimized by the Relief metric (Robnik-Šikonja and Kononenko, 2003) and results are compared with 4 recent works focusing on frontal facial pictures (Lienhard et al., 2014), portraits (Pogačnik et al., 2012; Khan and Vogel, 2012) or pictures representing several persons (Li et al., 2010).
This paper is organized as follows. The overall
method is described in Section 2, including image
segmentation, feature computation and the learning
algorithm. Further analysis of relevant features and
regions is given in Section 3. Experiments and results
are reported in Section 4 and an application to picture selection is given in Section 5. Conclusions and future work are reported in Section 6.
2 PROPOSED METHOD
This work focuses on automated aesthetic assessment
of headshots, which are portraits cropped to the ex-
tremes of the target’s head and shoulders (see Figure
1). This section describes the datasets considered in
this work as well as the three steps of the rating al-
gorithm: face and facial attribute detection, feature extraction, and automated aesthetic prediction.
2.1 Datasets
Experiments are made on 3 different datasets.
HFS, for Human Face Scores, is described in (Lien-
hard et al., 2014) and contains 250 headshots that
have been gathered from several existing datasets and
private collections. More precisely, it contains a set of 7 different images of each of 20 persons, and 110 additional images of other persons. Examples of images for
3 particular persons are given in Figure 1. Each im-
age has been rated by 25 persons on a 1 to 6 scale
(6 means the highest quality). The ground truth is
considered to be the average score for each picture.
This dataset is used to validate the proposed method
in Section 4.1, and to evaluate the method for picture
selection of a given person in Section 5.
Figure 1: A set of 7 pictures of 3 different persons from the
HFS dataset.
FAVA, for Face Aesthetic Visual Analysis, is a subset
of the AVA database (Murray et al., 2012) containing
various images from which headshots are automati-
cally extracted. More precisely, each picture is scored
from 1 to 10 by internet users (10 means the high-
est quality). This dataset is similar to the one used in (Pogačnik et al., 2012) and will be used for comparison. As described in (Pogačnik et al., 2012), images with average scores (between 4.5 and 6.5) are removed. Since our work is based on color images, black and white pictures are also removed, and the final dataset contains 300 pictures.
Flickr is a website hosting a large number of pictures and portraits. (Li et al., 2010) created a dataset of 500 images gathered on this website and scored through the Amazon Mechanical Turk system. Each image is associated with a ground truth score between 0 and 10 (10 means high quality). Photos are either portraits or group portraits. In this work, only the largest detected face is considered in each picture, while (Li et al., 2010) consider all the faces as well as the relationships between them (distances, face poses and expressions).
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
546
Figure 2: Example of an image and its 4 regions.
2.2 Facial Attributes Segmentation
To locate the face area, bounding box detection is
performed by using Viola-Jones algorithm (Viola and
Jones, 2001) and the OpenCV library. Inside the face
region, observers are more likely to focus on eyes and
mouth, which provide information about the subject:
facial expressions, presence of make up, etc. The pro-
posed method relies on the fact that decisive infor-
mation about face image quality can be obtained by
computing features on eyes and mouth areas only.
In this work, each image is decomposed into the 4 regions described in Figure 2: entire image R_A, face area R_B, eyes area R_C and mouth area R_D. Both eyes are considered to be part of the same region. Eyes and mouth areas are also detected by the Viola-Jones algorithm.
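As an illustration of this decomposition, a minimal detection sketch using OpenCV's Python bindings is given below. The stock Haar cascade files, the use of the smile cascade as a stand-in mouth detector, and the heuristic that merges both eye boxes into a single R_C are assumptions of this sketch, not the exact pipeline used in this work.

# Sketch of the four-region decomposition with OpenCV Haar cascades (Viola-Jones).
# Cascade file names are the stock OpenCV models (an assumption of this sketch).
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")
mouth_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")  # stand-in mouth detector

def extract_regions(img_bgr):
    """Return crops for R_A (image), R_B (face), R_C (eyes), R_D (mouth)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    # keep the largest detected face, as done for the group pictures of Flickr
    x, y, w, h = max(faces, key=lambda r: r[2] * r[3])
    face = img_bgr[y:y + h, x:x + w]
    face_gray = gray[y:y + h, x:x + w]
    regions = {"R_A": img_bgr, "R_B": face}
    eyes = eye_cascade.detectMultiScale(face_gray, 1.1, 5)
    if len(eyes) > 0:
        # both eyes form a single region: take the bounding box of all eye boxes
        ex = min(e[0] for e in eyes); ey = min(e[1] for e in eyes)
        ex2 = max(e[0] + e[2] for e in eyes); ey2 = max(e[1] + e[3] for e in eyes)
        regions["R_C"] = face[ey:ey2, ex:ex2]
    mouths = mouth_cascade.detectMultiScale(face_gray, 1.1, 5)
    if len(mouths) > 0:
        mx, my, mw, mh = max(mouths, key=lambda r: r[1])  # lowest box in the face
        regions["R_D"] = face[my:my + mh, mx:mx + mw]
    return regions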
2.3 Features Extraction
State of the art methods implement a lot of features
(76 in (Faria et al., 2013)) in order to assess aesthetic
quality of facial images. In this work, only 15 low-
level features are considered. They consist of image
statistics that can be computed in each region. Thus,
each image is described by a set of 60 values (15 fea-
tures in each of the 4 regions). Features correspond
to sharpness, illumination, contrast and color distri-
bution measures. These categories have been chosen
in this work because they can be computed at the pixel
level and are close to human perception. The feature
list is given below.
Sharpness is evaluated by 3 different values: F_1, F_2, F_3. The first sharpness measure F_1 is computed by using the blur estimation method described in (Crete et al., 2007), which compares the difference between an original image I and its low-pass filtered version I_b. More precisely, gradients are measured in I and in I_b: the greater the gradient differences between both images, the sharper the original image I. Indeed, high differences mean that the original image has sharp edges, and loses a lot of its sharpness through the filtering process. On the contrary, blurry images do not change a lot after filtering. This method appeared to be very discriminant in our previous work (Lienhard et al., 2014).
Since a sharp facial picture contains high gradients located in the face region, the average gradient value F_2 is computed in each region. The size of the bounding box containing 90% of the image gradients, F_3, is calculated as described in (Ke et al., 2006).
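A possible implementation of the three sharpness values on a grayscale region is sketched below. It follows the spirit of the cited methods; the blur kernel size, the Sobel gradient operator and the centered-box search used for F_3 are illustrative assumptions, not the published implementations.

# Sketch of the sharpness measures on one grayscale region.
import cv2
import numpy as np

def sharpness_features(region_gray):
    g = region_gray.astype(np.float32)
    # gradient magnitudes of the original region
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
    grad = np.sqrt(gx ** 2 + gy ** 2)

    # F1: relative gradient loss between the region and a blurred copy,
    # in the spirit of Crete et al.; small for already-blurry regions
    blurred = cv2.blur(g, (9, 9))
    bx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0)
    by = cv2.Sobel(blurred, cv2.CV_32F, 0, 1)
    grad_b = np.sqrt(bx ** 2 + by ** 2)
    f1 = np.sum(np.maximum(grad - grad_b, 0)) / (np.sum(grad) + 1e-6)

    # F2: average gradient magnitude
    f2 = float(np.mean(grad))

    # F3: area fraction of the centered box holding 90% of the gradient energy,
    # a simplified reading of Ke et al.'s measure
    energy = grad / (np.sum(grad) + 1e-6)
    h, w = energy.shape
    f3 = 1.0
    for s in np.linspace(0.1, 1.0, 19):
        dh, dw = int(h * s / 2), int(w * s / 2)
        box = energy[h // 2 - dh:h // 2 + dh, w // 2 - dw:w // 2 + dw]
        if box.sum() >= 0.9:
            f3 = s * s
            break
    return f1, f2, f3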
Illumination is characterized by 2 values, F_4 and F_5, evaluated by the means of two channels: Value V and Luminance L* (respectively from the HSV and L*a*b* color spaces). Both measures are considered in several articles (Ke et al., 2006; Datta et al., 2006; Pogačnik et al., 2012). They provide information about the global brightness of the image if computed on the entire image, or local brightness if computed on facial regions. The combination of local and global measures also gives some indication about the brightness difference between face and non-face regions, which influences our perception of aesthetics (Wong and Low, 2009; Khan and Vogel, 2012). Even if these values are highly correlated, both are implemented and the less discriminant measure will automatically be removed by the feature selection process.
Contrast is measured by 4 values, from F_6 to F_9. Two of them correspond to the standard deviation of V and L* (respectively F_6 and F_7). Then, the width of the middle 90% mass of the L* histogram, F_8 (Ke et al., 2006; Wong and Low, 2009), and the Michelson contrast value F_9 (Desnoyer and Wettergreen, 2010) are computed. Michelson contrast is obtained by the ratio (L*_max − L*_min)/(L*_min + L*_max), where L*_max and L*_min are the highest and lowest L* values in the considered region.
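These illumination and contrast measures reduce to simple per-region channel statistics, as in the following sketch (the 5th/95th percentile approximation of the middle 90% histogram mass is an assumption of this sketch).

# Sketch of the illumination (F4, F5) and contrast (F6-F9) statistics
# on one region, using the HSV Value channel and the CIELab L* channel.
import cv2
import numpy as np

def illumination_contrast_features(region_bgr):
    v = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2HSV)[:, :, 2].astype(np.float32)
    l = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2LAB)[:, :, 0].astype(np.float32)

    f4, f5 = v.mean(), l.mean()          # illumination: mean V and mean L*
    f6, f7 = v.std(), l.std()            # contrast: standard deviations

    # F8: width of the middle 90% mass of the L* histogram
    lo, hi = np.percentile(l, [5, 95])
    f8 = hi - lo

    # F9: Michelson contrast (L*max - L*min) / (L*min + L*max)
    f9 = (l.max() - l.min()) / (l.min() + l.max() + 1e-6)
    return f4, f5, f6, f7, f8, f9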
Color information is extracted with the measurement of 6 values, from F_10 to F_15. The Dark Channel (DC), introduced to perform haze removal (He et al., 2010), provides information about sharpness and colors. High values are related to dull colors or blurry areas. DC corresponds to a minimum filter applied on the RGB color space. Each pixel p(i, j) of an image I is computed as follows: p(i, j) = \min_{c \in \{R,G,B\}} \big( \min_{(i', j') \in \Omega(i, j)} I_c(i', j') \big), where I_c is a channel of I and \Omega(i, j) corresponds to the 5 × 5 neighborhood of p(i, j). It has been shown that DC evaluation helps to increase the performance of image aesthetic assessment (Tang et al., 2013). Since faces are composed of areas with low DC values (skin for example) and high DC values (eyes), the DC mean and its standard deviation are considered (respectively F_10 and F_11).
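A straightforward way to obtain F_10 and F_11 is a 5 × 5 minimum filter over the per-pixel RGB minimum, for instance:

# Sketch of the dark-channel statistics F10 (mean) and F11 (standard
# deviation), following the definition used for haze removal by He et al.
import cv2
import numpy as np

def dark_channel_features(region_bgr):
    min_rgb = region_bgr.min(axis=2)                      # per-pixel min over R, G, B
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    dark = cv2.erode(min_rgb, kernel)                     # 5x5 spatial minimum filter
    return float(dark.mean()), float(dark.std())          # F10, F11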
Hue H and Saturation S standard deviations (from the HSV color space) are also computed (F_12 and F_13). The number of different hues F_14 in each area is an indica-
LowLevelFeaturesforQualityAssessmentofFacialImages
547
tor of its complexity (Ke et al., 2006; Li et al., 2010).
Finally, the colorfulness measure F_15 described in (Hasler and Suesstrunk, 2003) is implemented, providing information about the mean and standard deviation of the channels a* and b* of the L*a*b* color space. In recent work (Aydin et al., 2014), it is shown that F_15 is highly correlated to the human perception of colorfulness and that this measure is an indicator of the overall image aesthetic quality.
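The color statistics can be sketched as follows; the hue histogram bin count and occupancy threshold used for F_14, and the 0.3 weighting of the mean term in the colorfulness measure (taken from Hasler and Suesstrunk's formulation), are assumptions of this sketch.

# Sketch of the colour statistics F12-F15 on one region.
import cv2
import numpy as np

def color_features(region_bgr, hue_bins=20, min_fraction=0.01):
    hsv = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2HSV)
    lab = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)

    f12 = float(hsv[:, :, 0].std())        # hue standard deviation
    f13 = float(hsv[:, :, 1].std())        # saturation standard deviation

    # F14: number of distinct hues, counted as histogram bins holding at
    # least a small fraction of the pixels (bin count and threshold assumed)
    hist, _ = np.histogram(hsv[:, :, 0], bins=hue_bins, range=(0, 180))
    f14 = int(np.sum(hist >= min_fraction * hsv[:, :, 0].size))

    # F15: colorfulness from the spread and offset of the a*/b* chroma plane
    a = lab[:, :, 1] - 128.0
    b = lab[:, :, 2] - 128.0
    f15 = float(np.sqrt(a.std() ** 2 + b.std() ** 2)
                + 0.3 * np.sqrt(a.mean() ** 2 + b.mean() ** 2))
    return f12, f13, f14, f15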
2.4 Aesthetic Prediction
The learning task is performed by a Support Vector
Machine (SVM) for both categorization (separation
between low and high aesthetic quality images) and
regression (aesthetic quality rating). SVM provided
the best results in preliminary experiments. Other
methods like Random Forest or Neural Networks ob-
tained good results, but slightly below SVM. OpenCV
SVM implementation (Chang and Lin, 2011) is used
with its default parameters and a Gaussian kernel. For
each experiment, a 10-fold cross validation is per-
formed. This task is repeated 10 times to avoid sam-
pling bias, and only average results are reported.
2-class categorization performance is measured by the Good Classification Rate GCR = N_c / N_t. It is the ratio between the number of images correctly classified N_c and the number of test images N_t. Regression performance is computed by Pearson's correlation R. Let ŝ_n be the ground truth and s_n the predicted score of picture n. R is calculated by the formula:

R = \frac{\sum_{n=1}^{N_t} (\hat{s}_n - \bar{\hat{s}})(s_n - \bar{s})}{\sqrt{\sum_{n=1}^{N_t} (\hat{s}_n - \bar{\hat{s}})^2} \cdot \sqrt{\sum_{n=1}^{N_t} (s_n - \bar{s})^2}}    (1)

where \bar{\hat{s}} = \frac{1}{N_t} \sum_{n=1}^{N_t} \hat{s}_n and \bar{s} = \frac{1}{N_t} \sum_{n=1}^{N_t} s_n.
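The whole prediction stage can be sketched as below; scikit-learn is used here as a stand-in for the OpenCV / LIBSVM implementation mentioned above, default RBF-kernel parameters are kept, and Pearson's R is obtained with np.corrcoef, which matches Eq. (1).

# Sketch of the learning stage: RBF-kernel SVM, repeated 10-fold cross validation,
# average GCR for classification and Pearson's R for regression.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, SVR

def evaluate(features, labels, scores, repeats=10, folds=10):
    # features: (N, 60) array; labels: 2-class ground truth; scores: average ratings
    gcr, pearson = [], []
    for rep in range(repeats):
        kf = KFold(n_splits=folds, shuffle=True, random_state=rep)
        for train, test in kf.split(features):
            clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
            clf.fit(features[train], labels[train])
            gcr.append(np.mean(clf.predict(features[test]) == labels[test]))

            reg = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
            reg.fit(features[train], scores[train])
            pred = reg.predict(features[test])
            pearson.append(np.corrcoef(pred, scores[test])[0, 1])
    return np.mean(gcr), np.mean(pearson)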
3 ANALYSIS OF RELEVANT
FEATURES AND REGIONS
15 features and 4 regions (R_A, R_B, R_C, R_D) are a priori considered. Finding the most discriminant couples (Feature, Region) in the case of aesthetic quality estimation presents multiple advantages. First, it helps to design more efficient metrics, adapted to the considered problem. It also makes it possible to compute fewer features, reducing the implementation and computational cost while improving the overall accuracy of the prediction.
3.1 Feature and Region Selection
Some of the considered features may be more rele-
vant when computed in limited regions only. For in-
stance, facial images often have blurred background
and sharp edges in the face. Measuring each feature
inside all the regions may also add noise in the data
due to redundant or irrelevant values. Thus, selecting
the most discriminant features for a given area can en-
hance the prediction performance.
In this work, the 60 couples (Feature, Region) are
ranked using the Relief metric, implemented as described in (Robnik-Šikonja and Kononenko, 2003).
This metric provides feedback about the ability of
each couple to separate images with similar features
but different aesthetic quality scores. The idea is to
repeatedly consider an image i in the training set and
to find its nearest neighbors in the feature space. For
each neighbor k and feature f , a positive weight is
added to the Relief evaluation of f if images i and k
present both close scores and close values of f , and a
negative weight otherwise. Discriminant features end
up with high Relief evaluation.
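A simplified sketch of this ranking idea is given below; the number of neighbors and the closeness thresholds are illustrative assumptions, and the paper itself uses the full RReliefF formulation of Robnik-Šikonja and Kononenko.

# Simplified Relief-style ranking: reward features whose similarity between
# neighbouring images agrees with the similarity of their aesthetic scores.
import numpy as np

def relief_weights(X, scores, n_neighbors=10, score_tol=0.5, feat_tol=0.25):
    X = np.asarray(X, dtype=float)
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-9)  # scale to [0, 1]
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(dist)[1:n_neighbors + 1]   # skip the image itself
        for k in neighbors:
            close_scores = abs(scores[i] - scores[k]) <= score_tol
            close_feats = np.abs(X[i] - X[k]) <= feat_tol
            # +1 when feature similarity agrees with score similarity, -1 otherwise
            w += np.where(close_feats == close_scores, 1.0, -1.0)
    return w / (n * n_neighbors)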
Analysis of the features and regions retained by
this metric for the HFS dataset is given in Sections
3.2 and 3.3. In these sections, 2-class categorization
is performed by separating the dataset into 2 groups of 85 images with the lowest and the highest scores.
3.2 Influence of Features
The Relief metric is used to rank the features with re-
spect to their ability to separate images with different
scores or aesthetic categories. In order to analyze the
contribution C of feature F_i without the region influence, the following formula is applied:

C(F_i) = \sum_{j=A}^{D} Relief(F_i, R_j)    (2)

where Relief(F_i, R_j) is the value obtained from the Relief algorithm for the couple (F_i, R_j).
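In code, this is a simple sum of the Relief weights over the four regions (the (15, 4) weight matrix is assumed to come from the ranking step above):

import numpy as np

def feature_contributions(relief_matrix):
    # relief_matrix: assumed (15, 4) array indexed as (feature F_i, region R_j)
    return np.asarray(relief_matrix).sum(axis=1)   # C(F_i), Eq. (2)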
It can be observed in Table 1 that sharpness metrics are the most discriminant features (C(F_1) = 0.45, C(F_2) = 0.29). Using only F_1 on the HFS dataset, GCR = 71%, which is already significantly above the chance level (50%), but still below the performance obtained by using the entire feature set (86.5%). By adding the average gradient F_2, the GCR reaches 77.5%.
Dark Channel measures (F_10, F_11) are the most discriminant features in the color category (C(F_10) = 0.15, C(F_11) = 0.32), and combined with the sharpness measurements, a GCR of 82% is obtained, which is close to the optimal performance obtained in this
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
548
work. By adding the best measures from the illumination and contrast categories (respectively the mean of the Value channel F_4 and the Michelson contrast F_9), GCR = 85.5%. This shows that for this example (HFS and 2-class categorization), 6 measures are enough to produce results just below optimal performance. Note that these features (F_1, F_2, F_4, F_9, F_10, F_11) produce the same performance for regression as the entire feature set (R = 0.71).
Table 1: Each row, one or two features are added to the model. Classification and Regression Performance (respectively CP and RP) are presented, as well as the Relief weights for each feature (C(F)).

Addition of...    CP (%)   RP (R)   C(F)
F_1               71.0     0.50     0.45
F_2               77.5     0.55     0.29
F_10, F_11        82.0     0.64     0.15, 0.32
F_4, F_9          85.5     0.71     0.16, 0.20
F_1 to F_15       86.5     0.71     2.64
3.3 Influence of Image Regions
In this section the entire feature set is computed for
each considered region. Table 2 presents the results
for both 2-class categorization (85 images in each
category) and regression (250 images) for the HFS
dataset. It can be seen that computing features in the
very small area corresponding to the eyes is sufficient
to reproduce the results described in (Lienhard et al.,
2014). Moreover, it is better to compute the 15 proposed features in the eyes region than in the entire image for both classification and regression, which is an interesting result since computing features in small regions is much faster and can thus lead to real-time applications. This can be explained by the fact that if the entire image is of low aesthetic quality, the eyes are probably of low quality as well; and if the eyes region is sharp, contrasted and well illuminated, it is almost sufficient for evaluating a portrait as aesthetic.
Table 2: Influence of each image region, as well as the performance obtained by considering the entire set of regions.

Region                 Class. Perf. (%)   Reg. Perf. (R)
R_A (Image)            77.9               0.54
R_B (Face)             82.5               0.60
R_C (Eyes)             83.9               0.64
R_D (Mouth)            82.4               0.61
R_A, R_B, R_C, R_D     86.5               0.71
Finally, couples with the highest Relief values are (F_{1,2}, R_{B,C,D}): sharpness measures in the facial areas are the most discriminant values for aesthetic quality assessment. The remaining problem is
to choose the number of couples to keep in the final
model. This can be solved by performing preliminary
experiments, where the number of features is incre-
mented until the optimal performance is reached. Re-
sults obtained by the optimal number of couples for
the 3 datasets are reported in Section 4.
4 EXPERIMENTS AND RESULTS
4.1 Validation of the Method
The performance evaluation of the proposed method is done on the HFS dataset. Two equally distributed groups of pictures are created, containing respectively the images with the lowest and the highest scores. Each group contains 125 images, half of the dataset. 2-class categorization is performed and the average True Positive Rate (TPR) and False Positive Rate (FPR) are shown in the Receiver Operating Characteristic (ROC) curves presented in Figure 3. A high TPR means that most of the good looking images are retrieved, while the FPR represents the rate of poor looking images predicted as good looking images.
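For reference, the TPR/FPR sweep behind such ROC curves, and the area under them, can be computed as in the following sketch (scikit-learn's roc_curve and auc would give the same result):

# Sketch of the ROC analysis: sweep a threshold over the predicted scores of
# the 2-class problem, trace TPR against FPR, then integrate for the AUC.
import numpy as np

def roc_points(scores_pred, labels):            # labels: 1 = high quality, 0 = low
    order = np.argsort(-np.asarray(scores_pred))
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / max(labels.sum(), 1)
    fpr = np.cumsum(1 - labels) / max((1 - labels).sum(), 1)
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    return float(np.trapz(tpr, fpr))            # area under the ROC curve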
Performance is measured by the Area Under the
Curve (AUC). Figure 3 shows that the proposed
features and regions are relevant since performance
is significantly better than the results obtained us-
ing only foreground/background segmentation (Usual
Method), with AUC = 0.87 instead of 0.83. Using the
combination of the best couples (R, F) obtained by
the Relief ranking, it is possible to increase the perfor-
mance and obtain an AUC of 0.90. Performance in the
low recall area (FPR < 0.1) is promising, since it is
possible to retrieve about 70% of the good looking im-
ages while making only 10% of false detections. This
last result can be used in real life applications: some
good looking images are selected in a large database,
among which the user can manually choose the best
one.
By removing average images, which are difficult
to categorize as aesthetic or poor looking images, it
is possible to enhance the performance. This means
that erroneous classifications are mostly due to aver-
age images, which are neither good nor bad images.
Figure 3 shows that by removing 30% of average im-
ages, the AUC is 0.93.
4.2 Comparison with Previous Works
To compare the proposed method with previous work, the experiments of (Li et al., 2010; Pogačnik et al., 2012; Khan and Vogel, 2012; Lienhard et al., 2014) are reproduced, using the same learning algorithms and databases with the proposed feature set. These works use images containing both group pictures and portraits (Li et al., 2010), only portraits (Pogačnik et al., 2012; Khan and Vogel, 2012) or face portraits (Lienhard et al., 2014). The method is first compared with previous works performing image categorization, then with works performing score prediction.
Figure 3: Proposed method achieves the best performance. AUC can be increased by using feature selection or removing average pictures.
4.3 Comparison with Previous
Categorization Models
(Li et al., 2010) consider 500 images from the Flickr dataset, which are separated into 5 classes with respect to their ground truth aesthetic score. They perform 5-class categorization and measure the Cross-Category Error, which is a function of the error magnitude k:
CCE(k) = \frac{1}{N_t} \sum_{i=1}^{N_t} I(\hat{c}_i - c_i = k)    (3)

where N_t is the number of test images, ĉ_i the ground truth classification and c_i the predicted classification for the i-th image. I represents the indicator function: it takes the value 1 if ĉ_i − c_i = k, and 0 if ĉ_i − c_i ≠ k.
They obtain 68% accuracy within one cross-category error: CCE(−1) + CCE(0) + CCE(1) = 0.68.
On the same dataset, using the same learning algo-
rithm (a Gaussian-kernel SVM), the proposed method
achieves the same performance. It should be noted that in this evaluation, only the largest face is considered because the proposed method is adapted to headshots, not to group pictures. Several attributes related to the relationships between faces in a group photo are not measured: (Li et al., 2010) show that high-level attributes such as smiles or image composition (face sizes and positions) play an important role in the global aesthetic evaluation, and including these attributes in the proposed model may enhance the performance.
Table 3: Classification performance of Previous Work (PW) is compared with the Proposed Method (PM).

                            Dataset   PW     PM
(Li et al., 2010)           Flickr    68%    68%
(Khan and Vogel, 2012)      Flickr    64%    70%
(Pogačnik et al., 2012)     FAVA      75%    81%
(Lienhard et al., 2014)     HFS       84%    87%
Comparison with (Khan and Vogel, 2012) is possi-
ble using Li’s dataset and focusing on the 140 images
that are portraits of a single person. 3 out of their 7
features are similar to the proposed features (face illu-
mination, contrast, brightness). They also include features related to image composition (rule of thirds, face position and size). Their best result for 2-class catego-
rization corresponds to an accuracy of 63.5%, using
SVM classification and 10-fold cross validation. We
obtain better performance with the proposed feature
set: 69% without selection, 70% with the best feature
selection.
The work presented in (Pogačnik et al., 2012) is compared using the FAVA dataset, which is very similar to the dataset used in their work: both consist of portraits extracted automatically from the AVA dataset (Murray et al., 2012). Their 2-class categorization accuracy is 73.2%, using 71 various features: subject position and size, compositional rules, distribution of edges, color distribution, etc. A Gaussian-kernel SVM is used to perform 10-fold cross validation. Using the Relief metric to enhance their results, they obtain 74.8%. The proposed system obtains 81% of correct classification (76.2% without feature selection).
Finally, the feature set and segmentation algorithm presented in (Lienhard et al., 2014) are tested and compared with the proposed method. In our previous work, only 83.7% good classification was obtained for 2-class categorization, while the proposed feature set produces an average of 86.5%. The results developed in this section are summarized in Table 3 and show a significant increase in classification performance.
4.4 Comparison with Previous
Regression Models
Among the 4 works previously cited, only (Li et al.,
2010) and (Lienhard et al., 2014) performed aes-
thetic score prediction. (Li et al., 2010) calculated the
residual sum-of-squares error RSE to measure perfor-
mance:
RSE = \frac{1}{N_t - 1} \sum_{i=1}^{N_t} (\hat{S}_i - S_i)^2    (4)

where S_i is the ground truth score and Ŝ_i the predicted score. They perform SVM regression to make score prediction. Using the same dataset, their features lead to RSE = 2.38 while the proposed method leads to RSE = 2.15, which is slightly better.
Figure 4: Comparison of the regression prediction obtained by a) (Lienhard et al., 2014), b) the proposed feature set and c) the reduced set obtained by feature selection.
In (Lienhard et al., 2014), performance is com-
puted by Pearson’s correlation R. Using the proposed
method without feature selection, the correlation in-
creases significantly from R = 0.61 to 0.71. Fea-
ture selection increases the performance to R = 0.74,
which is significantly higher than the results obtained
in our previous work. Figure 4 presents the point
clouds obtained after regression for (Lienhard et al.,
2014), the proposed feature set and the reduced set
obtained by feature ranking. A perfect prediction cor-
responds to a straight line (R = 1), and the proposed
method reaches R = 0.74, which is significantly better than our previous work (R = 0.61).
5 APPLICATION TO PICTURE
SELECTION
Automated picture selection of a given person is a
practical example that may benefit from the proposed
method and its results. People may have hundreds of
pictures from which they want to select a small set
that is relevant for a given application: a Facebook profile picture, professional purposes like resumes, etc. Many attributes are very discriminant for picture selection: is the person smiling? Are the eyes open? These attributes are partially encoded in our features (open eyes mean more colors and higher contrast in the eye region). However, subjective judgments such as emotions are not considered.
In most picture selection problems, users are likely to manually choose appealing images. By automatically selecting a small subset of images that are already identified as appealing, it is possible to significantly reduce the time spent on selection. The following experiment is made. First, the learning algorithm is applied on the entire HFS dataset except for one particular person (243 images are used for learning). Then, prediction is made on the 7 images corresponding to the selected person. Figure 5 presents an example of image selection using the proposed method. Using appropriate thresholds, it is possible to automatically retain appealing images (pictures above the blue line) or remove unsatisfying images (pictures below the red line).
Figure 5: 7 images of the same person represented by their ground truth scores and automated aesthetic prediction.
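A minimal sketch of this thresholding step is given below; the threshold values correspond to the blue and red lines of Figure 5 and are assumptions of this sketch, since the exact values are not specified.

# Sketch of threshold-based selection on the predicted scores of one person.
def select_pictures(predicted_scores, keep_above=4.0, discard_below=3.0):
    keep = [i for i, s in enumerate(predicted_scores) if s >= keep_above]
    discard = [i for i, s in enumerate(predicted_scores) if s <= discard_below]
    return keep, discard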
6 CONCLUSION
In this paper, a framework to assess the aesthetic qual-
ity of frontal facial portraits has been proposed. Fea-
tures are extracted in different face regions (entire
face, eyes, mouth) that contain the most relevant in-
formation about the portrait. Few pixel-level statistics
LowLevelFeaturesforQualityAssessmentofFacialImages
551
are computed in each region and an effective model of portrait aesthetic estimation is proposed. Comparisons with different methods for predicting aesthetic scores and categories have been made, and the performance of 4 recent works is matched or outperformed. The proposed feature selection process enhanced the overall prediction accuracy and the most discriminant features and regions have been summarized. Improvements are still to be made to deal efficiently with rotated or occluded faces, and the framework can be generalized to other kinds of images by replacing the face detection process with any adapted segmentation algorithm.
In the future, results may be enhanced by the ad-
dition of high-level features. More precisely, it would
be interesting to consider attributes such as gender, age, facial expression, whether the eyes and mouth are open or closed, etc. These attributes are closer to human perception of facial aesthetics than low-level statistics and can help to perform more specific evaluations, to better match consumer applications and to handle faces with glasses,
hats, make-up or facial hair.
REFERENCES
Aydin, T., Smolic, A., and Gross, M. (2014). Automated
Aesthetic Analysis of Photographic Images. IEEE
Transactions on Visualization and Computer Graph-
ics.
Chang, C. and Lin, C. (2011). LIBSVM: a library for sup-
port vector machines. ACM Transactions on Intelligent Systems and Technology, pages 1–39.
Crete, F., Dolmiere, T., Ladret, P., and Nicolas, M. (2007).
The blur effect: perception and estimation with a new
no-reference perceptual blur metric. In SPIE Elec-
tronic Image Symposium.
Datta, R., Joshi, D., Li, J., and Wang, J. (2006). Studying
aesthetics in photographic images using a computa-
tional approach. Computer Vision – ECCV 2006.
Desnoyer, M. and Wettergreen, D. (2010). Aesthetic Im-
age Classification for Autonomous Agents. 20th In-
ternational Conference on Pattern Recognition, pages
3452–3455.
Dhar, S., Ordonez, V., and Berg, T. (2011). High level de-
scribable attributes for predicting aesthetics and inter-
estingness. Computer Vision and Pattern Recognition,
pages 1657–1664.
Faria, J., Bagley, S., Rüger, S., and Breckon, T. (2013).
Challenges of finding aesthetically pleasing images.
In Image Analysis for Multimedia Interactive Services
(WIAMIS), volume 2, pages 4–7.
Hasler, D. and Suesstrunk, S. (2003). Measuring colorful-
ness in natural images. Electronic Imaging. Interna-
tional Society for Optics and Photonics., pages 87–95.
He, K., Sun, J., and Tang, X. (2010). Single Image Haze Re-
moval Using Dark Channel Prior. IEEE transactions
on pattern analysis and machine intelligence.
Jiang, W., Loui, A. C., and Cerosaletti, C. D. (2010). Auto-
matic aesthetic value assessment in photographic im-
ages. IEEE International Conference on Multimedia
and Expo, pages 920–925.
Ke, Y., Tang, X., and Jing, F. (2006). The design of high-
level features for photo quality assessment. In Com-
puter Vision and Pattern Recognition, volume 1, pages
419–426.
Khan, S. and Vogel, D. (2012). Evaluating visual aesthetics
in photographic portraiture. Computational Aesthetics
in Graphics, Visualization and Imaging, pages 1–8.
Li, C., Loui, A., and Chen, T. (2010). Towards aesthetics:
a photo quality assessment and photo selection sys-
tem. In Proceedings of the international conference
on Multimedia, pages 10–13.
Lienhard, A., Reinhard, M., Caplier, A., and Ladret, P.
(2014). Photo Rating of Facial Pictures based on Im-
age Segmentation. In Proceedings of the 9th Int. Conf.
on computer Vision Theory and Applications, pages
329–336, Lisbon, Portugal.
Luo, Y. and Tang, X. (2008). Photo and video quality
evaluation: Focusing on the subject. Computer Vision – ECCV 2008, pages 386–399.
Males, M., Hedi, A., and Grgic, M. (2013). Aesthetic qual-
ity assessment of headshots. In 55th International Symposium ELMAR, pages 25–27.
Marchesotti, L. and Perronnin, F. (2012). Évaluation automatique de la qualité esthétique des photographies à l'aide de descripteurs d'images génériques. In Reconnaissance des Formes et Intelligence Artificielle (RFIA).
Murray, N., Marchesotti, L., and Perronnin, F. (2012).
AVA: A large-scale database for aesthetic visual anal-
ysis. Computer Vision and Pattern Recognition, pages
2408–2415.
Pogačnik, D., Ravnik, R., Bovcon, N., and Solina, F.
(2012). Evaluating photo aesthetics using machine
learning. In Data Mining and Data Warehouses, pages
4–7.
Robnik-Šikonja, M. and Kononenko, I. (2003). Theoretical
and empirical analysis of ReliefF and RReliefF. Ma-
chine learning, 53:23–69.
Tang, X., Luo, W., and Wang, X. (2013). Content-Based
Photo Quality Assessment. IEEE Transactions on
Multimedia, 15(8):1930–1943.
Tong, Y., Konik, H., Cheikh, F. A., and Tremeau, A.
(2010). Full reference image quality assessment based
on saliency map analysis. Journal of Imaging Science
and Technology, 54(3):1–21.
Viola, P. and Jones, M. (2001). Rapid object detection
using a boosted cascade of simple features. In Pro-
ceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition., volume 1,
pages I–511–I–518. IEEE Comput. Soc.
Willis, J. and Todorov, A. (2006). Making Up Your Mind
After a 100-Ms Exposure to a Face. Psychological
science, 17(7):592–598.
Wong, L.-k. and Low, K.-l. (2009). Saliency-enhanced im-
age aesthetics class prediction. In 16th IEEE Interna-
tional Conference on Image Processing, pages 997–
1000. IEEE.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
552