1.1 Related Work
Various types of metrics can be used to assess the quality of photographs. These metrics can be grouped into four levels:
• Technical metrics
• Subject metrics
• Composition metrics
• High-level metrics
Existing work does not necessarily take these levels into account, but often treats the task holistically, simply outputting an attractiveness score for the input pictures regardless of which metric level the system employs. Such a black-box approach may work, but to understand the limitations of individual systems, it is instructive to look at the level of metrics they use. After all, a system that evaluates only, say, colours will be unable to gauge the attractiveness of the composition.
These levels are described in further detail below, but before any of them can be evaluated, data must be available. We point the reader toward some of the comprehensive datasets that exist, such as the Aesthetic Visual Analysis (AVA) dataset (∼250,000 images) (Murray et al., 2012), Photo.Net (∼20,000 images)¹ and the DPChallenge dataset (∼16,000 images)². Each of them contains a catalogue of images rated by users from an aesthetic perspective (Deng et al., 2017).
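To make the notion of user ratings concrete, the sketch below turns a per-image vote histogram into a mean score and a binary pleasing/displeasing label. It assumes, as in AVA, that each image comes with vote counts over a 1 to 10 scale; the function names, the threshold of 5 and the optional margin for discarding ambiguous images are illustrative assumptions, not a fixed protocol of any of the datasets above.

```python
from typing import Optional, Sequence


def mean_score(vote_counts: Sequence[int]) -> float:
    """Mean rating from a vote histogram.

    vote_counts[i] is the number of users who gave score i + 1,
    so a 10-bin histogram covers scores 1..10.
    """
    total = sum(vote_counts)
    if total == 0:
        raise ValueError("image has no votes")
    return sum((i + 1) * c for i, c in enumerate(vote_counts)) / total


def binary_label(vote_counts: Sequence[int], threshold: float = 5.0,
                 margin: float = 0.0) -> Optional[int]:
    """Return 1 (pleasing), 0 (displeasing) or None (ambiguous, to be discarded).

    Images whose mean score lies strictly within `margin` of `threshold`
    are treated as ambiguous. Threshold and margin are illustrative choices.
    """
    m = mean_score(vote_counts)
    if m >= threshold + margin:
        return 1
    if m <= threshold - margin:
        return 0
    return None


# Hypothetical vote histogram for one image (scores 1..10).
votes = [0, 0, 1, 2, 5, 10, 8, 3, 1, 0]
print(round(mean_score(votes), 2), binary_label(votes))  # 6.17 1
```

Discarding images near the threshold trades training-set size for cleaner labels; a margin of 0 keeps every image.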
Technical Metrics: describe the technical qualities of the photo, such as exposure, sharpness, white balance and depth of field (Marchesotti et al., 2011). Research on methods for grading photographs based on technical metrics is well documented. Datta et al. (2006) proposed computing various features and training a Support Vector Machine (SVM) on them to discriminate between pleasing and displeasing photographs. Others have used the Scale-Invariant Feature Transform (SIFT) to extract keypoints and feature descriptors, encoded the descriptors in a Fisher Vector and classified the result with an SVM to determine whether a photograph is pleasing, reaching an accuracy of approximately 90% on the CUHK dataset and 77% on the Photo.net dataset.
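As a minimal illustration of the feature-plus-SVM pipeline, the following sketch computes three simple technical features from a grayscale image (mean intensity as a proxy for exposure, intensity standard deviation for contrast, and the variance of a Laplacian response for sharpness) and trains a linear SVM with Pegasos-style stochastic sub-gradient descent. The feature choices and all names are hypothetical; this is not the feature set of Datta et al. (2006).

```python
import math
import random
from typing import List, Sequence

Image = List[List[float]]  # grayscale, rows of pixel values in [0, 1]


def technical_features(img: Image) -> List[float]:
    """Return [exposure, contrast, sharpness] for an image of at least 3x3 pixels.

    exposure  : mean intensity
    contrast  : standard deviation of intensity
    sharpness : variance of the 4-neighbour Laplacian over interior pixels
    """
    h, w = len(img), len(img[0])
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    lap = [
        4 * img[y][x] - img[y - 1][x] - img[y + 1][x] - img[y][x - 1] - img[y][x + 1]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    lap_mean = sum(lap) / len(lap)
    sharpness = sum((v - lap_mean) ** 2 for v in lap) / len(lap)
    return [mean, std, sharpness]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, iterations: int = 2000,
                     seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    Labels must be +1 (pleasing) or -1 (displeasing). A constant 1 is appended
    to every sample so the last weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        w = [(1 - eta * lam) * wj for wj in w]  # shrink: gradient of the regulariser
        if margin < 1:  # hinge loss is active: step towards the sample
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the linear decision function, with the bias feature appended."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
flat = [[0.5] * 4 for _ in range(4)]
print(technical_features(checker))  # [0.5, 0.5, 16.0]
print(technical_features(flat))     # [0.5, 0.0, 0.0]
```

In practice the features would be standardised before training, since the Laplacian variance can be orders of magnitude larger than the other two.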
Subject Metrics: are optimised for a specific category of photographs, so the usable subject metrics vary with the subject. They are efficient for a fixed task known beforehand, but are not generally applicable. If a photograph contains faces, useful face-related subject metrics could be facial expression, face symmetry and face pose (Deng et al., 2017). Li et al. (2010) focused on face regions, using these three metrics among others to predict aesthetic quality, and achieved good results.
¹ http://photo.net
² http://DPChallenge.com
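One simple way to quantify face symmetry, sketched here under the assumption that a face has already been detected, cropped and aligned so that the vertical centre line of the crop is the facial symmetry axis, is to compare the left half of the crop with the mirrored right half. The scoring function is hypothetical and not the measure used by Li et al. (2010).

```python
from typing import List

Image = List[List[float]]  # aligned grayscale face crop, values in [0, 1]


def face_symmetry(face: Image) -> float:
    """Score in [0, 1]; 1.0 means the left half mirrors the right half exactly.

    Assumes the vertical centre line of the crop is the facial symmetry axis.
    For odd widths the centre column is ignored.
    """
    w = len(face[0])
    half = w // 2
    if half == 0:
        raise ValueError("crop must be at least 2 pixels wide")
    # Compare each pixel in the left half with its mirror image in the right half.
    diffs = [abs(row[x] - row[w - 1 - x]) for row in face for x in range(half)]
    return 1.0 - sum(diffs) / len(diffs)


print(face_symmetry([[0.2, 0.8, 0.8, 0.2], [0.1, 0.5, 0.5, 0.1]]))  # 1.0
print(face_symmetry([[0.0, 0.0, 1.0, 1.0]]))                        # 0.0
```

A pixel-level comparison like this is sensitive to lighting and alignment errors; landmark-based comparisons are a common, more robust alternative.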
Composition Metrics: relate to how objects, especially salient objects, are positioned relative to each other and to the scene. Simplicity of the scene and balance among visual elements are some of the indicators of good composition. The compositional rules behind these metrics are also used to make salient objects stand out more. Examples of composition metrics are the rule of thirds, low depth of field and opposing colours (Deng et al., 2017)(Obrador et al., 2010). Obrador et al. (2010) explored the role of composition metrics, focusing on simplicity and visual balance, in classifying the aesthetic appeal of images, and achieved close to state-of-the-art classification accuracy using composition metrics alone.
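The rule of thirds lends itself to a simple numerical score. The hypothetical sketch below assumes a saliency map is already available (from any saliency detector), takes the saliency-weighted centroid as the position of the main subject, and scores its distance to the nearest of the four thirds intersections, normalised so that 1 means the subject sits exactly on an intersection and 0 means it sits in a corner.

```python
import math
from typing import List, Tuple

SaliencyMap = List[List[float]]  # non-negative weights; rows are y, columns are x

# The four thirds intersections ("power points") in normalised (x, y) coordinates.
POWER_POINTS = [(i / 3, j / 3) for i in (1, 2) for j in (1, 2)]
# Distance from an image corner to its nearest power point (the worst case).
MAX_DIST = math.sqrt(2) / 3


def saliency_centroid(sal: SaliencyMap) -> Tuple[float, float]:
    """Saliency-weighted centre of mass in normalised (x, y) coordinates."""
    h, w = len(sal), len(sal[0])
    total = sum(sum(row) for row in sal)
    if total == 0:
        raise ValueError("empty saliency map")
    # Pixel centres sit at (index + 0.5) / size.
    cx = sum(v * (x + 0.5) / w for row in sal for x, v in enumerate(row)) / total
    cy = sum(v * (y + 0.5) / h for y, row in enumerate(sal) for v in row) / total
    return cx, cy


def rule_of_thirds_score(sal: SaliencyMap) -> float:
    """1.0 when the salient mass sits on a thirds intersection, 0.0 in a corner."""
    cx, cy = saliency_centroid(sal)
    d = min(math.hypot(cx - px, cy - py) for px, py in POWER_POINTS)
    return 1.0 - d / MAX_DIST


uniform = [[1.0] * 6 for _ in range(6)]
on_thirds = [[1.0 if x in (1, 2) and y in (1, 2) else 0.0 for x in range(6)] for y in range(6)]
print(round(rule_of_thirds_score(uniform), 3))    # 0.5 (subject in the centre)
print(round(rule_of_thirds_score(on_thirds), 3))  # 1.0 (subject on a thirds point)
```

Using a single centroid is a simplification: a scene with two salient objects on opposite thirds lines would be scored as if its subject sat in the centre.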
High-level Metrics: are hard to define, as they are based on abstract concepts. High-level metrics can relate to simplicity, realism or photographic technique; Ke et al. (2006) designed high-level metrics such as the spatial distribution of edges, colour distribution and blur. Other researchers have treated the content of images as high-level metrics, proposing the following content-based high-level metrics: presence of people, presence of animals and portrait depiction (Dhar et al., 2011).
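The spatial distribution of edges can be summarised by how compact the region containing most of the edge energy is. The sketch below is a simplified variant of that idea, assuming a grayscale input: it computes an absolute Laplacian edge map, trims (1 - coverage)/2 of the edge mass from each side along each axis, and returns the area of the remaining box as a fraction of the image. The coverage value and all names are assumptions, not the exact formulation of Ke et al. (2006).

```python
from typing import List

Image = List[List[float]]  # grayscale, values in [0, 1]


def edge_map(img: Image) -> Image:
    """Absolute 4-neighbour Laplacian response; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = abs(4 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                            - img[y][x - 1] - img[y][x + 1])
    return out


def _trimmed_extent(mass: List[float], coverage: float) -> int:
    """Number of bins left after trimming (1 - coverage) / 2 of the mass from each end."""
    cut = sum(mass) * (1 - coverage) / 2
    lo, acc = 0, 0.0
    while lo < len(mass) - 1 and acc + mass[lo] <= cut:
        acc += mass[lo]
        lo += 1
    hi, acc = len(mass) - 1, 0.0
    while hi > lo and acc + mass[hi] <= cut:
        acc += mass[hi]
        hi -= 1
    return hi - lo + 1


def edge_bounding_box_area(img: Image, coverage: float = 0.98) -> float:
    """Fraction of the image covered by the box holding most of the edge energy.

    Small values mean edges are concentrated (e.g. one subject on a plain
    background); values near 1 mean edges are spread over the whole frame.
    """
    edges = edge_map(img)
    h, w = len(edges), len(edges[0])
    rows = [sum(r) for r in edges]
    cols = [sum(edges[y][x] for y in range(h)) for x in range(w)]
    if sum(rows) == 0:
        return 0.0  # no edges at all
    return _trimmed_extent(rows, coverage) * _trimmed_extent(cols, coverage) / (h * w)


block = [[1.0 if x in (4, 5) and y in (4, 5) else 0.0 for x in range(10)] for y in range(10)]
checker = [[float((x + y) % 2) for x in range(10)] for y in range(10)]
print(round(edge_bounding_box_area(block), 2))    # 0.16: edges confined to the centre
print(round(edge_bounding_box_area(checker), 2))  # 0.64: edges everywhere but the border
```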
Research in quality assessment of photographs has, until recently, focused on designing handcrafted features that distinguish between photographs of good and poor quality based on different aesthetic measures, such as subject metrics and high-level metrics (Guo et al., 2014)(Datta et al., 2006)(Tong et al., )(Dhar et al., 2011). These handcrafted features were mostly based on a combination of different metrics, such as the rule of thirds, focus, exposure and colour combinations. They were later largely replaced by generic image descriptors such as Bag-of-Visual-Words and Fisher Vectors (Marchesotti et al., 2011), which attempt to model photographic rules using generic content-based features and perform as well as, if not better than, the simple handcrafted features (Deng et al., 2017). More recently, research has employed Deep Convolutional Neural Networks (CNNs) to pick out the photographs of highest aesthetic quality (Tian et al., 2015). Deep learning methods may be able to generalise better across different scenarios, whereas handcrafted methods are more suited to specific tasks.
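Fisher Vectors encode a set of local descriptors (such as SIFT) by the gradient of their log-likelihood under a Gaussian mixture model (GMM). The sketch below implements the commonly used "improved" formulation, with gradients with respect to the means and diagonal variances followed by signed square-root and L2 normalisation, assuming the GMM has already been fitted to training descriptors. It is a generic illustration, not the exact configuration of Marchesotti et al. (2011); the resulting vector would typically be passed to a linear classifier such as an SVM.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _log_gaussian_diag(x: Vec, mean: Vec, var: Vec) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def fisher_vector(descriptors: Sequence[Vec], weights: Vec,
                  means: Sequence[Vec], variances: Sequence[Vec]) -> List[float]:
    """Improved Fisher Vector of N local descriptors under a K-component diagonal GMM.

    descriptors : N vectors of length D (e.g. SIFT descriptors)
    weights     : K mixture weights summing to 1
    means       : K mean vectors of length D
    variances   : K diagonal variance vectors of length D
    Returns a vector of length 2*K*D.
    """
    n, k_count, dim = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * dim for _ in range(k_count)]
    g_var = [[0.0] * dim for _ in range(k_count)]
    for x in descriptors:
        # Soft assignment (posterior) of x to each Gaussian, via log-sum-exp.
        logs = [math.log(w) + _log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        probs = [math.exp(lp - top) for lp in logs]
        z = sum(probs)
        for k in range(k_count):
            gamma = probs[k] / z
            for d in range(dim):
                u = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma * u             # first-order (mean) term
                g_var[k][d] += gamma * (u * u - 1)  # second-order (variance) term
    fv: List[float] = []
    for k in range(k_count):
        fv += [g / (n * math.sqrt(weights[k])) for g in g_mu[k]]
    for k in range(k_count):
        fv += [g / (n * math.sqrt(2 * weights[k])) for g in g_var[k]]
    # Power (signed square-root) normalisation, then L2 normalisation.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


gmm_weights = [0.5, 0.5]
gmm_means = [[0.0, 0.0], [5.0, 5.0]]
gmm_vars = [[1.0, 1.0], [1.0, 1.0]]
descs = [[0.1, -0.2], [4.8, 5.3], [0.0, 0.3]]  # hypothetical 2-D "descriptors"
fv = fisher_vector(descs, gmm_weights, gmm_means, gmm_vars)
print(len(fv), round(sum(v * v for v in fv), 6))  # 8 1.0
```

Bag-of-Visual-Words, by contrast, keeps only a histogram of hard assignments to the nearest codeword, which is why Fisher Vectors retain more information for the same vocabulary size.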
A unique approach (Kao et al., 2016) looks at dividing images into three different categories,
VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications
248