the Viola-Jones algorithm as well as an extended version of Eigenfaces enhanced through neural networks turned out to be the most feasible methods.
A major advance in face recognition was introduced by Facebook with DeepFace (Taigman et al., 2014). By coupling 3D model-based face alignment with a large feedforward neural network, DeepFace reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) face verification benchmark.
2.2 Skin Tone Detection
Skin tone detection identifies the skin-colored regions of depicted persons in order to determine the percentage of skin visible in the digital asset. The resulting value can be used to sieve the input data for nudity and thereby minimize the number of images to be processed in the subsequent steps. Selecting a color space suitable for skin tones is important. Common color spaces are RGB (Red, Green, Blue), YCbCr (Luminance, Chroma Blue, Chroma Red), HSV (Hue, Saturation, Value), YIQ (Luminance, Cyan-Orange Balance, Magenta-Green Balance) and YUV (Luminance, Chroma U, Chroma V). While all of these color spaces are suitable for skin tone detection in principle, applying manual thresholds to parameterize the detection algorithm has its limitations: external influences such as reflections, illumination and poor image quality decrease the detection rates (Yang et al., 2011).
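The fixed-threshold approach, and why it is fragile, can be illustrated with a minimal sketch in Python (NumPy assumed). It converts RGB to YCbCr using the full-range ITU-R BT.601 (JPEG) coefficients, marks pixels whose chrominance falls inside a rectangular Cb/Cr box, and reports the percentage of skin pixels that would be used for sieving. The box bounds (Cb 77 to 127, Cr 133 to 173) are a heuristic commonly quoted in the skin detection literature and are an assumption here, not values from the works cited above; the function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: illustrative chrominance box for skin, a commonly quoted
# heuristic, not taken from the cited works. These are exactly the kind of
# manual thresholds whose fragility is discussed in the text.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB image to full-range YCbCr (BT.601 / JPEG)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def skin_mask(rgb: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels whose (Cb, Cr) lies inside the fixed skin box."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]) &
            (cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1]))


def skin_percentage(rgb: np.ndarray) -> float:
    """Percentage of pixels classified as skin, i.e. the value used for sieving."""
    return 100.0 * float(skin_mask(rgb).mean())
```

Because the box is fixed and luminance is ignored, a colored light source or a strong reflection can shift skin pixels out of the box, which is the limitation described above.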
Red/Green (R/G) ratio and Human Composition Matrix (HCM) are the two main processes of the hierarchical image filtering method introduced by Polpinij et al. (2008). The R/G ratio is preferred because it yields significant results for the skin colors commonly found among African, Asian and Caucasian people, which provides a feasible way of determining the thresholds for skin tone detection algorithms. If the R/G ratio does not deliver reliable results, HCM is applied as the next processing step: the input image is divided into sectors, which are compared against skin and non-skin histogram models, and the probability of each color being a skin tone is derived.
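The two-stage decision can be sketched as follows, assuming NumPy. This is not Polpinij et al.'s exact procedure: the R/G intervals, the 8-bins-per-channel histograms, the Laplace smoothing, the 0.5 probability cut-off and all names are hypothetical, and the sectoring of the image is simplified to per-pixel classification. Pixels whose R/G ratio lies in a confident core interval are accepted directly, pixels outside a wider interval are rejected, and only the ambiguous remainder is passed to the skin and non-skin histogram models, whose Bayes-rule posterior gives the probability of the color being a skin tone.

```python
import numpy as np

BINS = 8  # ASSUMPTION: 8 bins per RGB channel -> 512 color cells


def quantize(pixels: np.ndarray) -> np.ndarray:
    """Map (N, 3) uint8 RGB pixels to one histogram cell index in [0, BINS**3)."""
    q = pixels.astype(np.int64) * BINS // 256
    return (q[:, 0] * BINS + q[:, 1]) * BINS + q[:, 2]


class HistogramSkinModel:
    """Skin / non-skin color histograms with a Bayes-rule skin probability."""

    def __init__(self, skin_pixels, nonskin_pixels, prior_skin=0.5):
        n = BINS ** 3
        # Laplace smoothing (+1) avoids zero probabilities for unseen colors.
        self.p_c_skin = np.bincount(quantize(skin_pixels), minlength=n) + 1.0
        self.p_c_skin /= self.p_c_skin.sum()
        self.p_c_non = np.bincount(quantize(nonskin_pixels), minlength=n) + 1.0
        self.p_c_non /= self.p_c_non.sum()
        self.prior = prior_skin

    def prob_skin(self, pixels):
        """P(skin | color) for each (N, 3) pixel."""
        cells = quantize(pixels)
        num = self.p_c_skin[cells] * self.prior
        den = num + self.p_c_non[cells] * (1.0 - self.prior)
        return num / den


def classify(pixels, model, core=(1.15, 1.6), outer=(1.0, 2.2), p_min=0.5):
    """Stage 1: R/G ratio. Stage 2 (ambiguous pixels only): histogram posterior.

    ASSUMPTION: `core`, `outer` and `p_min` are illustrative values.
    """
    pixels = np.asarray(pixels, dtype=np.uint8).reshape(-1, 3)
    r = pixels[:, 0].astype(np.float64)
    g = pixels[:, 1].astype(np.float64)
    ratio = r / np.maximum(g, 1.0)  # guard against division by zero
    is_skin = (ratio >= core[0]) & (ratio <= core[1])  # confidently skin
    ambiguous = ~is_skin & (ratio >= outer[0]) & (ratio <= outer[1])
    if ambiguous.any():
        is_skin[ambiguous] = model.prob_skin(pixels[ambiguous]) >= p_min
    return is_skin
```

The cheap ratio test handles most pixels, and the histogram lookup is only paid for the uncertain ones, which mirrors the hierarchical idea of applying HCM only when the R/G ratio is not conclusive.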
Another approach combines 2-D histograms with Gaussian models (Tan et al., 2012). In our proof-of-concept, an eye detector was used to refine the skin model. The major advantage of this algorithm is that it does not depend on training data and can cope with different ethnicities and varying illumination.
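A fusion of a 2-D chrominance histogram and a Gaussian model can be sketched as follows (NumPy assumed). The eye detector mentioned above is out of scope here: the person-specific skin samples are simply passed in, as if taken from a region around detected eyes. The logical-AND fusion rule, the 32x32 histogram over (Cb, Cr), the Mahalanobis cut-off of 3, the minimum histogram probability and all names are illustrative assumptions, not the published parameters.

```python
import numpy as np


def chroma(rgb):
    """(N, 3) RGB -> (N, 2) [Cb, Cr] using full-range BT.601 coefficients."""
    rgb = np.asarray(rgb, dtype=np.float64).reshape(-1, 3)
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([cb, cr], axis=1)


class FusedSkinModel:
    """Person-specific skin model fitted from sample pixels (hypothetical design)."""

    def __init__(self, sample_rgb, bins=32, max_dist=3.0, min_hist=1e-3):
        x = chroma(sample_rgb)
        # Gaussian model: mean and covariance of the sampled skin chroma.
        self.mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + 1e-3 * np.eye(2)  # small ridge keeps it invertible
        self.inv_cov = np.linalg.inv(cov)
        # 2-D histogram over (Cb, Cr), normalized to a probability per cell.
        edges = np.linspace(0.0, 256.0, bins + 1)
        hist, _, _ = np.histogram2d(x[:, 0], x[:, 1], bins=[edges, edges])
        self.hist = hist / hist.sum()
        self.bins = bins
        self.max_dist = max_dist
        self.min_hist = min_hist

    def mahalanobis(self, x):
        d = x - self.mean
        return np.sqrt(np.einsum("ni,ij,nj->n", d, self.inv_cov, d))

    def hist_prob(self, x):
        idx = np.clip((x / 256.0 * self.bins).astype(int), 0, self.bins - 1)
        return self.hist[idx[:, 0], idx[:, 1]]

    def classify(self, rgb):
        x = chroma(rgb)
        # ASSUMPTION: a pixel counts as skin only if BOTH models accept it.
        return (self.mahalanobis(x) <= self.max_dist) & (self.hist_prob(x) >= self.min_hist)
```

Fitting the model from pixels of the depicted person, rather than from a fixed training set, is what lets such a scheme adapt to different skin tones and lighting conditions.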
To enhance the performance of skin tone detection mechanisms, local features and descriptors can be extracted, as introduced by Jiang et al. (2007) or Ng and Pun (2011).
2.3 Shape Detection
Shape detection is usually performed after skin tone detection. Most shape detection algorithms follow the same approach: after the areas of interest are determined, they are characterized by the contour of the object. The decision between normal and pornographic images is then made based on the extracted contour and a set of post-processing steps, as described in (Tan et al., 2012). Hu et al. (2006) proposed a method for torso detection in still images: “[...] the image is segmented into uniform areas. Then, dominant colors of the torso are adaptively selected using a color probability model. Finally, the torso candidates can be extracted based on the dominant colors”.
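The contour-based characterization of areas of interest can be illustrated with a minimal sketch (NumPy assumed): given a binary skin mask, such as the output of skin tone detection, it keeps the largest 4-connected region and computes a few simple shape descriptors. The descriptors chosen (area, boundary length, bounding-box extent, compactness) and the function names are hypothetical; they are not the features used by Tan et al. (2012) or Hu et al. (2006), and the final normal/pornographic decision is omitted.

```python
from collections import deque
import math

import numpy as np


def largest_component(mask: np.ndarray) -> np.ndarray:
    """Return a boolean mask of the largest 4-connected True region in `mask`."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            comp, queue = [], deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = np.zeros_like(mask, dtype=bool)
    for y, x in best:
        out[y, x] = True
    return out


def shape_descriptors(region: np.ndarray) -> dict:
    """Simple contour/shape features of one region (illustrative choices)."""
    ys, xs = np.nonzero(region)
    area = len(ys)
    if area == 0:
        return {"area": 0, "perimeter": 0, "extent": 0.0, "compactness": 0.0}
    padded = np.pad(region, 1)
    # A pixel is interior if all four 4-neighbours belong to the region.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((region & ~interior).sum())  # count of boundary pixels
    bbox_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return {
        "area": area,
        "perimeter": perimeter,
        "extent": area / bbox_area,  # how well the region fills its bounding box
        # Higher means more compact; pixel-counted perimeter makes it a rough score.
        "compactness": 4 * math.pi * area / perimeter ** 2,
    }
```

In practice a contour tracer from an image processing library would replace the hand-written flood fill; it is spelled out here only to keep the sketch dependency-light.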
2.4 Age Detection
To automatically distinguish between pornography and child pornography, knowing the age of every person depicted in the source material is vital. Over the last few years, age detection has gained significance, as shown by the large body of research in this field (Selvi and Vani, 2011; Takimoto et al., 2006; Li et al., 2012; Fu et al., 2010). The face is the only part of the human body that allows the age of a person to be determined visually. Cranio-facial growth measurements show the most significant changes during the first 20 years of life.
To detect the age, a set of features has to be extracted from the face, including the eyes, nose and mouth. While research proposes different ways of detecting age, approaches based on distances, ratios and landmarks turn out to provide the best performance. Weda and Barbieri (2007) show that extracting single facial features, e.g. the iris, also provides good results in age estimation. Since the size of the human iris does not change during a person's lifetime while the head certainly does, the iris/head ratio can be used to determine the approximate age. The prerequisite for this approach is the availability of frontal images, which are rare in this particular domain. Geng et al. (2013) point out that the main difficulty in facial age estimation is the lack of sufficient training data for many ages. Based on the fact that facial aging is a slow and smooth process, they introduce an algorithm named IIS-LLD, which learns from label distributions. The basic idea behind their approach is that a face image contributes not only to the learning of its real age, but also to the learning of its neighboring ages. Another approach is introduced by Guo et al. (2009), who use biologically inspired features for human age estimation.
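The label-distribution idea behind IIS-LLD can be made concrete with a small sketch (NumPy assumed). It only shows how a single age label might be turned into a distribution over neighboring ages and how such distributions spread training signal; it does not implement the IIS-LLD learning procedure itself. The Gaussian shape, the standard deviation of 2 years and the function names are assumptions for illustration.

```python
import numpy as np


def gaussian_label_distribution(true_age: int, ages: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Discrete Gaussian centred on the true age, normalized to sum to 1 over `ages`.

    ASSUMPTION: Gaussian shape and sigma = 2 years are illustrative choices.
    """
    weights = np.exp(-((ages - true_age) ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()


def effective_support(true_ages, ages: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Sum of all label distributions: how much training signal each age receives."""
    return sum(gaussian_label_distribution(a, ages, sigma) for a in true_ages)
```

For example, with training images labelled only 3 and 7 on an age axis of 0 to 10, `effective_support([3, 7], np.arange(0, 11))` assigns non-zero weight to age 5, although no image carries that label. This is the intuition behind letting a face image contribute to the learning of its neighboring ages when training data for many ages is scarce.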
Challenges and Limitations Concerning Automatic Child Pornography Classification
493