
 
detection relies on detecting the human face. The face is 
the most distinctive part of the human body, and detecting 
it accurately leads to robust detection of human presence. 
Identifying the presence of faces in video streams is 
therefore one of the most important features to extract. 
For each video frame containing more than one face, we 
count the faces present in the frame, remove the face 
regions, and compute the rate of correct skin detection. 
To separate a face region, we scan the segmented image 
for pixels that match the label of that region. The result 
is a binary image that no longer contains the region. 
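The face-removal step above can be sketched as follows; this is a minimal illustration assuming faces are given as bounding boxes (the function name and box format are illustrative, not from the paper):

```python
import numpy as np

def remove_face_regions(skin_mask, face_boxes):
    """Zero out detected face regions in a binary skin mask.

    skin_mask  : 2-D uint8 array (1 = skin pixel)
    face_boxes : list of (x, y, w, h) face bounding boxes
    """
    mask = skin_mask.copy()
    for (x, y, w, h) in face_boxes:
        mask[y:y + h, x:x + w] = 0  # erase the face region
    return mask

# toy example: a 6x6 all-skin mask with a 2x2 "face" at (1, 1)
skin = np.ones((6, 6), dtype=np.uint8)
body_only = remove_face_regions(skin, [(1, 1, 2, 2)])
```

The remaining non-zero pixels are the skin map without faces, on which the correct-detection rate is then evaluated.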
We must first determine the number of skin regions in 
the image by associating with each region an integer 
value called a label. We performed measurements by 
testing different sets of 100 and averaging the results. 
All results are shown in the following figure. 
 
Figure 1: Rate of correct detection as a function of the 
number of faces. 
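The region-labelling step described above corresponds to connected-component labelling, sketched here with SciPy (the toy skin map is illustrative):

```python
import numpy as np
from scipy import ndimage

# toy binary skin map containing two separate skin blobs
skin_map = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
], dtype=np.uint8)

# associate an integer label with each connected skin region
labels, n_regions = ndimage.label(skin_map)

# isolate one region: keep only pixels matching its label
region_1 = (labels == 1).astype(np.uint8)
```

Scanning the labelled image for a given label, as in the last line, yields the binary image of a single region.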
We assume that an image may contain adult material if 
it contains at least one person and at most four, which 
is the most common case in practice. Our method proves 
able to determine skin correctly online and to 
distinguish naked videos from non-naked videos 
effectively, by integrating texture, feature extraction, 
and face detection. After this step we apply neural 
networks to classify the videos. More specifically, the 
classifier acts on the vector built from the descriptors 
computed in the next paragraph to decide which kind of 
video is being analyzed. We then present features based 
on the grouping of skin regions that can distinguish 
adult images from other images. Many of these features 
are based on fit ellipses computed on the skin map. 
These features suit our needs because of their 
simplicity. 
Consequently, for each skin map we compute two 
ellipses, namely the Global Fit Ellipse (GFE) and the 
Local Fit Ellipse (LFE), the latter based only on the 
largest region of the skin map. We distinguish eight 
features of the skin map; the first three are global: 
- The average skin probability over the entire image. 
- The average skin probability inside the GFE. 
- The number of skin regions in the image. 
- The distance of the largest skin region from the 
center of the image. 
- The angle between the main axis of the LFE and the 
horizontal axis. 
- The average skin probability inside the LFE. 
- The average skin probability outside the LFE. 
- The number of dominant faces in the video to analyze. 
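A few of these features can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function and key names are invented, the threshold 0.5 is an assumption, and the ellipse orientation is derived from second-order central moments (a standard way to obtain a fit-ellipse axis, which the paper does not specify):

```python
import numpy as np
from scipy import ndimage

def skin_map_features(skin_prob):
    """Compute a subset of the skin-map features (illustrative names).

    skin_prob : 2-D float array of per-pixel skin probabilities in [0, 1]
    """
    h, w = skin_prob.shape
    mask = skin_prob > 0.5                      # binarised skin map (assumed threshold)
    labels, n_regions = ndimage.label(mask)

    feats = {"mean_skin_prob": float(skin_prob.mean()),  # global feature
             "n_skin_regions": int(n_regions)}

    if n_regions:
        # largest skin region and its distance to the image center
        sizes = ndimage.sum(mask, labels, range(1, n_regions + 1))
        largest = int(np.argmax(sizes)) + 1
        cy, cx = ndimage.center_of_mass(mask, labels, largest)
        feats["dist_to_center"] = float(np.hypot(cy - h / 2, cx - w / 2))

        # orientation of the fit ellipse of the largest region,
        # from second-order central moments
        ys, xs = np.nonzero(labels == largest)
        y0, x0 = ys.mean(), xs.mean()
        mu20 = ((xs - x0) ** 2).mean()
        mu02 = ((ys - y0) ** 2).mean()
        mu11 = ((xs - x0) * (ys - y0)).mean()
        feats["lfe_angle"] = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return feats

# toy probability map: one rectangular skin blob
skin_prob = np.zeros((10, 10))
skin_prob[2:5, 2:8] = 0.9
f = skin_map_features(skin_prob)
```

The GFE/LFE probability averages would be obtained the same way, restricting the mean to pixels inside or outside the fitted ellipse.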
6 NEURAL NETWORK 
In this step, we suggest using an Artificial Neural 
Network (ANN) classifier, one of the most common 
techniques for decision support in image processing. In 
particular we use a Multi-Layer Perceptron (MLP) neural 
network. The network concentrates on learning the 
decision-boundary surface separating adult videos from 
non-adult ones. It is composed of a large number of 
densely interconnected processing elements (neurons) 
working in unison to solve the adult video recognition 
problem. The decision tree model, by contrast, 
recursively partitions an image data space using the 
variables that best split the data among a number of 
given variables; this technique can give excellent 
results when the characteristics and features of the 
image data are known in advance (BOUIROUGA et al., 
2011). The inputs of our neural network are the feature 
values extracted from the descriptors. Since the various 
descriptors represent different specific features of a 
given image, a proper evaluation process is required to 
choose the best one for adult image classification. Our 
MLP classifier is a semi-linear feed-forward network 
with one hidden layer. The MLP output is a number 
between 0 and 1, with 1 for an adult image and 0 for a 
non-adult image. 
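The classifier described above can be sketched with scikit-learn's MLP; this stands in for the paper's unspecified implementation, and the 8-dimensional training vectors here are synthetic placeholders for the skin-map descriptors:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# synthetic stand-ins for the 8-dimensional skin-map descriptor vectors
rng = np.random.default_rng(0)
X_adult = rng.normal(0.7, 0.1, size=(40, 8))   # high skin-probability features
X_other = rng.normal(0.2, 0.1, size=(40, 8))   # low skin-probability features
X = np.vstack([X_adult, X_other])
y = np.array([1] * 40 + [0] * 40)              # 1 = adult, 0 = non-adult

# semi-linear feed-forward net with one hidden layer, sigmoid units
mlp = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                    max_iter=2000, random_state=0).fit(X, y)

# output in [0, 1]: probability that a new descriptor vector is "adult"
score = mlp.predict_proba([[0.75] * 8])[0, 1]
```

The hidden-layer size of 10 is an assumption; the paper does not state the network dimensions.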
7 EXPERIMENTS 
We conduct two experiments for performance 
evaluation: one for skin detection and one for video 
classification. For the skin detection evaluation, we 
use 200 videos: 130 for training and 70 adult videos 
for testing. The performance comparison between the 
different color spaces is shown in Figure 2. 
VISAPP 2013 - International Conference on Computer Vision Theory and Applications
424