An approach using neural networks
Adriano Martins Moutinho, Antonio Carlos Gay Thomé
Núcleo de Computação Eletrônica, Universidade Federal do Rio de Janeiro, Rio de Janeiro,Brasil
Luiz Biondi Neto, Pedro Henrique Gouvêa Coelho
Faculdade de Engenharia, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brasil.
Keywords: Face detection, neural networks, image processing.
Abstract: Security systems based on face recognition often have to deal with the problem of finding and segmenting
the region of the face, containing nose, mouth and eyes, from the rest of the objects in the image. Finding
the right position of a face is a part of any automatic identity recognition system, and it is, by itself, a very
complex problem to solve, normally being handled separately. This paper describes an approach, using arti-
ficial neural networks (ANN), to find the correct position and separate the face from the background. In
order to accomplish this goal, a windowing method was created and combined with several image pre-
processing steps, from histogram equalization to illumination correction, as an attempt to improve neural
network recognition capability. This paper also proposes methods to segment facial features such as mouth,
nose and eyes. Finally, the system is tested using 400 images and the performance of face and facial features
segmentation is presented.
The human face, in this paper, is defined as the
smallest rectangle that contains the mouth, nose and
eyes of the person, in such way that it is possible to
separate these elements from the other objects in the
Figure 1 illustrates an example of face segmen-
tation as it is defined here. This is a very difficult
task and in order to succeed, any face detection
system must be able to handle several problems like
presence of beard, mustache, skin color variations,
head inclination and face rotation.
Figure 1: Face example
One of the objectives in this research is to make
the system, as much as possible, immune to these
The human frontal face, as defined in figure 1,
.is a very distinguished pattern, making the MLP
neural network (Haykin, 1999) a nice choice to
achieve good results. A neural network could be
trained using several sequences of face images, as
shown in figure 2, and many other non-face images
containing geometric shapes, noise and variations
(Rowley, 1999).
Figure 2 shows examples of face and non-face
databases (Rowley, 1999):
Figure 2: Face (left) and non-face (right) database (Row-
ley, 1999)
Martins Moutinho A., Carlos Gay Thomé A., Biondi Neto L. and Henrique Gouvêa Coelho P. (2004).
FACE PATTERN DETECTION - An approach using neural networks.
In Proceedings of the Sixth International Conference on Enterprise Information Systems, pages 172-177
DOI: 10.5220/0002610301720177
Figure 3: System general view
The proposed face segmentation can be divided into
four main modules, as shown in figure 3.
The first module, called windowing, splits the
main picture into squared 19x19 sized sub-images
using an edge seek method, as described in section
3; the second module, called image preprocessing
and coding, described in section 4, prepares the
image signal to improve ANN generalization. Sev-
eral processing steps such as contrast stretch and
illumination correction are applied; the third module
corresponds to an MLP neural network, trained
using databases such as the ones in figure 2. All the
sub-images identified as faces by the ANN are
passed to the fourth module, where face framing is
optimized using a recursive process, in order to
obtain the smallest rectangle that contains the face,
according to the definition presented in section 1.
These adjustments are further described in section 5.
As long as any part of the image, independently of
size or proportion, could contain the desired face,
the windowing module must extract from the origi-
nal image, every possible and distinct rectangular
sub-images, sending them all to the pre-processing
module and then to the neural network, trained to
recognize face patterns. Although this method is
theoretically possible, it would require too much
processing time, because even from a small image
would be possible to extract various distinct sub-
The method suggested in this paper seeks for
faces only around image edges, extracting only sub-
images that contain at least one edge pixel. Figure 4
shows an example of this process. It can be clearly
observed that a face will always contain edge pixels.
Figure 4: Edge windowing example
Even so, the number of distinct images could be
very high. A solution to that would be to first con-
sider the sub-images as a square, not as a rectangle,
and only in a fine adjustment, as described in section
5, it will be optimized to better encompass the face.
After windowing, every extracted image is re-
sized to 19x19 and passed to the preprocessing
module, described in section 4. The windowing
method first computes edge points in the image and
assembles them in a sorted list (according to its
distance to the origin). Then, an adjustable square,
initially 19x19 sized, is centered on the first point in
the list and extracts a sub-image.
In order to obtain a good generalization, it is neces-
sary to do some processing before applying the
signal to the ANN. In the specific case of face pat-
tern detection, it is important to apply a process that
emphasizes the differences between faces and non-
faces images (Rowley, 1999).
Thus, a sequence of transforms is applied, in-
cluding histogram equalization (Gonzalez, 1992),
oval filtering, and illumination correction, as shown
in figure 5.
FACE PATTERN DETECTION - An approach using neural networks
Figure 5: Preprocessing steps, order of application
The histogram equalization is applied following
the traditional methods (Gonzalez, 1992), its objec-
tive is to raise image contrast.
The oval filter follows the principle that pixels
in the borders of face images do not contain relevant
information and could be masked, this operation
does not change a face to a non-face, or vice-versa
(Rowley, 1999). This mask allows the network,
during the training process, to avoid consider such
points, which tend to favor generalization.
Pixel elimination is done by adjusting triangular
masks to the sides of a face, as shown in figure 6.
Figure 6: An oval filter example
Illumination correction is the key point to a reli-
able recognition. It is designed to remove the effect
of regular illumination.
If light effect is presumed to be a linear func-
tion, an image where some illumination is applied
could be decomposed into a linear gradient and a
non-illuminated image. To find this gradient directly
from an already illuminated image, a method is
proposed where the mean of grayscale values of
both columns and rows of an image is calculated.
So, a vector of all row means and a vector of
columns means are build. Linear approximations of
these vectors are computed using the minimum
square method, shown in equations 1, 2 and 3. The
Linear equation is
=+, where z is the mean
of grayscale values, w is the column or row number
and a and b are the linear coefficients of the ap-
proximation. Figure 7 shows a practical example of
this process.
bawz +
zw w
wz w
zw w
The next step is to build a matrix with character-
istics extracted from the image. This process is
called coding, where the lines of the pre-processed
image are concatenated creating a matrix containing
all face examples. The next step is subject the matrix
to a statistic method called principal components
analysis (PCA) (Haykin, 1999), (Johnson, 1998).
The PCA method applies a linear transformation
to the subspace and removes those dimensions with
low variance. It can reduce the size of a training
vector without losing much information. PCA appli-
cation (Haykin, 1999) (Johnson, 1998) results in a
size reduced vector and a transformation matrix,
used to map new data into the same reduced form. In
this paper, the computation of the transform matrix
is done using only the face database, as shown in
figure 2.
The best neural network configuration has one
hidden layer with around 150 neurons. The activa-
tion function is hyperbolic-tangent for the hidden
layer and sigmoid for the output layer.
Figure 7: Illumination correction method
Figure 8: Centroid and Successive comparison reduction
The recognition method proposed here tends to iden-
tify the same face several times, on adjacent squares,
as shown in figure 8.
However, this cannot be considered as a disad-
vantage, since all these multiple detections can be
used to validate and find the correct face position
through a process called Centroid Reduction.
Let rectangles A and B be called “Centroid
neighbors” if the centroid of A is within rectangle B
and vice-versa. This definition can be extended to n
rectangles. Figure 8 shows many centroid neighbors,
shown in yellow color.
The centroid reduction method replaces all
“centroid neighbors” by the average square. Squares
that do not have any neighbors are ignored because
they are more likely to be a mistake made by the
ANN. Figure 8 shows an example of centroid reduc-
tion. All yellow squares are replaced by another one
that is computed using the average. The red square
in figure 8 is ignored during the process because it
does not have any “centroid neighbors”.
After centroid reduction, another fine adjust-
ment is made. It is called successive comparison
reduction. From the face position that is the output
of centroid reduction, other faces are generated by
shifting the original rectangle shape one pixel to the
left, right, bottom and top. Other faces are also cre-
ated from the original one, having one pixel width
and height less. All of these faces along with the
original image are again submitted to the neural
network, in a greedy search recursive process, where
the actual face is the higher output value of the neu-
ral network. This process repeats itself until there is
no change in rectangle position between two phases.
Figure 8 also shows the successive comparison ad-
justment method. The result rectangle is smaller and
more adapted to the face than the input square.
Once the face is located, simple techniques can be
developed to find each face element of the subject.
This is possible because if the smallest rectangle that
contains the face is known, it can be said that all
face elements respect a specific geometry that helps
the task of finding every sub-part of face, such as
eyes, mouth and nose.
6.1 Eye pattern localization
With the smallest rectangle that covers the face, it is
possible to guarantee that the eyes are on the supe-
rior half of this rectangle. It is also possible to admit
that if this superior part is again divided in two
halves, one will be very likely to contain the left eye
and the other the right one.
So, to obtain the right position of both eyes, 2d
cross correlation between the right and left halves
and an eye standard pattern are computed. Equation
4 shows the cross correlation function where M and
N are the dimensions of the picture. Figure 9 shows
the eye pattern.
)() ()(
fxy gxy fxygx my n
,∗ , = , +,+
Figure 9: Eye pattern
Thus, eye masks are generated admitting an
80% tolerance of the maximum correlation value.
When both masks are computed, final eyes positions
are obtained. Figure 10 represents the process, and
figure 11 shows the positions computed.
6.2 Mouth pattern localization
Mouth can be easier identified than eyes. First, the
original face rectangle is divided horizontally. Then,
the mouth’s edges are computed using a Laplacian
method (Gonzalez, 1992) showed in equation 5:
fxy fxy fx
,= ,+ ,
Edges on the lateral sides of the image are ig-
nored and a fill algorithm is applied. This
morphological algorithm changes black points into
white when all pixels in D8 neighborhood are white
(Gonzalez, 1992), this process repeats itself up to the
point where no change occurs in the image.
FACE PATTERN DETECTION - An approach using neural networks
Figure 10: Cross correlation Method and final eyes position
Figure 11: Mouth segmentation method
Finally, if the bigger connected region is ex-
tracted, it will very likely to be the mouth, as shown
in figure 11.
6.3 Nose pattern localization
Although mouth and eyes detection could be done
independently, nose position would be very difficult
to be found without the eyes and mouth information.
This happens because the nose usually does not have
detectable borders and is not, in general, darker or
brighter than the rest of the face.
However, if eye and mouth positions are
known, it is very likely that nose will be located
inside the rectangle defined by the centroids of both
eyes, mouth, and the lowest coordinate of eyes cen-
troid, as shown in figure 12.
Figure 12: Nose position.
From this region, the nose is segmented using
the same method of mouth detection of section 6.2,
by eliminating lateral edges and looking for the
bigger connected region filled by an morphological
With face localization methods described in sections
3, 4, 5 and face elements localization methods, de-
scribed in sections 6.1, 6.2 and 6.3; it is possible to
build a face pattern finder.
Using this pattern finder, 400 pattern-tests are
made using the faces available at the AT&T labora-
tory page (AT&T, 2003). Although the focal
distance variance among the faces is very small in
this database, it presents some other important chal-
lenges such as the presence of many non-frontal
poses and rotated faces.
Preliminary results, shown in table 1, indicate
that the most difficult task is nose identification,
because it has neither borders nor different colors
from the rest of the face elements, and it is also very
affected by the presence of glasses, beard or mus-
A comparison between the method present here
and two other are also shown in table 1:
The first one uses a spikenet to detect face pat-
terns (Van Rullen, 2001). The database used was the
same, but has been separated in test and train sets,
table 1 shows both results.
The Second one uses a Kiosk System (Mãkinen
et al, 2002) to detect face features, only frontal and
non-rotated faces were used.
Figure 13 shows various outputs from the sys-
tem. Face detection is shown in yellow; mouth
position in blue, eyes in red and finally nose in
Figure 13: Results of the Face pattern recognition system
Table 1: Preliminary comparative Results
Success rate
This paper
(spikenet, 2001)
(Mãkinen et al, 2002)
Face Preprocessing, Illumination correction and ANN.
90% 88% 92%
Eyes 2d correlation with eyes pattern. 83% 90% 83%
Nose Edge detection and connected regions. 68% Not available 72%
Mouth Edge detection and connected regions. 86% 91% 47%
The same database (AT&T, 2003) was used as a training and test set, training set results are shown here.
Only frontal and non-rotated faces were used in these results.
Results demonstrated on table 1 and figure 13 are
promising. Localization of eyes, mouth and face are
relatively well accurate, but nose position is quite
difficult to find because there are almost no features
that separates it from the background.
Face identity recognition systems can be im-
plemented using face localization methods shown
here. If face position is known, eyes and mouth
position can be found and used to adjust the orienta-
tion of the image. A new set of features can now be
extracted from these parts and used to feed another
neural network model trained to identify the owner
of the face.
New studies are being developed using the face
detection system presented in this paper. Then, it
will be possible to enhance detection process speed
and accurateness, in order to become a part of an-
other identity recognition system.
Haykin, S., 1999. Neural Networks, a comprehensive
Foundation, 2
edition. Prentice Hall Press, New Jer-
Rowley, H., 1999. Neural Network-Based Face Detection.
School of Computer Science, Computer Science De-
partment, Carnegie Mellon University Pittsburgh.
Gonzalez, R. C., 1992. Digital Image Processing, Addi-
Zurada, J. M., 1992. Introduction to Artificial Neural
Systems, West Publishing Company, New York.
AT&T, 2003. AT&T Laboratories, Cambridge University
Engineering Department, Cambridge,
Johnson, R. 1998. Applied Multivariate Statistical Analy-
sis, Prentice-Hall, EUA, 1998.
Demuth, H. and Beale M. 2001. Neural Network Matlab
Toolbox, user’s guide. MathWorks, Inc.
Warren, S. 2001. Internet Neural Network
FAQ.ftp://ftp.sas.com/pub/neural/FAQ.html. Copy-
right 1997, 1998, 1999, 2000, 2001 by Warren S.
Sarle, Cary, NC, USA, 2001.
Mäkinen et al, 2002. Real-Time Face Detection for Kiosk
Interfaces. Unit for Computer-Human Interaction
(TAUCHI) Department of Computer and Information
Sciences FIN-33014 University of Tampere, Finland.
SPIKE PER NEURONE. Centre de Recherche
Cerveau & Cognition, UMR 5549, 133 route de Nar-
bonne, 31062 Toulouse, France.
FACE PATTERN DETECTION - An approach using neural networks