Figure 2: Left: the 26 points of interest identified in a frontal face. Right: the minimal set of distances considered.
The Gabor filters are anisotropic, and the estimation of the frequency parameter depends on the face size in pixels. Since the images in the datasets were captured with a single camera positioned in front of the face, a pertinent normalization has to be conducted to address in-plane rotations and face size. Two methods based on a three-point normalization via transformation matrices are employed. The first one uses three fixed points, where two are located at the eye centers and the third in the middle of the mouth, determined by the intersection of lines through opposing mouth points. It simply maps these points onto three predefined target points to determine the transformation matrix. The second method preserves the relation between the inter-ocular distance and the perpendicular distance from the mouth middle point to the inter-ocular line. The "natural appearance" of the face is thus better preserved, since the face shape is respected.
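As a concrete illustration, the first method amounts to estimating an affine transform from three point correspondences. The following is a minimal sketch using OpenCV; the file name, all coordinates, the target positions, and the output size are assumed values, not those of the paper.

import numpy as np
import cv2

image = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input

# Detected source points: both eye centers and the mouth middle point
# (coordinates here are purely illustrative).
src = np.float32([[112, 98],     # left eye center
                  [176, 101],    # right eye center
                  [144, 168]])   # mouth middle point

# Predefined target positions in the normalized crop (assumed values).
dst = np.float32([[60, 60],
                  [140, 60],
                  [100, 150]])

# Three point pairs uniquely determine a 2x3 affine matrix that removes
# in-plane rotation and rescales the face to a fixed size.
M = cv2.getAffineTransform(src, dst)
normalized = cv2.warpAffine(image, M, (200, 200))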
After the normalization, the Gabor filters are applied to the sample at each POI. As a result, we obtained a feature vector containing 18 complex coefficients per POI and reduced its size by keeping only the magnitude computed from the real and imaginary parts.
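A sketch of this step is given below, assuming the 18 coefficients arise from 3 frequencies and 6 orientations (one possible factorization; the paper does not restate it here). The kernel size, sigma, and gamma are placeholder parameters, and the complex response is obtained from two phase-shifted real Gabor kernels.

import numpy as np
import cv2

def gabor_features_at_pois(img, pois, n_orient=6, freqs=(0.1, 0.2, 0.3)):
    """Sample 18 complex Gabor responses (3 assumed frequencies x 6
    orientations) at each POI and keep only the magnitudes."""
    feats = []
    for f in freqs:
        lambd = 1.0 / f  # wavelength from frequency
        for theta in np.arange(n_orient) * np.pi / n_orient:
            # Real and imaginary parts via phase offsets 0 and pi/2.
            k_re = cv2.getGaborKernel((31, 31), 4.0, theta, lambd, 0.5, psi=0)
            k_im = cv2.getGaborKernel((31, 31), 4.0, theta, lambd, 0.5,
                                      psi=np.pi / 2)
            re = cv2.filter2D(img, cv2.CV_32F, k_re)
            im = cv2.filter2D(img, cv2.CV_32F, k_im)
            # Magnitude of the complex coefficient at every POI.
            feats.append([np.hypot(re[y, x], im[y, x]) for (x, y) in pois])
    # Shape: (n_pois, 18) -- one magnitude per filter per POI.
    return np.asarray(feats).T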
3.2 Feature Extraction in Geometric
Domain
To provide a unit system for intra-face measurements that are comparable across individuals, we need certain anchor points. These have to lie in areas with sufficient textural information (for easy detection), be present consistently across different samples/models, be at locations that do not move due to facial deformations, and not be located at points with transient information (e.g. wrinkles, bulges). Among the different candidates illustrated in Figure 2, the outer points of the left and right eyes turned out to be the best options. The points at the temples would also be a good choice, but they can vanish under even small out-of-plane rotations or be hard to detect because of hair. All measured distances are divided by this span (the distance between the outer eye points) for conversion into the unit system. As facial landmarks, we used a subset of the points in Figure 2, except for points 6, 8, 10, 12, 16–18 and 23–25, which are anchor points.
We calculated geometry-based features by measuring anchor-to-landmark and landmark-to-landmark distances and dividing them by the base unit. Furthermore, div- and med-features are obtained by considering two intersecting lines between corresponding points, for example the lines from point 20 to 22 and from point 19 to 21. We then calculated ratio and median values based on these lines. Consequently, these features represent changes in the eye or mouth shape. Figure 2 (right) shows a possible minimal set of distances. Light gray lines are the anchor-to-landmark spans, and the dark lines indicate the distances used to calculate div- and med-features.
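The sketch below illustrates these geometric features under stated assumptions: the anchor point IDs are placeholders, and the reading of the med-feature as a normalized median of the two line lengths is our own guess, since the exact definition is not spelled out here.

import numpy as np

# Hypothetical IDs for the anchor points (outer eye corners) in Figure 2.
LEFT_EYE_OUTER, RIGHT_EYE_OUTER = 1, 5  # placeholder IDs, not the paper's

def dist(p, q):
    return float(np.hypot(p[0] - q[0], p[1] - q[1]))

def geometric_features(pts, pairs, cross_pairs):
    """pts maps point IDs to (x, y); pairs lists anchor-to-landmark or
    landmark-to-landmark ID pairs; cross_pairs lists pairs of intersecting
    lines, e.g. ((20, 22), (19, 21)) for the mouth."""
    # Base unit: span between the outer eye points.
    base = dist(pts[LEFT_EYE_OUTER], pts[RIGHT_EYE_OUTER])

    # Plain distance features, expressed in the common unit system.
    feats = [dist(pts[a], pts[b]) / base for a, b in pairs]

    for (a, b), (c, d) in cross_pairs:
        l1, l2 = dist(pts[a], pts[b]), dist(pts[c], pts[d])
        feats.append(l1 / l2)                     # div-feature: line ratio
        feats.append(np.median([l1, l2]) / base)  # med-feature (our reading)
    return np.asarray(feats)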
3.3 Classification
We tested the recognition efficiency of the two feature sets by employing two well-known statistical classifiers, k-nearest neighbors (k-NN) and support vector machines (SVM). For k-NN, the Euclidean distance measure is used with k = 3. We used a C-SVM (RBF kernel) with a fixed γ and a high cost factor C, building binary classifiers in one-vs-one as well as one-vs-all schemes.
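In scikit-learn terms, this setup could look as follows; the specific C and γ values and the synthetic training data are placeholders, since the paper only states that γ is fixed and C is high.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(70, 36))    # placeholder feature vectors
y_train = rng.integers(0, 7, size=70)  # seven expression classes

# k-NN with Euclidean distance and k = 3.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

# C-SVM with an RBF kernel; SVC trains one-vs-one binary classifiers
# internally for multiclass problems.
svm_ovo = SVC(kernel="rbf", C=100.0, gamma=0.01)
svm_ovo.fit(X_train, y_train)

# One-vs-all is obtained by wrapping the binary SVM explicitly.
svm_ova = OneVsRestClassifier(SVC(kernel="rbf", C=100.0, gamma=0.01))
svm_ova.fit(X_train, y_train)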
4 RESULTS
Figure 3 illustrates the Fisher projection of the feature sets in order to get a preview of the distinguishability of the seven expression classes. The distributions in the figures show that the class-related sample density for the Gabor approach seems satisfying, even though some classes (e.g. disgust and anger) intersect.
Tables 1 and 2 summarize the recognition results. Across all tests it turned out that the JAFFE dataset could be classified more easily than the FEEDTUM dataset, regardless of which feature set was used. This is likely due to the high consistency of the samples and the feature-extraction-friendly setup of the JAFFE dataset, while the slightly more "real-world" oriented FEEDTUM samples consequently yielded inferior results.