EVALUATION OF FEATURES AND COMBINATION APPROACHES FOR THE CLASSIFICATION OF EMOTIONAL SEMANTICS IN IMAGES

Ningning Liu, Emmanuel Dellandréa, Liming Chen
Université de Lyon, CNRS
Ecole Centrale de Lyon, LIRIS, UMR5205, F-69134, Lyon, France

Bruno Tellez
Université de Lyon, CNRS
Université Lyon 1, LIRIS, UMR5205, F-69622, Lyon, France

Keywords: Emotional semantics, Image classification, Evidence theory.

Abstract:

Recognition of emotional semantics in images is a new and very challenging research direction that is gaining increasing attention in the research community. As it is an emerging topic, publications remain relatively rare and numerous issues need to be addressed. In this paper, we investigate the efficiency of different types of features, including low-level features and the proposed semantic features, for the classification of emotional semantics in images. Moreover, we propose a new approach that combines different classifiers based on Dempster-Shafer's theory of evidence, which can handle ambiguous and uncertain knowledge such as the properties of emotions. Experiments conducted on the International Affective Picture System (IAPS) image database, a common stimulus set frequently used in emotion psychology research, demonstrate that the proposed approach achieves promising results.

1 INTRODUCTION

In recent years, online photo-sharing communities (e.g., flickr.com, photo.net, dpchallenge.com, deviantart.com) have emerged and keep growing. In this era of information explosion, with ever more pictures and other multimedia content, there is an urgent need to develop intelligent systems for the automatic analysis of the emotional semantics of images (A. W. M Smeulders, 2000; R. Datta, 2005).

One of the goals of computer science, and particularly of artificial intelligence, is to design intelligent computers able to interact with human beings in a natural way. Thus, one of the key issues is to enable computers to recognize, understand and express emotions, and numerous works have addressed these aspects in recent years (J. Z. Wang, 2001; K. Kuroda, 2002; Picard, 1997; C. Columbo, 1999; C.-H. Chan, 2005; S. Wang, 2005; C.-T. Li, 2007; Z. Zeng, 2009; W. Wang, 2008).

As far as emotion recognition is concerned, research mainly focuses on affect recognition in audio (speech and music) and in visual facial expressions. Few contributions deal with the recognition of emotions carried by images (V. Yanulevskaya, 2008; W. Wei-ning, 2006; S. Wang, 2005; Q. Wu, 2005; C. Columbo, 1999), and many issues remain to be addressed, particularly concerning three fundamental problems: emotion models, feature extraction for emotion recognition, and classification schemes able to handle the distinctive characteristics of emotions. The main difficulty remains to bridge the gap between low-level features extracted from images and high-level semantic concepts such as emotions.

Several models have been considered in the literature to represent emotions (P. Dunker, 2009), and the two main approaches are the discrete one and the dimensional one. The first model uses adjectives or nouns to specify emotions, such as happiness, sadness, fear, anger, disgust and surprise. The second model describes emotions according to one or more dimensions, each representing a particular mood characteristic, such as pleasure, arousal or control. Dimensional models can represent a wider range of emotions than discrete ones. In this paper, the dimensional model is employed, as illustrated in Fig. 1.

Liu N., Dellandréa E., Chen L. and Tellez B. EVALUATION OF FEATURES AND COMBINATION APPROACHES FOR THE CLASSIFICATION OF EMOTIONAL SEMANTICS IN IMAGES.
DOI: 10.5220/0003364603520357
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2011), pages 352-357
ISBN: 978-989-8425-47-8
© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

Few works have proposed efficient systems for automatically recognizing emotions in images. Yanulevskaya et al. (V. Yanulevskaya, 2008) propose an emotion categorization approach for artworks based on the assessment of local image statistics using support vector machines. Wang et al. (S. Wang, 2005) use support vector regression to predict the values of emotional factors from three image features: a luminance fuzzy histogram, a saturation fuzzy histogram integrated with color contrast, and luminance contrast integrated with edge sharpness. Colombo et al. (C. Columbo, 1999) use a suitable set of rules to extract some intermediate semantic levels, and then build the semantic representation through a process of syntactic construction called compositional semantics. Wu et al. (Q. Wu, 2005) use SVMs to learn the mapping between the affective space and the visual feature space of images; the trained SVMs are then used to estimate and classify images automatically. However, no study has attempted to identify the features and types of classifiers best suited to the characteristics of emotions, which are high-level semantic concepts. We therefore propose in this paper to evaluate the efficiency of different types of features and combination methods for emotion recognition. Moreover, we present a novel combination approach based on Dempster-Shafer's Theory of Evidence, which can handle the ambiguity and uncertainty that characterize emotions.

The rest of this paper is organized as follows. The image features used to characterize emotions in images are presented in section 2. The proposed information fusion based on the Theory of Evidence is detailed in section 3. The experimental setup and results are presented in section 4, followed by the conclusion in section 5.

2 IMAGE FEATURES FOR EMOTION CLASSIFICATION

2.1 Low-level Image Features

Most works dealing with emotion recognition use traditional image features that are also employed for other computer vision problems. The three main categories of image features are based on color, texture and shape information. Concerning color, studies have shown that the HSV (Hue, Saturation, Value) color space is more closely related to human color perception than others such as the RGB color space. Based on the HSV color space, authors (P. Dunker, 2009; Q. Wu, 2005) describe the color content of images in several ways, such as color moments, color histograms, correlograms and histograms of color temperature.
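The color-moment features mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited works: it assumes an RGB image stored as a NumPy float array with values in [0, 1], converts it to HSV with Python's standard `colorsys` module, and computes the first three moments (mean, standard deviation, skewness) of each channel. The function name `hsv_color_moments` is hypothetical.

```python
import colorsys

import numpy as np


def hsv_color_moments(rgb):
    """Mean, standard deviation and skewness of each HSV channel (9 values).

    rgb: float array of shape (height, width, 3) with values in [0, 1].
    """
    pixels = rgb.reshape(-1, 3)
    # colorsys converts one pixel at a time: simple, but slow on large images.
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels])
    features = []
    for c in range(3):
        channel = hsv[:, c]
        mean = channel.mean()
        std = channel.std()
        # Cube root of the third central moment, a common form of the skewness moment.
        # Hue is circular, so treating it linearly here is a simplification.
        skew = np.cbrt(np.mean((channel - mean) ** 3))
        features.extend([mean, std, skew])
    return np.array(features)


# Usage: a uniform pure-red image has H = 0, S = 1, V = 1 and no spread.
red = np.zeros((4, 4, 3))
red[..., 0] = 1.0
features = hsv_color_moments(red)
```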

Concerning texture, Tamura features (coarseness, contrast, directionality) have been shown to correlate strongly with human visual perception (Q. Wu, 2005). The spatial grey-level difference statistics (C.-T. Li, 2007), known as the co-occurrence matrix, describe the brightness relationship of pixels within neighbourhoods, and the local binary pattern (LBP) descriptor is a powerful feature for image texture classification.
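To make the LBP descriptor concrete, here is a minimal sketch of the basic 8-neighbour, 3x3 variant: each neighbour is thresholded against the central pixel, the eight resulting bits are packed into a code, and the image is described by the normalized 256-bin histogram of codes. The paper does not state which LBP variant (radius, number of neighbours, uniform patterns) was used, so this choice and the name `lbp_histogram` are assumptions.

```python
import numpy as np


def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes of a 2-D array."""
    g = np.asarray(gray, dtype=float)
    if g.ndim != 2 or min(g.shape) < 3:
        raise ValueError("expected a 2-D grayscale image of at least 3x3 pixels")
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left; each one sets one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# Usage: texture descriptor of a random 32x32 image.
descriptor = lbp_histogram(np.random.default_rng(0).random((32, 32)))
```

Border pixels are skipped rather than padded, which keeps the sketch short; implementations differ on this point.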

Figure 1: A dimensional emotion model with two dimensions: pleasure (ranging from pleasant to unpleasant) and arousal (ranging from calm to excited). Each point represents an image from the IAPS database (P. J. Lang, 1999).

Concerning shape, studies on artistic paintings have brought to the fore the semantic meanings of shapes and lines, and it is believed that the shapes in a picture also influence the degree of aesthetic beauty perceived by humans (Q. Wu, 2005). In this paper, the Hough transform is employed to build a histogram of line orientations over 12 different orientations.
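Assuming line segments have already been detected by a Hough transform (for instance with OpenCV's probabilistic `cv2.HoughLinesP`, which returns segments as end points), the 12-bin orientation histogram can be sketched as below. The equal 15° bins over [0°, 180°) and the normalization are assumptions, since the paper does not specify the binning; `line_orientation_histogram` is a hypothetical name.

```python
import math


def line_orientation_histogram(segments, n_bins=12):
    """Normalized histogram of undirected line orientations over [0, 180) degrees.

    segments: iterable of (x1, y1, x2, y2) end points of detected lines.
    """
    hist = [0] * n_bins
    bin_width = 180.0 / n_bins
    for x1, y1, x2, y2 in segments:
        # A line has no direction, so fold angles from (-180, 180] into [0, 180).
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        # The final modulo guards against float rounding that yields exactly 180.0.
        hist[int(angle // bin_width) % n_bins] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * n_bins


# Usage: four segments falling into four different 15-degree bins.
hist = line_orientation_histogram([(0, 0, 10, 1), (0, 0, 1, 10), (0, 0, -10, 1), (0, 0, 1, 2)])
```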


2.2 Semantic Image Features

Some attempts have been made to identify higher-level image features linked to emotions. Studies on artistic paintings have brought to the fore the semantic meanings of colors and lines, which have been used to design image features for emotion recognition (C. Columbo, 1999). Indeed, color combinations can produce effects such as harmony, non-harmony, calmness and excitation. Visual harmony can be obtained by combining hues and saturations so that an effect of stability is produced on the human eye. This harmony can be represented with Itten's chromatic circle (Itten, 1961), in which colors are organized on a circle and contrasting colors have opposite coordinates with respect to the center of the circle (Fig. 2). To extract an image feature that characterizes harmony, the dominant colors of the image are first identified and plotted on the chromatic circle. Then, the polygon linking these colors is considered. Harmony is finally described by a single value: a value close to 1 corresponds to a regular polygon whose center is close to the circle center, which characterizes a harmonious image, and a value close to 0 corresponds to an irregular polygon, which characterizes a non-harmonious image.

Figure 2: Itten's chromatic circle.
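The paper does not give the exact formula for the harmony value, so the following is only a hypothetical sketch of one measure with the stated behaviour. It places the dominant hues (in degrees) directly on a circle, ignoring saturation, and multiplies a centring term (1 minus the distance of the vertices' centroid from the circle center) by a regularity term (how evenly the vertices are spaced). A regular polygon centred on the circle then scores 1, while clustered, irregular hues score close to 0. Mapping HSV hue angles onto Itten's circle in this way is itself an assumption, as is the name `harmony_score`.

```python
import math


def harmony_score(dominant_hues_deg):
    """Hypothetical harmony value in [0, 1] for dominant hues placed on a color circle."""
    k = len(dominant_hues_deg)
    if k < 2:
        return 0.0  # assumption: one color spans no polygon, treated as non-harmonious
    angles = sorted(math.radians(h % 360.0) for h in dominant_hues_deg)

    # (a) Centring: distance of the vertices' centroid from the circle center.
    cx = sum(math.cos(a) for a in angles) / k
    cy = sum(math.sin(a) for a in angles) / k
    centring = 1.0 - min(math.hypot(cx, cy), 1.0)

    # (b) Regularity: deviation of the angular gaps from the ideal 2*pi/k spacing,
    # normalized by the worst case (all vertices at the same angle).
    gaps = [angles[i + 1] - angles[i] for i in range(k - 1)]
    gaps.append(2.0 * math.pi - (angles[-1] - angles[0]))
    ideal = 2.0 * math.pi / k
    deviation = sum(abs(g - ideal) for g in gaps)
    regularity = max(0.0, 1.0 - deviation / (2.0 * (k - 1) * ideal))

    return centring * regularity


# Usage: an equilateral triangle of hues versus three neighbouring hues.
balanced = harmony_score([0, 120, 240])
clustered = harmony_score([0, 10, 20])
```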

Lines also carry important semantic information in images: oblique lines communicate dynamism and action, whereas horizontal or vertical lines rather communicate calmness and relaxation. To characterize dynamism and action in images, we compute the ratio of the number of oblique lines to the total number of lines in the image.
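The dynamism feature reduces to a simple ratio once line orientations are known. In the minimal sketch below, a line is counted as oblique when its orientation lies more than a tolerance away from both horizontal and vertical; the 15° default tolerance and the name `dynamism_ratio` are assumptions, as the paper does not define precisely what counts as oblique.

```python
def dynamism_ratio(line_angles_deg, tolerance_deg=15.0):
    """Fraction of lines that are oblique (neither near-horizontal nor near-vertical)."""
    if not line_angles_deg:
        return 0.0
    oblique = 0
    for angle in line_angles_deg:
        a = angle % 180.0  # undirected orientation in [0, 180)
        # Angular distance to the nearest of 0 (horizontal), 90 (vertical) or 180 degrees.
        distance = min(a, abs(a - 90.0), 180.0 - a)
        if distance > tolerance_deg:
            oblique += 1
    return oblique / len(line_angles_deg)


# Usage: two axis-aligned lines and two diagonals give a ratio of one half.
ratio = dynamism_ratio([0, 90, 45, 135])
```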

3 THE THEORY OF EVIDENCE FOR EMOTION RECOGNITION

Emotions are high-level semantic concepts that are by nature highly subjective and ambiguous. Thus, to perform this recognition task efficiently, it is necessary to handle information that can be uncertain, incomplete, ambiguous and conflicting. This paper addresses this issue by proposing a new technique based on the Theory of Evidence.

3.1 Fundamentals of the Theory of Evidence

The Theory of Evidence, introduced by Dempster (Dempster, 1968) and then formalized by Shafer (Shafer, 1976) and Smets (Smets, 1990), offers a framework for reasoning with knowledge that can be uncertain, incomplete and conflicting.

Let $\Omega = \{H_1, H_2, \ldots, H_n\}$ be a finite set of possible hypotheses. This set is referred to as the frame of discernment, and its power set is denoted by $2^{\Omega}$. The basic concepts of the theory are the following:

Belief Mass Functions. The confidence, or belief, that we can have in a hypothesis given a source of information (a type of feature in our case) is expressed by the mass function $m^{\Omega}$ associated with this source of information. The mass function assigns a value in [0, 1] to every subset $A$ of $\Omega$ and satisfies:

$$ m^{\Omega}(\emptyset) = 0 \quad \text{and} \quad \sum_{A \subseteq \Omega} m^{\Omega}(A) = 1 \tag{1} $$
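In code, a mass function can be represented as a mapping from subsets of $\Omega$ (here Python `frozenset`s) to masses, with unlisted subsets implicitly having mass 0. The sketch below, with the hypothetical helper `is_valid_mass`, checks the conditions of Eq. (1), together with the requirements that every mass lies in [0, 1] and every focal set is a subset of the frame.

```python
def is_valid_mass(m, frame, tol=1e-9):
    """Check that m ({frozenset: mass}) is a mass function on the given frame (Eq. 1)."""
    frame = frozenset(frame)
    for subset, mass in m.items():
        if not subset <= frame:
            return False  # focal set outside the frame of discernment
        if mass < -tol or mass > 1.0 + tol:
            return False  # each mass must lie in [0, 1]
    if m.get(frozenset(), 0.0) > tol:
        return False  # m(empty set) = 0
    return abs(sum(m.values()) - 1.0) <= tol  # masses sum to 1


# Usage: belief 0.6 on H1, 0.1 on {H2, H3}, and 0.3 left on the whole frame (ignorance).
frame = {"H1", "H2", "H3"}
m = {frozenset({"H1"}): 0.6, frozenset({"H2", "H3"}): 0.1, frozenset(frame): 0.3}
assert is_valid_mass(m, frame)
```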

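Combination Rule. Different mass functions from several sources of information can be combined to improve the knowledge used for the classification decision. Let $m^{\Omega}_{S_1}$ and $m^{\Omega}_{S_2}$ be two mass functions from two independent sources of information $S_1$ and $S_2$. The conjunctive combination of the Transferable Belief Model (TBM) (Smets, 1990), $m^{\Omega}_{S_1 \cap S_2}$, assigns to each $A \subseteq \Omega$ the sum of the products $m^{\Omega}_{S_1}(B)\, m^{\Omega}_{S_2}(C)$ over all $B \cap C = A$. Normalizing it gives Dempster's rule, which yields the combined mass of a hypothesis $A \subseteq \Omega$, $A \neq \emptyset$:

$$ m^{\Omega}_{S_1 \oplus S_2}(A) = \frac{\sum_{B \cap C = A} m^{\Omega}_{S_1}(B)\, m^{\Omega}_{S_2}(C)}{1 - m^{\Omega}_{S_1 \cap S_2}(\emptyset)} \tag{2} $$

where

$$ m^{\Omega}_{S_1 \cap S_2}(\emptyset) = \sum_{B \cap C = \emptyset} m^{\Omega}_{S_1}(B)\, m^{\Omega}_{S_2}(C) $$

is the degree of conflict between the two sources, and $m^{\Omega}_{S_1 \oplus S_2}(\emptyset) = 0$.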
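Dempster's rule (Eq. 2) can be sketched directly from its definition using a `frozenset`-keyed representation of mass functions: every pair of focal sets contributes the product of their masses to their intersection, mass falling on the empty set is accumulated as conflict, and the result is renormalized by 1 minus that conflict. The function name `dempster_combine` is hypothetical; more than two sources can be fused by applying it repeatedly, since the rule is commutative and associative.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions given as {frozenset: mass} dicts."""
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mass_b * mass_c
        else:
            # Mass the TBM conjunctive rule would assign to the empty set.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}


# Usage: one source supports H1, the other supports H2.
omega = frozenset({"H1", "H2"})
m1 = {frozenset({"H1"}): 0.6, omega: 0.4}
m2 = {frozenset({"H2"}): 0.5, omega: 0.5}
m12 = dempster_combine(m1, m2)
```

For these two sources the conflict is 0.6 × 0.5 = 0.3, so the combined masses are 0.3/0.7 = 3/7 on {H1}, 0.2/0.7 = 2/7 on {H2}, and 2/7 on Ω.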

3.2 Computation of the Evidence

We propose in this paper an original approach for computing evidence. The principle is as follows. For each type of feature, considered as a source of information $S_i$, SVM classifiers are first trained to recognize each of the classes, or hypotheses $H_j$. The outputs of these classifiers are used to compute the beliefs, by applying to them the membership functions represented in Fig. 3, which assign a belief to the different classes and combinations of classes according to the classifiers. However, the classifiers are not perfect and can be mistaken. To take this into account, the efficiency of each classifier (i.e., its precision, given by the confusion matrix) is used to weight its decision. This is formalized as follows.

Figure 3: Membership functions $f_{H_j}$ and $f_{\bar{H}_j}$, associated respectively with classes $H_j$ and $\bar{H}_j$, applied to the SVM output $x$ to build mass functions.

Let $S_1, \ldots, S_m$ be the $m$ sets of features considered as sources of information. For each $S_i$, $n$ binary classifiers $c_{ij}$ ($j = 1, \ldots, n$) are trained to recognize the $n$ classes $H_j$.

Let us now consider the construction of the mass function $m_{ij}$ corresponding to the belief mass obtained from source of information $S_i$ using classifier $c_{ij}$ trained to recognize class $H_j$. According to the output $x_{ij}$ of $c_{ij}$ and using the membership functions (Fig. 3), the mass is distributed over three subsets of $\Omega$: $\Omega$ itself, $H_j$, and $\bar{H}_j$, the complement of $H_j$ in $\Omega$, as in Eqs. (3)-(5):

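$$ m_{ij}(H_j) = f_{H_j}(x_{ij}) \cdot p_{ij}(H_j) \tag{3} $$

$$ m_{ij}(\bar{H}_j) = f_{\bar{H}_j}(x_{ij}) \cdot p_{ij}(\bar{H}_j) \tag{4} $$

$$ m_{ij}(\Omega) = 1 - m_{ij}(H_j) - m_{ij}(\bar{H}_j) \tag{5} $$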

where $p_{ij}(H_j)$ is the precision of $c_{ij}$ for class $H_j$ and $p_{ij}(\bar{H}_j)$ is the precision of $c_{ij}$ for class $\bar{H}_j$, both computed from the confusion matrix of $c_{ij}$.
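Eqs. (3)-(5) can be sketched as follows. Because the exact shape of the membership functions in Fig. 3 is not given in the text, this sketch assumes a hypothetical linear ramp: $f_{H_j}(x)$ is 0 for $x \le 0$ and rises linearly to 1 at a threshold `t`, and $f_{\bar{H}_j}(x)$ is its mirror image for negative outputs. Precisions are taken from a binary confusion matrix as TP/(TP+FP) for $H_j$ and TN/(TN+FN) for $\bar{H}_j$. All function names and the default `t = 1.0` are assumptions.

```python
def precisions(tp, fp, tn, fn):
    """Precision for H_j and for its complement, from a binary confusion matrix."""
    p_h = tp / (tp + fp) if tp + fp else 0.0
    p_not_h = tn / (tn + fn) if tn + fn else 0.0
    return p_h, p_not_h


def ramp(x, t):
    """Hypothetical membership function: 0 for x <= 0, rising linearly to 1 at x = t."""
    return min(max(x / t, 0.0), 1.0)


def svm_output_to_mass(x, h, frame, p_h, p_not_h, t=1.0):
    """Mass function of one classifier for class h, from its SVM output x."""
    frame = frozenset(frame)
    m_h = ramp(x, t) * p_h  # Eq. (3)
    m_not_h = ramp(-x, t) * p_not_h  # Eq. (4)
    return {
        frozenset({h}): m_h,
        frame - {h}: m_not_h,  # the complement of H_j in the frame
        frame: 1.0 - m_h - m_not_h,  # Eq. (5): remaining mass expresses uncertainty
    }


# Usage: a classifier for class "I" with precisions 0.8 (for I) and 0.7 (for not I).
p_h, p_not_h = precisions(tp=80, fp=20, tn=70, fn=30)
frame = {"I", "II", "III", "IV"}
confident = svm_output_to_mass(2.0, "I", frame, p_h, p_not_h)
undecided = svm_output_to_mass(0.0, "I", frame, p_h, p_not_h)
```

With these precisions, a strongly positive output (x = 2) puts 0.8 on {I} and the remaining 0.2 on Ω, while x = 0 leaves all the mass on Ω, matching the qualitative behaviour described in this section.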

Thus, if the output $x_{ij}$ of classifier $c_{ij}$ is a high positive value, $c_{ij}$ is confident that the input belongs to class $H_j$. But, as $c_{ij}$ may be mistaken, the mass is distributed not only on $H_j$ but also on $\Omega$, which corresponds to uncertainty, according to the ability of $c_{ij}$ to correctly recognize $H_j$. Conversely, if $x_{ij}$ is strongly negative, $c_{ij}$ is confident that the input belongs to $\bar{H}_j$. However, this decision is also weighted by the ability of $c_{ij}$ to correctly recognize $\bar{H}_j$, leading to a distribution of mass between $\bar{H}_j$ and $\Omega$. Finally, if $x_{ij}$ is around 0, classifier $c_{ij}$ is in doubt, so most of the mass is given to $\Omega$, which corresponds to uncertainty.

Once the mass functions $m_{ij}$ have been computed for all classifiers $c_{ij}$, they are combined with a combination operator, such as Dempster's rule (Eq. 2), which fuses the information given by all sources $S_i$. A single mass function is finally obtained, distributing the belief over some subsets of $\Omega$, and the final decision is taken by applying a decision measure to this combined mass function.
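The text does not specify the decision measure applied to the combined mass function. One common choice in the TBM is the pignistic probability, which shares the mass of each focal set equally among its elements (after discarding any mass on the empty set); the hypothesis with the highest pignistic probability is then selected. The sketch below assumes that choice; `pignistic` and `decide` are hypothetical names.

```python
def pignistic(m):
    """Pignistic probability BetP from a mass function given as {frozenset: mass}."""
    empty = m.get(frozenset(), 0.0)
    bet = {}
    for focal, mass in m.items():
        if not focal:
            continue  # mass on the empty set is discarded and renormalized away
        share = mass / (len(focal) * (1.0 - empty))
        for h in focal:
            bet[h] = bet.get(h, 0.0) + share
    return bet


def decide(m):
    """Return the hypothesis with the highest pignistic probability."""
    bet = pignistic(m)
    return max(bet, key=bet.get)


# Usage: BetP(H1) = 0.5 + 0.3/3 = 0.6, BetP(H2) = BetP(H3) = 0.2/2 + 0.3/3 = 0.2.
m = {frozenset({"H1"}): 0.5, frozenset({"H2", "H3"}): 0.2, frozenset({"H1", "H2", "H3"}): 0.3}
label = decide(m)
```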

4 EXPERIMENTS

In our experiments, we have used the IAPS database (P. J. Lang, 1999), which provides ratings of affect (pleasure or valence, arousal and control) for 1192 emotionally evocative images. We have considered an emotion model based on the pleasure and arousal dimensions, using four classes corresponding to the four quadrants, as shown in Fig. 5. The IAPS corpus is partitioned into a training set (80% of the data, 953 images) and a test set (20% of the data, 239 images), and all experiments are repeated ten times to obtain the average correct classification rate (CR).
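The class labels and the repeated 80/20 protocol can be sketched as follows. This assumes that the IAPS mean ratings lie on a 1-to-9 scale with the midpoint 5 as the quadrant boundary, and numbers the quadrants counter-clockwise from high pleasure and high arousal (I) to high pleasure and low arousal (IV); neither the threshold nor the numbering is stated in the text, and the function names are hypothetical. With 1192 images, the split yields 953 training and 239 test images, matching the figures above.

```python
import random


def quadrant(pleasure, arousal, midpoint=5.0):
    """Map mean pleasure/arousal ratings to a quadrant class (assumed numbering)."""
    if pleasure >= midpoint:
        return "I" if arousal >= midpoint else "IV"
    return "II" if arousal >= midpoint else "III"


def random_split(items, train_fraction=0.8, seed=None):
    """Shuffle items and split them into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Usage: ten random splits of the 1192 image indices, one per run. Classifiers
# would be trained on each training list and the CR measured on the test list;
# the ten CRs are then averaged.
splits = [random_split(range(1192), seed=run) for run in range(10)]
```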

To explore the performance of the different feature sets presented in Section 2 for visual emotion recognition, we have built a classification scheme using two support vector machine classifiers to identify each class: the first one is dedicated to the arousal dimension, and the second one to the pleasure dimension. The results obtained are shown in Fig. 6.

These results show that, among the different features, texture features (LBP, Tamura) are the most efficient. Moreover, the higher-level features (dynamism and harmony) may at first seem to give lower performance, but since each of them consists of a single value, their efficiency is in fact remarkable.

To evaluate the efficiency of different types of combination approaches, we have built an emotion classification scheme that combines classifiers based on different features, according to the framework illustrated in Fig. 4. In this scheme, SVM classifiers are employed: each feature set $S_i$ is used to train classifiers $c_{ij}$, which produce measurement vectors $y_{ij}$ corresponding to the probability that the input belongs to the different classes $C_j$. Vectors $y_{ij}$ are then combined to obtain the classification results (for the proposed approach, as described in section 3.2). The following combination methods have been implemented and compared with our approach based on the Theory of Evidence: maximum score, minimum score, mean score and majority voting. The results obtained are shown in Table 1.

Figure 4: The classification scheme, which combines the outputs of different classifiers.

Table 1: Accuracy (%) for the four classes and average correct classification rate (CR) obtained with the different fusion methods.

Class   Max score   Min score   Mean score   Majority voting   Proposed combined
I       56.32       51.25       50.87        55.20             58.97
II      53.08       50.24       52.51        53.36             55.00
III     52.67       48.31       50.43        47.37             51.76
IV      50.34       51.42       53.67        52.30             53.08
CR      51.87       50.31       53.10        52.05             54.70

Figure 5: The dimensional emotion model used to define four classes of emotions corresponding to the four quadrants.

These results show that fusion with the Theory of Evidence is the most efficient overall, with an average classification rate of 54.70%, compared with 51.87% for max score, 50.31% for min score, 53.10% for mean score and 52.05% for majority voting. This suggests that the Theory of Evidence is able to combine the different sources of information and to exploit their complementarities. The compared fusion rules are defined by the following equations (Robert Snelick, 2005):

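$$ Z(i) = \frac{1}{N} \sum_{n=1}^{N} y_n^i, \quad \forall i \tag{6} $$

$$ Z(i) = \min(y_1^i, y_2^i, \ldots, y_N^i), \quad \forall i \tag{7} $$

$$ Z(i) = \max(y_1^i, y_2^i, \ldots, y_N^i), \quad \forall i \tag{8} $$

$$ Z(i) = \sum_{n=1}^{N} \delta\Big(\arg\max_{k} y_n^k,\ i\Big), \quad \forall i \tag{9} $$

where $y_n^i$ represents the $i$-th measurement of classifier $n$ (its score for class $i$), $N$ is the number of classifiers, and $\delta(a, b)$ equals 1 if $a = b$ and 0 otherwise. In each case, the input is assigned to the class $i$ that maximizes $Z(i)$.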
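The mean, min, max and majority-voting rules can be sketched as a single function over the classifiers' score vectors. The sketch assumes every classifier outputs one score per class, implements majority voting as counting each classifier's top-scoring class, and returns the index of the class that maximizes $Z(i)$; `fuse` and the rule names are hypothetical.

```python
def fuse(scores, rule):
    """Fuse N classifiers' score vectors with the mean, min, max or vote rule.

    scores[n][i] is y_n^i, the score of classifier n for class i.
    Returns (index of the winning class, list of Z(i)).
    """
    n_classes = len(scores[0])
    # columns[i] holds y_1^i, ..., y_N^i.
    columns = [[s[i] for s in scores] for i in range(n_classes)]
    if rule == "mean":
        z = [sum(col) / len(col) for col in columns]
    elif rule == "min":
        z = [min(col) for col in columns]
    elif rule == "max":
        z = [max(col) for col in columns]
    elif rule == "vote":
        # Each classifier votes for its top-scoring class; Z(i) counts the votes.
        votes = [max(range(n_classes), key=s.__getitem__) for s in scores]
        z = [votes.count(i) for i in range(n_classes)]
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return max(range(n_classes), key=z.__getitem__), z


# Usage: three classifiers scoring three classes.
scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.4, 0.1]]
for rule in ("mean", "min", "max", "vote"):
    print(rule, fuse(scores, rule)[0])
```

On these example scores, the min rule picks class 1 while the other three rules pick class 0, which illustrates how much the choice of fusion rule can change the decision.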

Figure 6: The average correct classification rate obtained using each individual feature set.

5 CONCLUSIONS

In this work, we have investigated the efficiency of different types of features and classifier combinations for visual emotion recognition in realistic images. Experiments on the IAPS dataset have brought to the fore that texture features, as well as harmony and dynamism features, carry important information for the classification of emotional semantics. Moreover, the proposed fusion approach based on the Theory of Evidence has achieved an encouraging classification rate, likely due to its ability to represent the uncertainty and ambiguity of emotions.


ACKNOWLEDGEMENTS

This work is partly supported by the French ANR under the project Omnia ANR-07-MDCO-009.

REFERENCES

A. W. M Smeulders, Marcel Worring, S. S. A. G. R. J. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380.

C. Columbo, A. Del Bimbo, P. P. (1999). Semantics in visual information retrieval. IEEE Multimedia, 6(3):38–53.

C.-H. Chan, G.-J.-F. J. (2005). Affect-based indexing and retrieval of films. ACM Multimedia, pages 427–430.

C.-T. Li, M.-K. S. (2007). Emotion-based impressionism slideshow with automatic music accompaniment. ACM Multimedia, pages 839–842.

Dempster, A. P. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B, 30:205–247.

Itten, J. (1961). The art of color. Otto Maier Verlag, Ravensburg, Germany.

J. Z. Wang, J. L. (2001). SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):947–963.

K. Kuroda, M. H. (2002). An image retrieval system by impression words and specific object names: IRIS. Neurocomputing, 43:259–276.

P. Dunker, S. Nowak, A. B. C. L. (2009). Content-based mood classification for photos and music. ACM MIR, pages 97–104.

P. J. Lang, M. M. Bradley, B. N. C. (1999). The IAPS: Technical manual and affective ratings. Tech. Rep. GCR in Psychophysiology.

Picard, R. W. (1997). Affective computing. MIT Press, Cambridge.

Q. Wu, C. Zhou, C. W. (2005). Content-based affective image classification and retrieval using support vector machines. ACII, pages 239–257.

R. Datta, J. Li, J. Z. W. (2005). Content-based image retrieval: approaches and trends of the new age. ACM Workshop MIR, Singapore, Nov. 11–12.

Robert Snelick, Umut Uludag, A. M. M. I. A. J. (2005). Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):450–455.

S. Wang, X. W. (2005). Emotion semantics image retrieval: a brief overview. ACII, pages 490–497.

Shafer, G. (1976). A mathematical theory of evidence. Princeton University Press.

Smets, P. (1990). The combination of evidence in the transferable belief model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):447–458.

V. Yanulevskaya, J. C. van Gemert, et al. (2008). Emotional valence categorization using holistic image features. ICIP, pages 101–104.

W. Wang, Q. H. (2008). A survey on emotional semantic image retrieval. ICIP, pages 117–120.

W. Wei-ning, Y. Ying-lin, J. S.-m. (2006). Image retrieval by emotional semantics: A study of emotional space and feature extraction. IEEE ICSMC, 4.

Z. Zeng, M. Pantic, G. I. R. T. S. H. (2009). A survey of affect recognition methods: audio, visual and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39–58.
