Uncertainty and Integration of Emotional States in e-Learning
Doctoral Consortium Paper
Grzegorz Brodny
Department of Software Engineering, Gdansk University of Technology, Narutowicza Str. 11/12, 80-233 Gdansk, Poland
Keywords: e-Learning, Emotion Recognition, Late Fusion, Early Fusion, Emotion Representation Models.
Abstract: One of the main applications of affective computing remains support for the e-learning process. Therefore, apart
from human mentoring, automatic emotion recognition is also applied in monitoring learning activities.
The specific context of e-learning, which takes place at a home desk or practically anywhere (mobile e-learning), adds additional
challenges to emotion recognition, e.g. temporary unavailability of and noise in input channels. Affective computing
has provided many solutions for emotion recognition: there are numerous recognition algorithms that differ in input
information channels, in the emotion representation model used as output and in the classification method.
The most common approach is to combine the emotion information channels. Using multiple input channels has proved
to be the most accurate and reliable approach; however, no standard architecture has been proposed for this kind of solution.
This paper presents an outline of the author's PhD thesis, which concentrates on the integration of emotional states
in educational applications with consideration of uncertainty. The paper presents the state of the art, the integration
architecture, the performed experiments and the planned simulations.
1 RESEARCH PROBLEM
One of the main applications of affective computing
remains the support of e-learning processes. Many
studies from the field of pedagogy have confirmed
that emotions have a crucial impact on learning and
e-learning, e.g. (Binali et al. 2009) (Landowska 2013).
Therefore, apart from human mentoring, automatic
emotion recognition is also applied in monitoring
learning activities. Although there are some ethical
considerations regarding revealing the affective states
of a learner to a teacher, affective educational systems
have already been built. The specific context of
e-learning, which takes place at a home desk or
practically anywhere (mobile e-learning), adds another
challenge to emotion recognition, e.g. temporary
unavailability or noisiness of input channels.
Nowadays, there are numerous emotion
recognition algorithms that differ in input
information channels, in the emotion representation
model used as output and in the recognition method.
The most important classification is based on input
channels, as some are not always available in the
e-learning environment. A recognition algorithm
might use one or a combination of the following channels:
visual information from cameras,
body movements,
textual input of a user,
voice signals,
standard input devices usage,
physiological measurements.
All of the above-listed input channels might be
applied in monitoring e-learning activities, but
some of them are task- or user-dependent in the
e-learning context (Landowska et al. 2017)
(Landowska et al. 2016).
As all emotion recognition channels are
susceptible to some noise, the most common
approach is to combine the channels (multimodal
recognition) (Poria et al. 2017). This approach
requires integration of data or results from different
sources. There are two approaches to integration:
early and late fusion methods. Both have some
disadvantages and the challenge of multimodal
integration constitutes the author’s research
problem. The challenge could be decomposed into
the following subproblems:
(1) lack of a standard emotion representation
model. There are many models of emotion
representation and, unfortunately, there is neither a
standard nor a single most frequently used one. As a
result, each emotion recognition algorithm and each
solution uses a different (and sometimes unique)
emotion representation model (Gunes and Schuller
2013).
(2) discrepancies between results (recognized
emotional states) obtained from different input
channels and algorithms. For the same investigation,
time and person, the experiments revealed large
discrepancies between the emotional states
recognized by different algorithms. Not only do
solutions based on diverse input channels exhibit
this discrepancy, but even the same channel
recorded twice (e.g. from two camera locations)
yields different results (Landowska et al. n.d.)
(Landowska and Miler 2016).
(3) uncertainty of results, which differs among
algorithms and contexts. Some solutions return a
prediction of an emotional state even if the conditions
for prediction were suboptimal (large camera angle,
insufficient lighting, other noise). At the same time,
most emotion recognition tools do not report
the reliability of the predicted state. Moreover, the
reported emotional state might be provided using
diverse scales and precision.
(4) disadvantages of the integration methods. The
early fusion method is not resistant to periodically
unavailable channels, while the late fusion method is
not resistant to incompatible emotion
representation models.
The author's research concentrates on the
integration of emotional states in educational
applications with consideration of uncertainty. The
resulting objectives are described in detail in Section 2.
2 OUTLINE OF OBJECTIVES
The author's research concentrates on the following
objective: to prepare a method of integration of
recognized emotional states that takes uncertainty
into account. This objective might be decomposed
into the following subproblems:
a) selection and application of appropriate
methods of mapping between emotion
representation models, especially to models
that make sense in the e-learning context (the
mapping increases measurement error);
b) method for calculation of uncertainty factor
for different input channels;
c) integration method based on late fusion,
including uncertainty;
d) post-hoc evaluation of emotion recognition,
based on efficiency in a specific context;
e) architecture supporting the late fusion of
emotion recognition results provided by
algorithms from diverse vendors.
3 STATE OF THE ART
This section is divided into three parts. The first part
presents emotion representation models and
approaches to mapping between them. The second
provides a review of research on applying
affective computing methods in e-learning, and
the third reviews existing methods of integrating
emotional states.
3.1 Emotion Models and Approaches of
Mappings
There are multiple emotion representation models
and no standard model has been established so far
(Valenza et al. 2012). Models fall into three
categories: (1) categorical, (2) dimensional and (3)
componential. (1) Categorical models are the most
intuitive for humans, but not for computers
(Gunes and Schuller 2013). They represent each
emotion as a combination of labelled emotional
states. An example is the popular Ekman's
model, which combines basic emotions: joy, fear,
anger, surprise, sadness and disgust to represent
complex emotional states (Scherer and Ekman 1984).
(2) Dimensional models (usually two- or three-
dimensional) represent emotions as a composition of
bipolar dimensions, for example: valence (pleasant vs
unpleasant), arousal (relaxed vs aroused) and
dominance/power/control (submissiveness vs
dominance) (Gunes and Schuller 2013). Emotions in
these models are represented as a point in a 2D, 3D or
higher-dimensional space (Grandjean et al. 2008).
These models are less intuitive for humans but
easier for applications to compute. To be
understandable for people, the points require some
mapping to emotion labels. (3) Componential
models of emotions are based on appraisal theory.
These models are more complex and concentrate on
how emotions are generated (Fontaine et al. 2007)
(Grandjean et al. 2008) (Ortony et al. 1988).
Some authors claim that categorical models
could be mapped to dimensional ones and vice
versa. Some mappings are lossless (Gunes and
Schuller 2013).
Researchers use a few mappings, which are
mainly derived from correlation coefficients. For
example, a mapping between the Big Five model
(categorical) and PAD (3D) was proposed as a set of
functions (Mehrabian 1996a) (Shi et al. 2012). The
mapping was created by calculating correlations
between the factors of both models, and the
correlation coefficients were used as weights in the
mapping functions. The next case is a mapping
between PAD and the Ekman model (categorical)
(Shi et al. 2012), which was created in an analogous
way. Mehrabian and Russell calculated correlation
coefficients between PAD and models of personality
(Mehrabian 1996b).
The next example is a mapping of emotion
labels to a dimensional space proposed by (Hupont et
al. 2011). The mapping assigns each label a set of
weights derived from a database of coordinates in the
dimensional space. The model can be used directly
(e.g. in sentiment analysis). Another mapping method
was used by (Gebhard 2005) for OCC (a componential
model) to PAD (3D). In this mapping, a point in the
PAD space was defined for each of the 24 OCC
categories treated as a label (e.g. anger, fear, distress),
and the PAD coordinates were used as weights in the
mapping functions.
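For illustration, the sketch below shows how such a label-to-dimensional mapping could work: each label is assigned a point in a valence-arousal space, and a categorical result is mapped to the weighted average of those points. The coordinates and the weighting scheme are illustrative assumptions only; they are not the weights published in the cited works.

    # Illustrative label-to-dimensional mapping (hypothetical coordinates,
    # not the actual weights from Hupont et al. 2011 or Gebhard 2005).

    # Assumed valence-arousal coordinates for Ekman's basic emotions; real
    # mappings derive such coordinates from annotated databases or correlations.
    LABEL_COORDS = {
        "joy":      ( 0.8,  0.5),
        "surprise": ( 0.2,  0.8),
        "anger":    (-0.6,  0.7),
        "fear":     (-0.7,  0.6),
        "sadness":  (-0.7, -0.4),
        "disgust":  (-0.6,  0.3),
    }

    def labels_to_dimensions(label_intensities):
        """Map a categorical result (label -> intensity in [0, 1]) to one point
        in valence-arousal space as an intensity-weighted average."""
        total = sum(label_intensities.values())
        if total == 0:
            return (0.0, 0.0)  # neutral point when no emotion is detected
        valence = sum(w * LABEL_COORDS[l][0] for l, w in label_intensities.items()) / total
        arousal = sum(w * LABEL_COORDS[l][1] for l, w in label_intensities.items()) / total
        return (valence, arousal)

    # Example: hypothetical output of a facial expression classifier
    print(labels_to_dimensions({"joy": 0.6, "surprise": 0.3, "anger": 0.1}))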
Based on the above review, dimensional models
appear to be the most universal, but the final models
(which are used by applications) might require adjustments.
3.2 Affective Computing Methods in
e-Learning
When applying affective computing methods in
e-learning, one might not require the full spectrum of
emotions. The most important emotional states from
the educational perspective include: frustration,
boredom and flow/engagement (Binali et al. 2009)
(Kołakowska et al. 2013) (Landowska 2013).
A few e-learning systems employ virtual characters
that deal with or visualize affect (Landowska 2008).
One of them is the Virtual Human Project, an
educational platform for students with two avatars (a
student and a teacher), each with its own personality
profile (Gebhard 2005). In this project, the OCC model
of emotions was used, combined with the PAD model
for mood and the Big Five model for personality traits.
Another example of using affective computing in
education is the Intelligent Tutoring System (ITS)
Eve. Eve was an affect-aware tutoring system that
recognized affect (in terms of Ekman's states) and
expressed it while teaching mathematics (Alexander et al. 2006).
3.3 Methods of Integrating Emotional States
The methods of detecting emotional states can be
divided into four categories: (1) single algorithm
(without integration), (2) early fusion, (3) late
fusion, (4) hybrid fusion.
3.3.1 Single Algorithm
Nowadays, many affective solutions use only one
input channel and one emotion recognition
algorithm based on it (Hupont et al. 2011). These
solutions are very specific, dedicated to one
problem, and often detect only one emotional state,
e.g. a positive state, stress or lack of stress (Landowska
2013) (Chittaro and Sioni 2014).
3.3.2 Early Fusion (Also Called Feature-Level Fusion)
The early fusion method combines data from multiple
input channels into one input vector during the data
collection step (before classification). All data types
are processed at the same time. This method usually
provides high accuracy (Hupont et al. 2011), but
becomes more challenging as the number of input
channels increases (a minimal sketch of this approach
follows the list of challenges below). The main
challenges in this method include:
a) Time synchronization for data from each
channel (resulting in incomplete feature
vectors);
b) Learning a classifier with vectors containing
missing values (when channels are
inaccessible);
c) Large feature vectors when fusing many
channels (feature selection techniques are
used to maximize the performance of the
classifier) (Gunes and Piccardi 2005);
d) Adding a new channel/module often requires
retraining and/or rebuilding all solutions (low
scalability).
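The sketch below illustrates feature-level fusion under simplifying assumptions: per-channel feature extraction has already been performed, scikit-learn is available, and the channel names and classifier choice are hypothetical.

    # Minimal sketch of early (feature-level) fusion: features from all channels
    # are concatenated into one vector before a single classifier is trained.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def fuse_features(face_feats, keyboard_feats, physio_feats):
        """Concatenate per-channel feature vectors into one input vector."""
        return np.concatenate([face_feats, keyboard_feats, physio_feats])

    def train_early_fusion(samples, labels):
        """samples: list of (face, keyboard, physio) feature tuples,
        labels: emotion labels; returns a single classifier trained on the
        fused feature vectors."""
        X = np.array([fuse_features(*s) for s in samples])
        clf = RandomForestClassifier(n_estimators=100)
        clf.fit(X, labels)
        return clf

Note that a missing channel leaves a gap in the fused vector, which is exactly the incompleteness problem listed in points a) and b) above.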
3.3.3 Late Fusion (Also Called Decision-
Level Fusion)
In late fusion, in contrast to early fusion, the
integration of data is performed during the decision
step. This method is based on independent processing
of data from each input channel and training multiple
classifiers. Each classifier provides one hypothesis on
the emotional state, and an integration function
produces a final estimate of the emotional state based
on these partial results. This method provides more
scalability than early fusion, because a new module is
just one more result to integrate (a minimal sketch
follows the list below). The main challenges in the
method include:
a) Time synchronization of results from diverse
modules: should a subset of results be integrated,
or should the integration wait for all modules to
provide a hypothesis?
b) Mapping the outputs of the modules to one,
final output model.
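A minimal sketch of decision-level fusion is given below, assuming that each module returns a probability distribution over the same (hypothetical) set of classes; a temporarily unavailable module is simply left out of the list of hypotheses. The class names and weighting are assumptions for illustration only.

    # Minimal sketch of late (decision-level) fusion: each channel has its own
    # classifier; an integration function combines the per-module hypotheses.
    import numpy as np

    CLASSES = ["negative", "neutral", "positive"]  # assumed common output model

    def late_fusion(hypotheses, weights=None):
        """hypotheses: list of per-module class-probability vectors (same class
        order). Returns the fused distribution as a (possibly weighted) average."""
        hyps = np.array(hypotheses, dtype=float)
        if weights is None:
            weights = np.ones(len(hyps))
        fused = np.average(hyps, axis=0, weights=np.asarray(weights, dtype=float))
        return fused / fused.sum()

    # Example: the face module and the keystroke module disagree; fusion balances them
    face = [0.1, 0.2, 0.7]
    keys = [0.4, 0.4, 0.2]
    fused = late_fusion([face, keys])
    print(CLASSES[int(np.argmax(fused))], fused)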
3.3.4 Hybrid Fusion
The hybrid methods are a combination of late and
early fusion. Each module has a separate classifier as
in the late fusion but also has access to input data
from all input channels. The main advantage of this
method is preservation of algorithms independence,
while still using combined information from
multiple channels. However, the challenges remain
more or less the same as in the late fusion method.
3.3.5 Summary of Fusion Methods
The approach used in this research is a late fusion
method, with a potential extension to a hybrid
fusion. The early fusion method is difficult to
maintain and extend with new observation channels.
Moreover, the early fusion approach is not possible
with the use of existing off-the-shelf solutions,
including commercial software. The late or hybrid
fusion method supports integration, exchange and
modifiability of emotion recognition modules
(a "black box" approach).
4 METHODOLOGY
The research methodology of the presented PhD
work is in general based on experiments and
simulations. To compare the algorithms' accuracy,
experiments were carried out with different input
channels. The experiments were also used to collect
data for the simulations. Simulations can be carried
out offline, using real data from the experiments and
data available in emotional databases (Cowie et al.
2005).
4.1 Experiments
Three experiments have been performed so far.
4.1.1 The Experiment 1. Learning via Playing an Educational Game
The goal of the experiment was to investigate
emotional states while learning with an educational
game (Landowska and Miler 2016). The game was
about managing IT projects and the participants were
computer science students. The participants were
asked to play the game several times, and both their
emotional states and educational outcomes were
measured.
The emotion recognition channels in this
experiment were: facial expressions, self-report and
physiological signals. Details of the experiment are
described in (Landowska and Miler 2016). This PhD
work will use the data from the experiment to
perform offline simulations of the proposed
integration methods.
4.1.2 The Experiment 2. Learning with a
Moodle Course
The aim of the experiment was to investigate
emotional states while using a Moodle course with
diverse activities. Three Moodle activity types were
employed: watching a lecture, solving a quiz and
adding a forum entry on a subject pre-defined by a
teacher. In this experiment, recordings from four
simultaneous cameras were used for facial expression
analysis. Self-report, physiological measurements
and sentiment analysis of textual inputs were also
employed (Landowska et al. 2017).
4.1.3 The Experiment 3. Learning with On-line Tutorials
The aim of the experiment was to investigate
emotional states while learning with video tutorials
for the Inkscape tool (Landowska et al. n.d.).
The emotion recognition channels in this
experiment were: facial expression recordings (2
cameras), keystroke dynamics, mouse movement
patterns, opinion-like text and self-report.
4.1.4 Summary of Experiments
After carrying out the three experiments, some
general observations were made:
the location of the camera is one of the crucial
factors influencing the recognized emotional
states,
availability of some emotion observation
channels is task-dependent (e.g. sentiment
analysis depends on writing tasks) and/or
user-dependent,
physiological signals provide information only
on arousal and not on the valence of an
emotional state,
self-report is the most dependent on human
will and should be confirmed with another
observation channel,
peripherals (mouse/keyboard) usage patterns
reveal information on affect with relatively
low granularity and accuracy and should be
combined with other observation channels.
These observations confirmed the assumption that
multichannel observation has the potential to
improve the accuracy of emotion recognition.
4.2 Simulations
After collecting data from the experiments and from
available databases, a set of simulations will be
carried out. This section is divided into two parts:
the first presents the method of integration, and the
second provides preliminary simulation plans.
4.2.1 Method of Integration
The method of emotional state integration used in
this research is a part of the Emotion Monitor stand,
which was described in (Landowska 2015). The
concept of the stand assumed combining multiple
modalities used in emotion recognition in order to
improve the accuracy of affect classification. A model
of the Emotion Monitor architecture is presented in
Figure 1; the area covered by this PhD research is
marked with a dotted line. Integrated algorithms are
treated as "black boxes" and can use an early fusion
mechanism at the algorithm level (as in the hybrid
fusion method). The algorithms receive input data
from the emotion observation channels and provide
hypotheses using some emotion representation
model. In the next step, each algorithm's hypothesis
is mapped to one common emotion model (if needed).
Next, the integration function combines the partial
hypotheses into an integrated state, which can be sent
to the application as the final decision on the
recognized emotional state.
The proposed method and architecture aim at
addressing subproblem (e) defined in Section 2.
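As an illustration only (the thesis' final integration function is still under validation and is not specified in this paper), the sketch below shows one possible form of such an uncertainty-aware late fusion step: hypotheses already mapped to a common valence-arousal model are weighted by their certainty, and temporarily unavailable modules are simply absent from the input list. The field names and the weighting scheme are assumptions made for this sketch.

    # Sketch of the integration step, under assumptions: each module returns a
    # hypothesis already mapped to a common valence-arousal model together with
    # an uncertainty factor in [0, 1]; this is NOT the thesis' final function.

    def integrate(hypotheses):
        """hypotheses: list of dicts like
           {"valence": float, "arousal": float, "uncertainty": float}
        Unavailable modules are simply absent from the list (late fusion).
        Returns the integrated state and its combined uncertainty."""
        usable = [h for h in hypotheses if h["uncertainty"] < 1.0]
        if not usable:
            return None  # no reliable hypothesis available at this moment
        weights = [1.0 - h["uncertainty"] for h in usable]
        total = sum(weights)
        valence = sum(w * h["valence"] for w, h in zip(weights, usable)) / total
        arousal = sum(w * h["arousal"] for w, h in zip(weights, usable)) / total
        # Combined uncertainty shrinks as more confident modules contribute
        # (an illustrative choice, not a validated formula).
        combined_uncertainty = 1.0 - total / len(hypotheses)
        return {"valence": valence, "arousal": arousal,
                "uncertainty": max(0.0, combined_uncertainty)}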
4.2.2 Preparation of Simulations
Before simulations could be started the following
prerequisites must be met:
a) Collect the algorithms to integrate, which
differ in input channels and use different
emotion representation models.
b) Prepare an integration function that can be
tested in simulations.
c) Prepare a method of calculating the uncertainty
factor for different input channels.
d) Collect the data with labels in different
emotion models (preferably labels in two or
more models in one database).
4.2.3 Simulation 1 – Compare Accuracy of
the Algorithms
The first simulation will establish a ranking of the
algorithms with their accuracy rates.
The plan is to compare the algorithms using
exactly the same input data, from the same
experiment and the same channels. The input data
should be labelled with emotions in the emotion
models that are the outputs of the tested algorithms
(so that no mapping is needed).
The results of this simulation are required for the
subsequent simulations.
Figure 1: Conceptual model of multimodal emotion recognition fusion and the scope of the integration solution.
4.2.4 Simulation 2 – Evaluate a
Measurement Error of Mapping
The aim of this simulation is the selection of
optimal mapping models and the evaluation of the
error introduced by the transformation.
Data: can be different for each algorithm;
preferably labelled in two or more emotion models
within one input set. The same data as in Simulation 1
might be used, if additionally labelled.
Plan: for each algorithm and for each dataset:
(1) provide the input data to the algorithm, (2) get
the results (an emotional state estimate), (3) perform
the mappings, (4) calculate the accuracy of the results
after mapping, (5) calculate the mapping error as the
difference between the accuracy obtained in
Simulation 1 and the accuracy after mapping.
The simulation should allow choosing the best
mapping method.
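A minimal sketch of step (5) is given below, assuming a simple hit-rate notion of accuracy; the function and argument names are illustrative only.

    # Sketch of Simulation 2, step (5): the mapping error is taken as the
    # difference between accuracy without mapping (Simulation 1) and accuracy
    # after mapping.
    def accuracy(predictions, ground_truth):
        hits = sum(1 for p, g in zip(predictions, ground_truth) if p == g)
        return hits / len(ground_truth)

    def mapping_error(acc_without_mapping, predictions_after_mapping, ground_truth):
        return acc_without_mapping - accuracy(predictions_after_mapping, ground_truth)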
4.2.5 Simulation 3 – Evaluate the
Uncertainty
The aim of this simulation is the verification of the
method for calculating uncertainty. This simulation
addresses subproblem (b) from Section 2.
Data: input data from one of the experiments, but
recorded with different settings, e.g. different camera
locations.
Plan: for each algorithm, or at least one per
channel: (1) provide the algorithm with data from the
different sources, (2) calculate the accuracy for each
source independently, together with its uncertainty
factor, (3) compare the accuracy and uncertainty
factor of each source with those of the integrated
result.
If the function calculating uncertainty is correct,
the integrated result is expected to be less uncertain
and more accurate.
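The expected success criterion of Simulation 3 could be expressed as the check below; since the actual uncertainty calculation method is not described in this paper, both the inputs and the criterion are stated as assumptions.

    # Illustrative check for Simulation 3: the integrated result should be both
    # less uncertain and more accurate than any single source (assumed criterion).
    def uncertainty_method_holds(per_source, integrated):
        """per_source: list of {"accuracy": float, "uncertainty": float} per
        source (e.g. per camera location); integrated: the same pair for the
        fused result."""
        best_acc = max(s["accuracy"] for s in per_source)
        least_unc = min(s["uncertainty"] for s in per_source)
        return (integrated["accuracy"] >= best_acc
                and integrated["uncertainty"] <= least_unc)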
4.2.6 Simulation 4 – Evaluate the Function
of Integration
The aim of this simulation is to verify the integration
function that uses the uncertainty factor. This
simulation addresses subproblem (c) from Section 2.
Data: the same data as in Simulation 3.
Plan: integrate the emotional states from
Simulation 3 and compare the accuracy of the
integrated states with that of the states from the
partial algorithms.
5 EXPECTED OUTCOME
The main expected outcome of the author's PhD thesis
is a method of integration of recognized
emotional states that takes uncertainty into account.
This method should improve the accuracy of
emotion recognition. It should also allow applying
some off-the-shelf software in different contexts,
with a particular focus on e-learning. As an expected
long-term result, the integration method should be
applicable to e-learning platforms and educational
games that aim at supporting learners in maintaining
attention and a positive attitude in educational
processes.
6 STAGE OF THE RESEARCH
This paper presented an outline and selected details
of the author's PhD thesis. This section summarizes
which parts of the research have already been done,
which are in progress and which have not yet been
started.
The first version of the integration method based
on late fusion, including uncertainty, has already been
implemented. It was implemented in C# and passed
basic tests. It is awaiting validation in Simulation 4.
All three planned experiments have been
completed. Some data from external emotion
databases have been obtained, but the author is still
looking for more data for the next steps.
The architecture supporting the late fusion of
emotion recognition results provided by algorithms
from diverse vendors has been developed, and the
first implementation has already been tested,
revealing some potential for improvement. The
second implementation of the architecture is in
progress, and a paper about the architecture is being
prepared.
Some algorithms have been collected and prepared
(by implementing wrappers) for integration with the
architecture. The method for calculating the
uncertainty factor for different input channels has
been designed and is currently under development.
Some mappings between emotion models have been
collected, but the list is not yet complete. Simulations
regarding mapping accuracy are planned as the first
step to follow.
ACKNOWLEDGEMENTS
This work was supported in part by Polish-
Norwegian Financial Mechanism Small Grant
Scheme under the contract no Pol-
Nor/209260/108/2015 as well as by DS Funds of
ETI Faculty, Gdansk University of Technology.
REFERENCES
Alexander, S., Sarrafzadeh, A. & Hill, S., 2006. Easy with
Eve: A functional affective tutoring system. In Workshop
on Motivational and Affective Issues in ITS, 8th
International Conference on Intelligent Tutoring Systems.
Binali, H.H., Wu, C. & Potdar, V., 2009. A new
significant area: Emotion detection in E-learning using
opinion mining techniques. In 2009 3rd IEEE
International Conference on Digital Ecosystems and
Technologies, DEST ’09. IEEE, pp. 259–264.
Chittaro, L. & Sioni, R., 2014. Affective computing vs.
affective placebo: Study of a biofeedback-controlled
game for relaxation training. International Journal of
Human-Computer Studies.
Cowie, R., Douglas-Cowie, E. & Cox, C., 2005. Beyond
emotion archetypes: Databases for emotion modelling
using neural networks. Neural Networks, 18(4),
pp.371–388.
Fontaine, J.R.J. et al., 2007. The world of emotions is not
two-dimensional. Psychological Science, 18(12),
pp.1050–1057.
Gebhard, P., 2005. ALMA – A Layered Model of Affect. In
AAMAS '05: Proceedings of the 4th International Joint
Conference on Autonomous Agents and Multiagent
Systems, pp. 29–36.
Grandjean, D., Sander, D. & Scherer, K.R., 2008.
Conscious emotional experience emerges as a function
of multilevel, appraisal-driven response
synchronization. Consciousness and Cognition.
Gunes, H. & Piccardi, M., 2005. Affect recognition from
face and body: early fusion vs. late fusion. In 2005 IEEE
International Conference on Systems, Man and
Cybernetics, 4, pp. 3437–3443.
Gunes, H. & Schuller, B., 2013. Categorical and
dimensional affect analysis in continuous input:
Current trends and future directions. Image and Vision
Computing, 31(2), pp.120–136.
Hupont, I. et al., 2011. Scalable multimodal fusion for
continuous affect sensing. In IEEE.
Kołakowska, A., Landowska, A. & Szwoch, M., 2013.
Emotion recognition and its application in software
engineering. In 2013 6th International Conference on
Human System Interactions (HSI).
Landowska, A., 2013. Affective computing and affective
learning – methods, tools and prospects. EduAction.
Electronic education magazine, 1(5), pp. 16–31.
Landowska, A., 2015. Emotion monitor - Concept,
construction and lessons learned. In Proceedings of the
2015 Federated Conference on Computer Science and
Information Systems, FedCSIS 2015. pp. 75–80.
Landowska, A. & Brodny, G., 2017. Postrzeganie
inwazyjności automatycznego rozpoznawania emocji
w kontekście edukacyjnym [Perception of the invasiveness
of automatic emotion recognition in the educational
context]. EduAction. Electronic education magazine,
pp. 1–18. Submitted.
Landowska, A., 2008. The role and construction of
educational agents in distance learning environments.
In Proceedings of the 2008 1st International
Conference on Information Technology, IT 2008.
Landowska, A., Brodny, G. & Wrobel, M.R., Limitations
of emotion recognition from facial expressions in e-
learning context. Submitted.
Landowska, A. & Miler, J., 2016. Limitations of Emotion
Recognition in Software User Experience Evaluation
Context. In Proceedings of the 2016 Federated
Conference on Computer Science and Information
Systems. pp. 1631–1640.
Mehrabian, A., 1996a. Analysis of the Big-five
Personality Factors in Terms of the PAD
Temperament Model. Australian Journal of
Psychology, 48(2), pp.86–92.
Mehrabian, A., 1996b. Pleasure-Arousal-Dominance: A
General Framework for Describing and Measuring
Individual Differences in Temperament. Current
Psychology, 14(4), pp. 261–292.
Ortony, A., Clore, G.L. & Collins, A., 1988. The
Cognitive Structure of Emotions. Cambridge University Press.
Poria, S. et al., 2017. A review of affective computing:
From unimodal analysis to multimodal fusion.
Information Fusion.
Scherer, K.R. & Ekman, P., 1984. Approaches to emotion,
L. Erlbaum Associates.
Shi, Z. et al., 2012. Affective transfer computing model
based on attenuation emotion mechanism. Journal on
Multimodal User Interfaces, 5(1–2), pp.3–18.
Valenza, G., Lanata, A. & Scilingo, E., 2012. The role of
nonlinear dynamics in affective valence and arousal
recognition. IEEE Transactions on Affective Computing.