Enhancing Recommender Systems for TV by Face Recognition

Toon De Pessemier, Damien Verlee and Luc Martens

iMinds, Ghent University, Technologiepark 15, B-9052 Ghent, Belgium

Keywords:

Recommender System, Face Recognition, Face Detection, TV, Emotion Detection.

Abstract:

Recommender systems have proven their usefulness as a tool to cope with the information overload problem

for many online services offering movies, books, or music. Recommender systems rely on identifying individ-

ual users and deducing their preferences from the feedback they provide on the content. To automate this user

identiﬁcation and feedback process for TV applications, we propose a solution based on face detection and

recognition services. These services output useful information such as an estimation of the age, the gender,

and the mood of the person. Demographic characteristics (age and gender) are used to classify the user and

cope with the cold start problem. Detected smiles and emotions are used as an automatic feedback mechanism

during content consumption. Accurate results are obtained for a frontal view of the face. Head poses deviating from a frontal view and suboptimal illumination conditions may hinder face detection and recognition, especially if parts of the face, such as the eyes or mouth, are not sufficiently visible.

1 INTRODUCTION

Recommender systems are software tools and techniques that suggest items likely to be of interest to a user (Resnick and Varian, 1997). By filtering

the content and selecting the most appropriate items

according to the user’s personal preferences, recom-

mender systems can help to overcome the problem of

information overload. The suggestions provided are

aimed at supporting their users in various decision-

making processes, such as what items to buy, what

movies to watch, or what news to read.

For years, traditional recommender systems have been very successful for desktop internet applications.

With the growing popularity of mobile devices and

their increased connectivity, recommender systems

have expanded their area of application to the mobile platform. Additional information about the user, which is accessible through the camera, microphone, or sensors such as the gyroscope and GPS, allows the accuracy of the recommendations to be improved further and the recommendations to be adjusted to the current user context (De Pessemier et al., 2014b). The accessibility

and popularity of operating systems such as Android

further stimulate the development of recommenda-

tion tools. A similar evolution can be expected for

the television platform, on which smart TVs (running

Android) are becoming more popular. Recommender

systems for smart TVs can assist users in selecting the

TV content that matches the user’s preferences best.

For the television platform, additional challenges

for recommender systems emerge, such as the lim-

ited interaction possibilities (a TV viewer is “leaning

back” in the sofa, using only a remote control as a

means of interaction), the undesirability of user accounts (the television is a shared device on which users are not used to logging in), and the consumption of content in groups (people often watch television together). To cope with the problem of limited user

feedback and user identiﬁcation, we propose a rec-

ommender system that uses face detection and recog-

nition. An overview of existing research related to

the face recognition problem is provided in Section 2.

Section 3 explains how and which face detection and

recognition services have been used in our recom-

mender system. Section 4 elaborates on the advan-

tages of face recognition for recommender systems.

The cold start problem can be alleviated by recogniz-

ing demographic characteristics of the user such as

age and gender. Recognizing emotions of users dur-

ing content watching can be used to derive implicit

feedback for the content. For group recommenda-

tions, identifying the people in front of the TV can be

automated by using face recognition. In Section 5, the

used face recognition services are evaluated. Details

are provided about the accuracy of estimating people’s

age, recognizing gender, and detecting emotions from

a picture of a person’s face. Furthermore, the accuracy of the face detection process is evaluated for pictures with various poses of the head and with different illumination conditions. Finally, Section 6 draws conclusions and points to future work.

Pessemier, T., Verlee, D. and Martens, L.

Enhancing Recommender Systems for TV by Face Recognition.

In Proceedings of the 12th International Conference on Web Information Systems and Technologies (WEBIST 2016) - Volume 2, pages 243-250

ISBN: 978-989-758-186-1

2 RELATED WORK

Various solutions for tracking people and face recog-

nition have been proposed in literature. The Reading

People Tracker (Siebel, 2015) is an open source soft-

ware tool for tracking people in camera images for

visual surveillance purposes. It is often used for auto-

matic visual surveillance systems for crime detection

and prevention. The software is written in C++ and

therefore difﬁcult to integrate into an Android appli-

cation running on a smart TV.

TrackLab is a tool developed by Noldus for

recognition and analysis of spatial behavior (Noldus,

2015b). TrackLab facilitates the development of in-

teractive systems that respond in real-time to the loca-

tion or spatial behavior of subjects being tracked. The

collected data can be visualized by showing tracks on

a map or by heat maps of the aggregated location data

of multiple people. Statistics can provide insights into

the current position of a user, and when a user enters

or leaves a speciﬁc room (e.g. the TV room). Since

TrackLab is a software tool running on Windows with minimum requirements of 1 GB of hard disk space, 1 GB of RAM, and a 1 GHz CPU, it is less suitable for integration into an Android smart TV.

FaceR is a commercial service developed by Ani-

metrics (Animetrics, 2015) that offers a REST API

(Application Programming Interface) for face recog-

nition based on pictures. The FaceR service is able to

provide the coordinates, orientation, and pose of the

detected face. Compute-intensive tasks, such as im-

age analysis or face template generation, are handled

by the server layer, a set of clonable servers to ensure

scalability. The core business of this service is iden-

tity management and authentication with use cases in

law enforcement and commercial and consumer mar-

kets.

The FaceReader service is the facial expression

analysis tool of Noldus (Noldus, 2015a). FaceReader

can detect the position of mouth, eyes, and eyebrows.

This service can analyze six basic facial expressions

(emotions) and detect the gaze direction, head orien-

tation, and person characteristics (gender, race, age,

wearing glasses, etc.). An API is available to serve as

an interface between FaceReader and different soft-

ware programs using it, thereby facilitating the inte-

gration of the service.

Another service for emotion recognition is

EmoVu (Eyeris, 2015). Its deep-learning-based technology recognizes emotions from facial micro-expressions. The recognition process can

handle pictures as well as videos as input. The ser-

vice outputs the coordinates and orientation of the

face, gender, an age category, and recognized emo-

tions with an intensity score.

Rekognition is a service that can recognize more

than gender, age, and emotions (Orbeus, 2015).

Rekognition has the ambition of recognizing con-

cepts, such as a party, the beach, a cat, the Golden

Gate Bridge, etc., from pictures as well. Although

the concept recognition is still under development, the

beta version of Rekognition can return the best ﬁve

guesses with conﬁdence scores.

A few studies have combined face detection and

recognition with recommender systems. Recognized

emotions can be used to automatically derive feed-

back for the content. This way, the topical relevance

of a recommended video has been predicted by an-

alyzing affective aspects of user behavior (Arapakis

et al., 2009). In addition, affective aspects can be

used to model user preferences. The underlying as-

sumption is that affective aspects are more closely re-

lated to the user’s experience than generic metadata,

such as genre (Tkalcic et al., 2010). The end goal

is to incorporate emotions into the recommendation

process as well. Recommending music according to

the user’s emotion is a typical use case (Kuo et al.,

2005). However, in this model for emotion-based music recommendation, the user’s emotions are obtained by asking the user rather than by automatic recognition.

3 FACE DETECTION AND

RECOGNITION

In this section, we describe the face detection and

recognition process of our recommender system for

Android smart TVs. Face detection and recognition are performed in two consecutive phases.

In phase 1, human faces are detected using the

face detection mechanism of the Android API (Meier,

2015) (Android API level 14). Android’s face detec-

tion mechanism is commonly used in applications that

use the camera to focus on people’s faces. It can detect up to sixteen faces simultaneously. Figure 1 shows a screenshot of the application with rectangles indicating the detected faces in an image originating from

the camera. These rectangles are only shown in the

test version of the developed application. In the ﬁ-

nal version of the application, face detection is per-

formed as a background process, without bothering

the end-users. A face detection listener is coupled to

the Android camera object to check continuously who


is in front of the TV. If one or more faces are detected

by the Android system, the listener is notiﬁed and a

picture is taken. The camera focus is automatically

adjusted to the detected faces.

In phase 2, the taken pictures are used as input

for the face recognition process. In our implementa-

tion we use two different face recognition services:

Face++ and SkyBiometry. Face++ (Face++, 2015)

is a real-time face detection and recognition service.

The results of the Face++ recognition process are the

positions of the faces with detailed X,Y coordinates

for the eyes, nose, and mouth. Besides, the face

recognition service can detect glasses. More impor-

tant for our recommender system is the service’s es-

timation of the person’s age, gender, and race, to-

gether with a conﬁdence value for each attribute. An-

other interesting outcome of the service is the degree

to which the subject smiles with an associated con-

ﬁdence value. Face++ stores the results of the face

recognition processes in a database so that future recognition requests can be compared against them. If a new face recognition request shows similarities with a previously recognized face, a similarity indicator specifies the resemblance. If this similarity indicator is above

a certain threshold, our application assumes that this

person is a returning user and therefore already regis-

tered in the system.
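The returning-user decision described above can be sketched as a simple threshold test. This is only an illustration: the threshold value, the function name `match_returning_user`, and the shape of the input (a mapping from known user ids to similarity scores between 0 and 1) are assumptions, since the paper does not state the threshold or the exact response format of the service.

```python
from typing import Dict, Optional

# Hypothetical value; the paper only says "a certain threshold".
SIMILARITY_THRESHOLD = 0.8


def match_returning_user(similarities: Dict[str, float],
                         threshold: float = SIMILARITY_THRESHOLD) -> Optional[str]:
    """Return the id of the best-matching known user, or None for a new user.

    `similarities` maps each previously registered user id to the similarity
    indicator reported for the newly captured face (assumed to lie in 0..1).
    """
    if not similarities:
        return None  # nobody registered yet, so this must be a new user
    best_user = max(similarities, key=similarities.get)
    # Only accept the match if the similarity is above the threshold.
    if similarities[best_user] > threshold:
        return best_user
    return None
```

A face that matches no stored user strongly enough yields `None`, which the application could treat as the trigger for the cold start path of Section 4.1.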

SkyBiometry (SkyBiometry, 2015) is a service

very similar to Face++ but uses a different computer

vision algorithm. It is a cloud based face detection

and recognition service that is available through an

API. The service is able to detect multiple faces at

different angles in a picture and also provides the lo-

cation of the eyes, nose, and lips. The service makes

an assessment of the presence of glasses (dark glasses

or not), the fact that the person is smiling and the lips

are sealed or open, whether the person’s eyes are open

or not, the person’s gender, and the person’s mood

(e.g., happy, sad, angry, surprised, disgusted, scared,

neutral). For each of these attributes, a percentage indicates the confidence value of the estimation. The age of a person is given as a point estimate. Faces

already known by the service can be recognized.

For optimal face detection and recognition, a picture is taken using the camera of the smart TV and sent for analysis to these two services every fifteen seconds. The big advantage of using two face detection and recognition services that rely on different algorithms is the increased accuracy obtained by combining them. If the two services agree, the results can be used with a high degree of certainty. If they do not agree, one of them is chosen (Section 5) or a new picture is sent for reanalysis.

This way our application enables automatic au-

thentication of users in front of the TV. To provide

users feedback on this authentication process, the

recognized persons are shown in the user interface.

Therefore, the captured picture is cropped, so that

only the head is remaining, and used as proﬁle pic-

ture in the application (Figure 1, left side).

4 RECOMMENDER SYSTEM

4.1 Cold Start Solution

Traditional recommender systems suffer from the new

user problem, i.e. the issue that recommender sys-

tems cannot generate accurate recommendations for

new users who have not yet speciﬁed any preference.

To cope with the new user problem (also known as the

cold start problem), our system recommends videos

for new users based on the derived demographic char-

acteristics of the user, such as age and gender. These

user characteristics are matched to the demographic

breakdowns of the ratings for movies on IMDb.com.

Figure 2 shows an example of such a demographic

breakdown for the ratings of the movie “The Twilight

Saga: Breaking Dawn - Part 1”. For this movie, a

signiﬁcant difference in rating behavior of 1.8 stars is

visible for men and women. For speciﬁc age groups,

these differences may vary. For example, a differ-

ence of 2.3 stars is witnessed for people under 18,

whereas the age group of 45+ has a difference of 0.9

between men and women. The ratings of the spe-

ciﬁc age group and gender are selected based on the

user’s gender and age as estimated by the face recog-

nition service. Subsequently, the user’s preference for

a movie is predicted based on these ratings. As soon

as more detailed preferences of the user become available (e.g. through ratings), these are taken into account by using a standard collaborative filtering system. These collaborative filtering (CF) recommendations are combined with the recommendations based on demographics (demo) using a weighted average.

$$Rec_{combined} = w_{CF} \cdot Rec_{CF} + w_{demo} \cdot Rec_{demo} \qquad (1)$$

As more rating data of the user becomes available, the collaborative filter is expected to become more accurate and therefore the weight of the collaborative filter ($w_{CF}$) increases while the weight of the demographics ($w_{demo}$) decreases. In Figure 1, these

recommendations are visualized on the right side of

the screen by means of the posters of the movies.

Posters and metadata of movies are retrieved using

the TMDb API (Themoviedb.org, 2015).
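The weighted combination of Equation 1 can be sketched as follows. The paper does not specify how the weights evolve, so the linear schedule below (`cf_weight`, with a hypothetical `saturation` of 20 ratings) and the constraint that both weights sum to one are assumptions made purely for illustration.

```python
def cf_weight(num_ratings: int, saturation: int = 20) -> float:
    """Hypothetical schedule: 0.0 without ratings, 1.0 from `saturation` ratings on."""
    return min(max(num_ratings, 0), saturation) / saturation


def combined_score(rec_cf: float, rec_demo: float,
                   num_ratings: int, saturation: int = 20) -> float:
    """Weighted average of the CF and the demographic prediction (Equation 1).

    Assumes w_cf + w_demo = 1, so a brand-new user relies entirely on the
    demographic prediction and a well-known user entirely on CF.
    """
    w_cf = cf_weight(num_ratings, saturation)
    w_demo = 1.0 - w_cf
    return w_cf * rec_cf + w_demo * rec_demo
```

For example, with a CF prediction of 8.0 and a demographic prediction of 6.0, a user with 10 ratings would receive a combined score halfway between the two under this schedule.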


Figure 1: A screenshot of the application illustrating the face detection mechanism by means of squares around the faces.

4.2 Implicit Feedback by Detecting

Emotions

Collecting these rating data is another issue of recom-

mender systems for TV. Because of the passive atti-

tude of the typical TV viewer, evaluating the content

by specifying a rating is often skipped. Another dif-

ﬁculty is the timing of the ratings. Whereas feedback

during video watching can be useful for suggesting

alternative TV content, ratings are typically collected

when content playback is ﬁnished.

Therefore, our system automatically collects feed-

back during content playback based on the detected

emotions of the viewers. Emotion detection is han-

dled by the used face recognition service (SkyBiome-

try, 2015) that keeps the complexity of this process in

the cloud. The extent to which users are engaged in a

TV show can be used to estimate their interests in the

show. So, expressed emotions are considered as user

engagement and used as a feedback mechanism. The

stronger the detected emotions, the stronger the user’s

engagement, and the stronger the feedback signal.

Since not all emotions expressed by the viewer

are provoked by the TV content, the user’s detected

emotions are compared to the emotions that can be

expected from the content. Therefore, a database is

created with typical emotions for different sections of

each content item. These typical emotions are calcu-

lated by aggregating the recognized emotions of all

users who watched the content item in the past. E.g.,

for a comical scene, ‘happy’ turns out to be the pre-

dominant emotion for most viewers, which is sub-

sequently considered as the typical emotion for this

scene. Some viewers may also be ‘surprised’ by the

scene, which can be considered as another emotion

evoked by the scene. If the user’s expressed emotions

are similar to the typical emotions as expressed by

the community, then these emotions are considered

as provoked by the content. Each emotion $i$ detected during scene watching is associated with a weight $w_i$. The value of this weight is determined by the extent to which this emotion is recognized (the output of the face recognition service) and the number of people who expressed this emotion during the scene.

The extent to which users show their emotions de-

pends on the character of the person (extrovert or in-

trovert). Some people clearly show their emotions

while others stay rather neutral all the time. There-

fore, the conﬁdence values of the face recognition

services regarding the detected emotions are normal-

ized for each individual user. This normalization is

based on all detected emotions of an individual user

over different content items. In the current imple-

mentation, the average conﬁdence value over all emo-

tions of the individual is used to normalize the inten-

sity (conﬁdence value) of a detected emotion of that

user. Each of these normalized confidence values expresses user engagement in terms of a different emotion $e_i$. Subsequently, the recognized emotions are multiplied by the weights of the typically expected emotions for the scene. This overall value of user engagement is considered as an implicit feedback value, $Feedback_{emo}$.

$$Feedback_{emo} = \sum_{i=0}^{n} e_i \cdot w_i \qquad (2)$$
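The feedback computation of Equation 2 can be sketched in a few lines. Two points are assumptions rather than statements from the paper: normalization is implemented here as division by the user's average confidence value, and emotions that are not among the typical emotions of the scene receive a weight of zero. The function names are hypothetical.

```python
from typing import Dict


def normalize(confidences: Dict[str, float], user_mean: float) -> Dict[str, float]:
    """Scale detected confidences by the user's average confidence over all content.

    Dividing by the mean is an assumption; it boosts the signal of viewers
    who rarely show emotions and dampens that of very expressive viewers.
    """
    if user_mean <= 0:
        return {emotion: 0.0 for emotion in confidences}
    return {emotion: value / user_mean for emotion, value in confidences.items()}


def emotion_feedback(detected: Dict[str, float], user_mean: float,
                     scene_weights: Dict[str, float]) -> float:
    """Sum of normalized emotion intensities times the scene's emotion weights."""
    normalized = normalize(detected, user_mean)
    # Emotions that are not typical for the scene get weight 0 and are ignored.
    return sum(value * scene_weights.get(emotion, 0.0)
               for emotion, value in normalized.items())
```

With detected confidences of 0.8 for 'happy' and 0.2 for 'surprised', a user average of 0.4, and scene weights of 0.7 and 0.3, the feedback value is 2.0 · 0.7 + 0.5 · 0.3 = 1.55.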

4.3 Group Recommendations

Although most recommender systems are designed to

serve individual users, many activities, such as watch-

ing TV, are often group activities performed by mul-

tiple friends or family members together. If multiple

people are detected in the room watching television

together, the recommendations cannot be limited to

the preferences of one individual, but group recom-

mendations have to be generated.

One of the problems that comes with group rec-

ommenders is identifying the group for which recom-

mendations have to be generated. In many cases, only

one person is controlling the system (browsing con-

tent, making a selection, etc.), i.e. the person who

has the remote control in case of TV watching. A

traditional solution to express the group’s members is

asking the users to log on with their personal account.

However, people are not used to logging on to television sets, and entering passwords with a remote control is a cumbersome activity. Moreover, if a person is leaving the

room, this user should log off to specify the change in

group composition.

Therefore, our system has an automatic identiﬁca-

tion process of the users sitting in front of the TV. The

similarity indicator of the face recognition process is

used to decide if a person already utilized the recom-

mender system in the past. This way, preferences de-

duced from past watching behavior can be coupled to

the recognized user.

These group recommendations are suggestions for

the content items that are most suitable for the group

as a whole. Different solutions to aggregate the pref-

erences or recommendations of individuals and ﬁnd

the best items have been proposed (De Pessemier

et al., 2014a). In our implementation, the average

without misery strategy is used. The idea of this strat-

egy is to ﬁnd the optimal decision for the group, with-

out making some group members really unhappy with

this decision.

This strategy calculates $P_{Group,i}$, the prediction of a group’s interests for an item $i$, as follows. The interests of each individual member for an item can be calculated by any recommendation algorithm. In our implementation, a traditional collaborative filtering solution is used (Resnick and Varian, 1997). If the interest of one of the group members in $i$ is predicted to be below a certain threshold $\theta$, the item $i$ gets a penalty and is excluded from the group recommendations ($P_{Group,i} = 0$). This prevents the group from receiving recommendations for items that will make one of the group members unhappy. If the interests of all group members for item $i$ are predicted to be above the threshold, the prediction of the group’s interests for $i$ is calculated as the average of the predictions of each member’s interests, $P_{u,i}$.

$$P_{Group,i} = \begin{cases} 0, & \text{if } \exists\, u \in Group : P_{u,i} < \theta \\ \dfrac{\sum_{u \in Group} P_{u,i}}{|Group|}, & \text{otherwise} \end{cases} \qquad (3)$$
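The average without misery strategy of Equation 3 translates almost directly into code. The sketch below assumes that the individual predictions are already available as a mapping from user id to predicted interest; the function name and the example threshold are illustrative only.

```python
from typing import Dict


def average_without_misery(predictions: Dict[str, float], theta: float) -> float:
    """Group prediction for one item following Equation 3.

    `predictions` maps each group member to the predicted interest P_{u,i}.
    If any member falls below `theta`, the item is excluded (score 0);
    otherwise the group score is the mean of the individual predictions.
    """
    if not predictions:
        raise ValueError("the group must contain at least one member")
    values = list(predictions.values())
    if any(p < theta for p in values):
        return 0.0
    return sum(values) / len(values)
```

For a group with predicted interests 4.0, 3.0, and 5.0 and a threshold of 2.0, the group score is 4.0. If one member's prediction drops to 1.5, the item is excluded regardless of how much the others would like it.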

Figure 2: Demographic breakdown for the ratings of a

movie on IMDb according to the age and the gender.

5 EVALUATION

5.1 Gender and Age Estimation

To evaluate the estimation of gender and age as made

by the face recognition services, we used the database

of Minear and Park with 180 photos of people (Park

Aging Mind Lab, University of Texas, 2015). These

photos, published in grey-scale, are all frontal views

of people of different ages, genders, and races. All people have a neutral facial expression in the photos. Half

of the people are young adults (18 to 49 years old),

half of them are older adults (50 to 94 years old).

About half of them are men, half of them are women.


Also within the various ethnic variations, the share

is about 50%-50% for men and women. For each of

these photos, the exact age and gender of the person

is available as ground truth. Since this database con-

tains no photos of people under 18 years old, 34 extra

photos originating from Google images were added to

ﬁll this void.

These photos are used to evaluate the accuracy of

the estimation of people’s age. The actual age of the

person is compared to the age range that is provided

by the Face++ recognition service. For each photo,

the error is calculated as the difference between the

predicted age range and the actual age. The age range

of the prediction is always between 5 and 10 years.

An average error of 2.68 years is obtained for 212

photos. (The face recognition service was unable to

predict the age in two photos.)
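The paper does not spell out how the difference between an age range and a single age is measured. One plausible reading, used in the sketch below as an explicit assumption, is that the error is zero when the actual age lies inside the predicted range and otherwise equals the distance to the nearest bound of the range.

```python
from typing import Iterable, Tuple


def age_range_error(actual_age: float, low: float, high: float) -> float:
    """Distance (in years) from the actual age to the predicted range [low, high].

    Assumed definition: 0 inside the range, distance to the closest bound outside.
    """
    if low <= actual_age <= high:
        return 0.0
    return float(min(abs(actual_age - low), abs(actual_age - high)))


def mean_age_error(samples: Iterable[Tuple[float, float, float]]) -> float:
    """Average error over (actual_age, low, high) triples."""
    errors = [age_range_error(age, low, high) for age, low, high in samples]
    if not errors:
        raise ValueError("at least one sample is required")
    return sum(errors) / len(errors)
```

Under this definition, a 25-year-old predicted as 30 to 38 contributes an error of 5 years, while a 32-year-old with the same predicted range contributes no error.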

Figure 3 shows the average prediction error per

gender and age category. The most accurate results

are obtained for men between 30 and 60 years old,

with an error between estimated age range and actual

age that is below one. Also for people below 30 years

old, an accurate age estimation is obtained. In con-

trast to men, a lower accuracy is obtained for women

between 30 and 60 years old. A possible explanation for this difference is make-up, which can mask visible signs of age. Age estimation is the most difficult for people above 60 years old. For these people, (small) age differences are less visible in the face.

Figure 3: The accuracy of the age estimation evaluated

based on the average difference (in years) between the ac-

tual age and the predicted age range, per gender and age

category.

5.2 Gender Recognition

The database of Minear and Park is also used to eval-

uate the gender recognition of the services. Both face

recognition services are combined with the aim of ob-

taining a higher classiﬁcation accuracy. If Face++ and

SkyBiometry agree and recognize the same gender,

there is a high probability that this is correct. Only in

a few cases in which Face++ and SkyBiometry agree,

but both have a low conﬁdence value, a wrong recog-

nition was made. If Face++ and SkyBiometry dis-

agree about the gender for a person above 18 years

old, the recognition of SkyBiometry is used since this

service obtained the most accurate results. For people

under 18 years old, the recognition of Face++ is the

most accurate and therefore used as ﬁnal recognition.
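The decision rule described above can be sketched as follows. The `GenderGuess` container and the function name are hypothetical, and two details are assumptions: the age used for the tie-break is the age estimated by the recognition service (the true age is unknown at run time), and a person of exactly 18 is treated as an adult.

```python
from dataclasses import dataclass


@dataclass
class GenderGuess:
    gender: str        # "male" or "female"
    confidence: float  # confidence value reported by the service, 0..1


def fuse_gender(facepp: GenderGuess, skybiometry: GenderGuess,
                estimated_age: float) -> str:
    """Combine the gender recognized by Face++ and SkyBiometry.

    Agreement is accepted as-is. On disagreement, SkyBiometry decides for
    adults and Face++ decides for people under 18, following Section 5.2.
    """
    if facepp.gender == skybiometry.gender:
        return facepp.gender
    if estimated_age >= 18:
        return skybiometry.gender
    return facepp.gender
```

The confidence values are carried along but not used by the rule itself; they could serve to flag the rare cases in which both services agree with low confidence and a new picture might be requested.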

Table 1 shows the results of the gender recognition

by combining Face++ and SkyBiometry. An overall

accuracy of 92.52% is obtained for the combination

of services, whereas the individual services obtained

an accuracy of 75.23% (Face++) and 83.64% (Sky-

Biometry).

Table 1: Accuracy of the gender recognition.

                        Males    Females
Recognized as male         98          6
Recognized as female       10        100
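As a check on the reported figures, the overall accuracy can be recomputed from the counts in Table 1. The short script below is only a worked calculation; the per-gender rates it also derives are not reported in the paper and follow purely from the table.

```python
# Counts copied from Table 1; keys are (recognized gender, actual gender).
CONFUSION = {
    ("male", "male"): 98,
    ("male", "female"): 6,
    ("female", "male"): 10,
    ("female", "female"): 100,
}


def accuracy(confusion: dict) -> float:
    """Share of all photos for which the recognized gender is the actual gender."""
    correct = sum(n for (recognized, actual), n in confusion.items()
                  if recognized == actual)
    return correct / sum(confusion.values())


def recall(confusion: dict, gender: str) -> float:
    """Share of people of the given actual gender that were recognized correctly."""
    actual_total = sum(n for (_, actual), n in confusion.items() if actual == gender)
    return confusion[(gender, gender)] / actual_total


if __name__ == "__main__":
    print(f"overall accuracy: {accuracy(CONFUSION):.2%}")
    print(f"males recognized correctly: {recall(CONFUSION, 'male'):.2%}")
    print(f"females recognized correctly: {recall(CONFUSION, 'female'):.2%}")
```

The 198 correct recognitions out of 214 photos give the overall accuracy of 92.52% stated in the text.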

5.3 Emotion Recognition

For recognizing emotions in people’s faces, the Sky-

Biometry service is used. The output of Face++ con-

tains a value indicating the extent to which a per-

son is smiling, but recognizes no speciﬁc emotion,

such as surprised, disgusted, or scared. To evalu-

ate the emotion recognition, the Cohn-Kanade AU-

Coded Expression Database is used (Affect Analysis

Group - Research Lab at the University of Pittsburgh,

2015). This is a collection of photos of persons’ faces

with an emotional value (happy, sad, angry, surprised,

disgusted, scared or contempt) (Lucey et al., 2010).

The database contains no photos with a neutral face

expression. Photos with the emotion ‘contempt’ are

ignored since this emotion cannot be recognized by

SkyBiometry.

This database of photos is used to evaluate the

emotion recognition of the SkyBiometry service. For

each photo, the emotion as stated in the database

is compared to the recognized emotion. An over-

all accuracy of 83.88% is obtained. Figure 4 shows

the obtained accuracy of the emotion recognition per

emotion. Happy, disgusted, and scared are emotions

which are relatively easy to recognize in contrast to

sad, surprised, and angry.

5.4 Head Orientation

In ideal conditions, a front facing picture of the TV

viewer is available as input for the recognition process. However, in realistic scenarios (e.g. lying on the

sofa watching TV), the TV viewer’s head can be ob-

served under various orientations. Head poses deviat-

ing from a frontal view can introduce difﬁculties for


Figure 4: The accuracy of emotion recognition.

the face detection process. Therefore, we evaluated

the inﬂuence of various head orientations on the accu-

racy of the face detection mechanisms of Face++ and

SkyBiometry by using photos of the Head Pose Im-

age Database (Gourier et al., 2015a). The Head Pose

Image Database contains 2790 face images of 15 per-

sons with variations of pan and tilt angles from -90 to

+90 degrees. For every person, two series of 93 images (93 different poses each) are available. For each of these poses, the face detection of the Face++ and SkyBiometry services is tested.

Figure 5 shows the results for the Face++ service

for the various poses of the head. Blue dots stand

for detected faces. Red dots correspond to undetected

faces. In Figure 6, the results obtained with Sky-

Biometry are shown. Comparison of the two services

shows that Face++ is more sensitive to head poses de-

viating from a frontal view. Face++ is able to de-

tect the face for 24.73% of the different head poses,

whereas SkyBiometry can successfully detect the face

for 66.67% of the poses.

Face detection can only be successful if a considerable part of the face is visible. The blue dots in Figures 5 and 6 have the shape of a funnel: wide at the top and narrow at the bottom. This means that, for variations in tilt angle, faces leaning backward are detected more reliably than faces leaning forward.

5.5 Illumination Conditions

Another important factor inﬂuencing the accuracy of

face detection and recognition services is the illumi-

nation condition. To evaluate this inﬂuence, the pho-

tos of the Yale Face Database were used (Gourier

et al., 2015b). This database contains 5760 single

light source images of 10 people, each seen under 576

viewing conditions (9 poses x 64 illumination condi-

tions). The position of the illumination is denoted by

the azimuth and elevation of the single light source

direction.

Overall, Face++ is able to detect the face in

76.56% of the pictures whereas SkyBiometry has a

successful detection for 70.31% of the various illu-

mination conditions. Undetectable faces are mainly

due to insufﬁcient exposure of the face: a face that is

completely in the shade, eyes and mouth that are not sufficiently visible, etc. These cases generally correspond to an illumination with an azimuth outside the range [−35°, +35°] or an elevation outside the range [−50°, +50°] (measured from a frontal view). With such an illumination, a considerable part of the face is not exposed to the light.
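The illumination ranges above can be turned into a simple heuristic, for instance to decide whether a captured picture is worth sending to the recognition services. This is a sketch under the assumption that the direction of the dominant light source is known or can be estimated; the function name is hypothetical, and since the paper states that failures only "generally" fall outside these ranges, the result is an indication rather than a guarantee.

```python
AZIMUTH_LIMIT_DEG = 35.0    # from the reported range [-35, +35] degrees
ELEVATION_LIMIT_DEG = 50.0  # from the reported range [-50, +50] degrees


def likely_detectable(azimuth_deg: float, elevation_deg: float) -> bool:
    """Heuristic: is the light source direction inside the favourable ranges?

    Angles are measured from a frontal view of the face, in degrees.
    """
    return (abs(azimuth_deg) <= AZIMUTH_LIMIT_DEG
            and abs(elevation_deg) <= ELEVATION_LIMIT_DEG)
```

A frontal light source (0°, 0°) passes the check, whereas a light source at an azimuth of 50° or an elevation of 60° does not.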

Figure 5: The accuracy of the face detection of Face++ un-

der different orientations of the head.

Figure 6: The accuracy of the face detection of SkyBiome-

try under different orientations of the head.

6 CONCLUSIONS

Smart TVs equipped with a camera and microphone allow the development of additional features for enhancing the TV watching experience. Face recognition services can be used for improving the user experience of recommender systems by automatic feedback generation and user identification. Using cloud-based face recognition services reduces the computational requirements of TV sets and allows a continuous improvement of the accuracy. Two of these face recognition services, Face++ and SkyBiometry, were investigated in detail. These services proved to be a valuable tool to estimate the age and recognize the gender

of people based on a photo of their face. By combin-

ing the results of both services, we improved the ac-

curacy of the recognition process. Furthermore, face

recognition services can be used to recognize emotions from people’s faces. Automatic feedback for

the content can be generated by matching the recog-

nized emotions to the emotions that are expected to

be evoked by the content. In realistic environments,

such as a living room, face recognition has to cope

with some serious difﬁculties such as head poses de-

viating from a frontal view and suboptimal illumina-

tion conditions. If important parts of the face, such as

eyes or mouth, are not sufﬁciently visible, face detec-

tion and recognition fail. An interesting research track

for future work is to evaluate the accuracy of the auto-

matically generated feedback based on the recognized

emotions by a test panel. Also privacy and data secu-

rity aspects should be investigated, especially if per-

sonal information is sent to external services. Possible

solutions can include data encryption mechanisms or

privacy-preserving architectures that limit the amount

of disclosed data of the TV viewers.

REFERENCES

Affect Analysis Group - Research Lab at the University of Pittsburgh (2015). Cohn-Kanade AU-Coded Expression Database. Available at http://www.pitt.edu/~emotion/ck-spread.htm.

Animetrics (2015). Facer identity management sys-

tem: Connecting mobile devices with photo-

graphic clouds. Available at http://animetrics.com/

cloud-web-service-architecture/.

Arapakis, I., Moshfeghi, Y., Joho, H., Ren, R., Hannah, D.,

and Jose, J. (2009). Integrating facial expressions into

user proﬁling for the improvement of a multimodal

recommender system. In Multimedia and Expo, 2009.

ICME 2009. IEEE International Conference on, pages

1440–1443.

De Pessemier, T., Dooms, S., and Martens, L. (2014a).

Comparison of group recommendation algorithms.

Multimedia Tools and Applications, 72(3):2497–

2541.

De Pessemier, T., Dooms, S., and Martens, L. (2014b).

Context-aware recommendations through context and

activity recognition in a mobile environment. Multi-

media Tools and Applications, 72(3):2925–2948.

Eyeris (2015). EmoVu emotion recognition software. Available at http://emovu.com.

Face++ (2015). Leading face recognition on cloud. Available at http://www.faceplusplus.com/.

Gourier, N., Hall, D., and Crowley, J. L. (2015a). Head

pose image database. Available at http://www-prima.

inrialpes.fr/perso/Gourier/Faces/HPDatabase.html.

Gourier, N., Hall, D., and Crowley, J. L. (2015b).

The yale face database. Available at http://vision.

ucsd.edu/content/yale-face-database.

Kuo, F.-F., Chiang, M.-F., Shan, M.-K., and Lee, S.-Y.

(2005). Emotion-based music recommendation by

association discovery from ﬁlm music. In Proceed-

ings of the 13th Annual ACM International Confer-

ence on Multimedia, MULTIMEDIA ’05, pages 507–

510, New York, NY, USA. ACM.

Lucey, P., Cohn, J., Kanade, T., Saragih, J., Ambadar, Z.,

and Matthews, I. (2010). The extended cohn-kanade

dataset (ck+): A complete dataset for action unit and

emotion-speciﬁed expression. In Computer Vision and

Pattern Recognition Workshops (CVPRW), 2010 IEEE

Computer Society Conference on, pages 94–101.

Meier, R. (2015). Android developers blog. Available

at http://android-developers.blogspot.be/2015/08/

face-detection-in-google-play-services.html.

Noldus (2015a). Facereader: Facial expression recogni-

tion software. Available at http://www.noldus.com/

facereader.

Noldus (2015b). Tracklab. Available at http://

www.noldus.com/innovationworks/products/tracklab.

Orbeus (2015). Rekognition - integrated visual recogni-

tion solution based in the cloud. Available at https://

rekognition.com/.

Park Aging Mind Lab, University of Texas (2015). Face

database. Available at http://agingmind.utdallas.edu/

facedb/view/normed-faces-by-kristen-kennedy.

Resnick, P. and Varian, H. R. (1997). Recommender sys-

tems. Commun. ACM, 40(3):56–58.

Siebel, N. T. (2015). People tracking - the Reading People Tracker. Available at http://www.siebel-research.de/people_tracking/reading_people_tracker/.

SkyBiometry (2015). Cloud-based face detec-

tion and recognition api. Available at https://

www.skybiometry.com/.

Themoviedb.org (2015). TMDb API. Available at https://

www.themoviedb.org/documentation/api.

Tkalcic, M., Burnik, U., and Kosir, A. (2010). Using affec-

tive parameters in a content-based recommender sys-

tem for images. User Modeling and User-Adapted In-

teraction, 20(4):279–311.
