is in front of the TV. If one or more faces are detected
by the Android system, the listener is notified and a
picture is taken. The camera focus is automatically
adjusted to the detected faces.
In phase 2, the captured pictures are used as input
for the face recognition process. Our implementation
uses two different face recognition services:
Face++ and SkyBiometry. Face++ (Face++, 2015)
is a real-time face detection and recognition service.
The results of the Face++ recognition process are the
positions of the faces with detailed X,Y coordinates
for the eyes, nose, and mouth. In addition, the face
recognition service can detect glasses. More important
for our recommender system is the service’s estimate
of the person’s age, gender, and race, together with
a confidence value for each attribute. Another
interesting outcome of the service is the degree
to which the subject smiles, with an associated
confidence value. Face++ stores the results of the face
recognition processes in a database so that future
recognition requests can be compared against them. If a
new face recognition request resembles a previously
recognized face, a similarity indicator specifies the
degree of resemblance. If this similarity indicator is
above a certain threshold, our application assumes that
this person is a returning user and therefore already
registered in the system.
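The returning-user decision described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name, the 0-100 similarity scale, and the threshold value of 80 are assumptions, since the text only states that "a certain threshold" is used.

```python
from typing import Dict, Optional

# Hypothetical threshold on an assumed 0-100 similarity scale;
# the paper does not state the actual value.
SIMILARITY_THRESHOLD = 80.0


def match_returning_user(
    similarities: Dict[str, float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Optional[str]:
    """Return the id of the best-matching known face, or None for a new user.

    `similarities` maps the id of each previously recognized face to the
    similarity indicator reported for the new picture.
    """
    if not similarities:
        return None  # no known faces yet, so this must be a new user
    best_id = max(similarities, key=similarities.get)
    # Only a similarity strictly above the threshold counts as a returning user.
    return best_id if similarities[best_id] > threshold else None
```

A result of `None` would then trigger the handling of a new user, for whom only demographic attributes are available.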
SkyBiometry (SkyBiometry, 2015) is a service
very similar to Face++ but uses a different computer
vision algorithm. It is a cloud based face detection
and recognition service that is available through an
API. The service is able to detect multiple faces at
different angles in a picture and also provides the lo-
cation of the eyes, nose, and lips. The service makes
an assessment of the presence of glasses (dark glasses
or not), the fact that the person is smiling and the lips
are sealed or open, whether the person’s eyes are open
or not, the person’s gender, and the person’s mood
(e.g., happy, sad, angry, surprised, disgusted, scared,
neutral). For each of these attributes, a percentage
indicates the confidence value of the estimation. The
age of a person is given as a point estimate. Faces
already known by the service can be recognized.
For optimal face detection and recognition, a
picture is taken using the camera of the smart TV and
sent for analysis to these two services every fifteen
seconds. The main advantage of using two face
detection and recognition services that rely on
different algorithms is the increased accuracy obtained
by combining them. If the two services agree, the
results can be used with a high degree of certainty.
If they do not agree, one of them is chosen (Section 5)
or a new picture is sent for reanalysis.
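The agreement check between the two services can be illustrated with a small sketch. The data structure, the age tolerance of five years, and the merging rule (averaging the ages, keeping the higher gender confidence) are assumptions made for illustration; the text does not specify when two results count as agreeing, and the rule for choosing one service on disagreement is left to Section 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaceEstimate:
    """Attributes reported by one service for one face (hypothetical structure)."""
    gender: str
    gender_confidence: float  # percentage, 0-100
    age: float                # point estimate in years


# Hypothetical tolerance: ages within 5 years are treated as agreeing.
AGE_TOLERANCE_YEARS = 5.0


def services_agree(a: FaceEstimate, b: FaceEstimate,
                   age_tolerance: float = AGE_TOLERANCE_YEARS) -> bool:
    """Two results agree if the gender matches and the ages are close."""
    return a.gender == b.gender and abs(a.age - b.age) <= age_tolerance


def combine(a: FaceEstimate, b: FaceEstimate) -> Optional[FaceEstimate]:
    """Merge two agreeing results; return None when they disagree.

    None signals the caller to choose one service or to send a new
    picture for reanalysis.
    """
    if not services_agree(a, b):
        return None
    return FaceEstimate(
        gender=a.gender,
        gender_confidence=max(a.gender_confidence, b.gender_confidence),
        age=(a.age + b.age) / 2,  # assumed merging rule: mean of both ages
    )
```

For example, two results that both report a female viewer aged 30 and 34 would be merged into one estimate with age 32, whereas results that differ in gender would return `None`.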
This way our application enables automatic au-
thentication of users in front of the TV. To provide
users feedback on this authentication process, the
recognized persons are shown in the user interface.
To this end, the captured picture is cropped so that
only the head remains, and is used as the profile
picture in the application (Figure 1, left side).
4 RECOMMENDER SYSTEM
4.1 Cold Start Solution
Traditional recommender systems suffer from the new
user problem, i.e. the issue that recommender sys-
tems cannot generate accurate recommendations for
new users who have not yet specified any preference.
To cope with the new user problem (also known as the
cold start problem), our system recommends videos
for new users based on the derived demographic char-
acteristics of the user, such as age and gender. These
user characteristics are matched to the demographic
breakdowns of the ratings for movies on IMDb.com.
Figure 2 shows an example of such a demographic
breakdown for the ratings of the movie “The Twilight
Saga: Breaking Dawn - Part 1”. For this movie, a
significant difference in rating behavior of 1.8 stars
is visible between men and women. This difference
varies across age groups. For example, a difference
of 2.3 stars is observed for people under 18,
whereas the age group of 45+ shows a difference of 0.9
stars between men and women. The ratings of the
specific age group and gender are selected based on the
user’s gender and age as estimated by the face
recognition service. Subsequently, the user’s preference for
a movie is predicted based on these ratings. As soon
as more detailed preferences of the user become
available (e.g. through ratings), these are taken into
account by using a standard collaborative filtering
system. These collaborative filtering (CF)
recommendations are combined with the recommendations
based on demographics (demo) using a weighted average.
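\[
\mathrm{Rec}_{\mathrm{combined}} = w_{\mathrm{CF}} \cdot \mathrm{Rec}_{\mathrm{CF}} + w_{\mathrm{demo}} \cdot \mathrm{Rec}_{\mathrm{demo}} \tag{1}
\]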
As more rating data of the user becomes available,
the collaborative filter is expected to become more
accurate; therefore, the weight of the collaborative
filter ($w_{\mathrm{CF}}$) increases while the weight of the
demographics ($w_{\mathrm{demo}}$) decreases. In Figure 1, these
recommendations are visualized on the right side of
the screen by means of the posters of the movies.
Posters and metadata of movies are retrieved using
the TMDb API (Themoviedb.org, 2015).
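The cold start strategy of this section can be sketched in a few lines. The sketch below is illustrative only: the age group boundaries between "under 18" and "45+", the linear weight schedule, the saturation point of 20 ratings, and the constraint that both weights sum to one are assumptions, as the text states only that the weight of the collaborative filter increases while that of the demographics decreases.

```python
from typing import Dict, Tuple


def age_group(age: float) -> str:
    """Map an estimated age to a rating breakdown group.

    "under 18" and "45+" appear in the text; the two middle groups
    are assumed boundaries.
    """
    if age < 18:
        return "<18"
    if age < 30:
        return "18-29"
    if age < 45:
        return "30-44"
    return "45+"


def demographic_prediction(breakdown: Dict[Tuple[str, str], float],
                           gender: str, age: float) -> float:
    """Select the mean rating of the user's gender and age group."""
    return breakdown[(gender, age_group(age))]


def cf_weight(num_ratings: int, saturation: int = 20) -> float:
    """Hypothetical schedule: w_CF grows linearly from 0 to 1.

    `saturation` is the assumed number of ratings after which only
    collaborative filtering is used.
    """
    return min(max(num_ratings, 0) / saturation, 1.0)


def combined_score(rec_cf: float, rec_demo: float,
                   num_ratings: int, saturation: int = 20) -> float:
    """Weighted average of Equation (1), assuming w_CF + w_demo = 1."""
    w_cf = cf_weight(num_ratings, saturation)
    w_demo = 1.0 - w_cf
    return w_cf * rec_cf + w_demo * rec_demo
```

With these assumptions, a user without any ratings receives the purely demographic prediction, and a user with 10 ratings receives the mean of both predictions.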
Enhancing Recommender Systems for TV by Face Recognition