algorithms take two steps:
1. Look for users who share the same rating
patterns as the active user (the user for whom the
prediction is made).
2. Use the ratings from those like-minded users
found in step 1 to calculate a prediction for the
active user.
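The two steps above can be sketched as a small
neighbourhood-based predictor. This is only a minimal
illustration, not the method used in this work: the
Pearson similarity, the neighbourhood size k, the
mean-centred weighted average, the function names and
the sample ratings are all assumptions introduced here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)
               * sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, active, item, k=2):
    """Predict the active user's rating for `item` (hypothetical helper)."""
    me = ratings[active]
    my_mean = sum(me.values()) / len(me)

    # Step 1: find like-minded users who have rated the item.
    neighbours = []
    for user, theirs in ratings.items():
        if user == active or item not in theirs:
            continue
        sim = pearson(me, theirs)
        if sim > 0:
            neighbours.append((sim, theirs))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = neighbours[:k]
    if not neighbours:
        return my_mean  # no usable neighbour: fall back to the user's own mean

    # Step 2: similarity-weighted average of the neighbours' mean-centred ratings.
    num = sum(sim * (theirs[item] - sum(theirs.values()) / len(theirs))
              for sim, theirs in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return my_mean + num / den


# Made-up ratings on a 1-10 scale, for illustration only.
ratings = {
    "u1": {"A": 8, "B": 2, "C": 9},
    "u2": {"A": 9, "B": 1, "C": 10, "D": 9},
    "u3": {"A": 2, "B": 9, "C": 1, "D": 2},
}
prediction = predict(ratings, "u1", "D")
```

In this toy data only u2 correlates positively with u1,
so the prediction for item D is driven by u2 alone.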
In the age of information explosion such
techniques can prove very useful, as the number of
items of multimedia content (such as music, movies,
news, etc.) has become so large that a single person
cannot possibly view them all in order to select the
relevant ones. On the other hand, relying solely on a
scoring or rating system that is averaged across all
users ignores the specific demands of each user, and
its outcome may be particularly poor in tasks where
interest varies widely, as in movie
recommendation. Consequently, other methods to
combat information explosion must aid in the
process; in the scope of this work we focused on
one of them, namely hierarchical data clustering.
The essence of clustering data is the
classification of similar objects into different
homogeneous groups, based on the values of their
attributes. More precisely, data clustering is the
partitioning of a given data set into subsets
(clusters), so that the data in each subset share some
common trait according to some defined distance
measure. In general, data clustering is still
considered an open research issue, mainly because
it is difficult to handle when the data is
characterized by numerous measurable features, as
in the case of movies.
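As an illustration of partitioning a data set under a
distance measure, the following is a minimal sketch of
bottom-up (agglomerative) hierarchical clustering. The
Euclidean distance, the average-linkage rule, the
stopping criterion (a target number of clusters) and the
sample points are assumptions made for this example, not
choices taken from this work.

```python
def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5


def agglomerative(points, n_clusters, dist=euclidean):
    """Average-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Start with every object in its own cluster.
    clusters = [[i] for i in range(len(points))]

    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist(points[i], points[j])
                        for i in clusters[a]
                        for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        # Merge the two closest clusters into one.
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Made-up two-dimensional feature vectors, for illustration only.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 50)]
three = agglomerative(points, 3)
two = agglomerative(points, 2)
```

Recording the sequence of merges instead of stopping at a
fixed number of clusters would yield the full hierarchy.
The pairwise search also becomes expensive as the number
of objects and features grows, which is one practical
face of the difficulty noted above.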
Furthermore, the basic architectural concept in
MPEG-21 is the Digital Item (DI). Digital Items are
structured digital objects, including a standard
representation, identification and metadata. In this
context, complex digital objects, such as the ones
containing the multimedia content feature information
used in the presented hybrid CF approach, may be
declared using the notion and language of a Digital
Item. The usage of the MPEG-21 Digital Item
Declaration Language to represent such complex
digital objects has introduced benefits to the
proposed framework in two major areas: the
management of the initial content presentation and
the management and distribution of multimedia
content, such as video, images and metadata.
A further benefit of adopting MPEG-21 is that
every Digital Item can contain a
specific version of the content for each supported
platform. The dynamic association between entities
reduces any ambiguity over the target platform and
the content.
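To make the idea of one Digital Item carrying a
platform-specific version of the content more concrete,
the sketch below assembles a small declaration with
Python's standard XML library. The element and attribute
names follow our reading of the Digital Item Declaration
Language and should be checked against the standard; the
namespace URI, the platform labels, the MIME types, the
URLs and the helper names are assumptions made for this
example.

```python
import xml.etree.ElementTree as ET

# Assumed DIDL namespace URI; verify against the MPEG-21 Part 2 schema.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def q(tag):
    """Qualify a tag name with the (assumed) DIDL namespace."""
    return "{%s}%s" % (DIDL_NS, tag)


def build_movie_item(title, versions):
    """Build a Digital Item with one Component per platform.

    `versions` maps a platform label to a (mime_type, url) pair.
    """
    root = ET.Element(q("DIDL"))
    item = ET.SubElement(root, q("Item"))

    # Metadata about the item travels in a Descriptor/Statement pair.
    descriptor = ET.SubElement(item, q("Descriptor"))
    statement = ET.SubElement(descriptor, q("Statement"),
                              {"mimeType": "text/plain"})
    statement.text = title

    # One Selection per supported platform.
    choice = ET.SubElement(item, q("Choice"), {"choice_id": "PLATFORM"})
    for platform in versions:
        ET.SubElement(choice, q("Selection"), {"select_id": platform})

    # Each Component is conditional on the matching Selection.
    for platform, (mime_type, url) in versions.items():
        component = ET.SubElement(item, q("Component"))
        ET.SubElement(component, q("Condition"), {"require": platform})
        ET.SubElement(component, q("Resource"),
                      {"mimeType": mime_type, "ref": url})
    return root


# Hypothetical platforms and resource locations.
root = build_movie_item("Movie A", {
    "PDA": ("video/3gpp", "http://example.com/movie_a_pda.3gp"),
    "STB": ("video/mpeg", "http://example.com/movie_a_stb.mpg"),
})
xml_text = ET.tostring(root, encoding="unicode")
```

Tying each Component to a Selection through a Condition
is one way to express the association between target
platform and content version inside a single item.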
At this point let us assume that the multimedia
content offered to the end-users of the proposed
standalone system contains a set of movies to choose
from. These can be movies whose main genre is
comedy, drama, science fiction, etc. For the sake of
simplicity, we utilize only a part of the IMDB movie
information. More specifically, we take into
consideration only the subset of the following 14
movie attributes: Actor, Actress, Director, Genre,
Language, Location, MPAA Rating, Plot summary,
Producer, Rating, Release Date, Running Time,
Title. Let the end-users express preference ratings over
the set of movies, either for a specific movie or for a
group of movies (i.e. a cluster of movies), on the
following 1-10 scale: 1 is used to
denote a strongly negative preference, i.e. “really
hate”, whereas 10 denotes a “really like” preference.
The basic problem is which movies the content
provider should offer to new users, given the
preferences of existing users over the set of offered
movies.
A simplified example of this situation may be
given as follows: three end-users, John, George and
Mary, are already watching their favourite movies.
John has invited his friends home to watch a
comedy film in his new home cinema, George is
travelling by train and watching his favourite drama
movie on his PDA, and Mary is waiting for her turn
in the doctor’s office, watching a drama movie on a
Set-Top-Box. All three of them have established
their user preferences for a set of ten (randomly
selected) movies (A to J), which include the three that
they are currently viewing. As expected, each end-user
has a different view on the quality of the ten selected
movies and rates them according to his/her
subjective criteria. A new user, Tom, opens his
personal computer and requests from the content
provider the top movies, according to the system’s user
ratings, to select from.
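The simplest answer to Tom's request, ranking movies by
their mean rating over all existing users, can be
sketched as follows. This is only the non-personalized
baseline, not the approach proposed in this work; the
function name and all rating values are made up for
illustration, and only three of the ten movies are shown.

```python
def top_movies(ratings, n=3):
    """Return the n movies with the highest mean rating.

    `ratings` maps a user name to a {movie: score} dictionary with
    scores on the 1-10 scale. Ties are broken alphabetically.
    """
    scores = {}
    for user_ratings in ratings.values():
        for movie, score in user_ratings.items():
            if not 1 <= score <= 10:
                raise ValueError("ratings must lie on the 1-10 scale")
            scores.setdefault(movie, []).append(score)
    means = {movie: sum(s) / len(s) for movie, s in scores.items()}
    return sorted(means, key=lambda movie: (-means[movie], movie))[:n]


# Made-up ratings for three of the ten movies, for illustration only.
ratings = {
    "John": {"A": 9, "B": 4, "C": 7},
    "George": {"A": 6, "B": 8, "C": 7},
    "Mary": {"A": 9, "B": 6, "C": 2},
}
suggestions_for_tom = top_movies(ratings, n=2)
```

Because Tom has no ratings of his own yet, every new user
would receive the same list, which is the limitation of
averaged ratings noted earlier.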
3 A HYBRID PERSONALIZED
FILTERING APPROACH
One of the technical novelties introduced in the
proposed framework is the handling of its users in a
personalized manner, by building different profiles
according to their preferences. The system is able to
provide each user with personalized multimedia content
according to his/her specific user profile. This
functionality is provided by a hybrid
collaborative filtering methodology, based on
hierarchical clustering of the content information
acquired from all participating content material.
In this context, we apply traditional data mining
techniques, such as hierarchical clustering on the
WEBIST 2007 - International Conference on Web Information Systems and Technologies