When a new domain needs to be integrated with
the framework, the corresponding data files should
be created according to the following ontology files:
Domain Ontology, Item Ontology, User Ontology,
Rating Ontology, User Profile Similarity Ontology
and Item Similarity Ontology.
The framework has different interfaces and
perspectives for end users and administrators.
5 EVALUATION
Automated testing is not applicable to cross-domain
evaluation because no dataset is available for our
target domains. Therefore, we focus on single-domain
testing and defer cross-domain tests until the
prototype of the framework is deployed and
beta-tested with real end users. For each single
target domain, the recommender engine was tested with
the framework’s test suite, and we observed the effects
of group rules on the collaborative filtering algorithm.
Besides the evaluation results, our aim is to show
that our framework enables easy development of
knowledge-based recommenders and also provides
a testing environment to test, evaluate and verify the
algorithms of the recommender engine and its
knowledge base. The evaluation process
follows the steps explained below.
5.1 Knowledge Acquisition
In order to form a knowledge base about the target
domains ‘movies’, ‘music’ and ‘books’, we prepared
an online survey in which 100 people from 10
different countries participated. The purpose of the
survey was to learn users’ preferences and needs in
the target domains and their personality features. Some
example questions from the survey are as follows:
Would you sort the following MOVIE features
considering the importance for you?
“Title, Actor, Actress, Producer, Director, Year,
Genre, Tags, Language, Country”
Which types of MOVIES do you like?
“Action, Animation, Comedy,…, Other”
How do you define yourself?
“Perfectionist, Helper, Performer, Romantic,
Observer, Questioner, Adventurer, Boss,
Peacemaker”
For personality types we chose the nine types of
the Enneagram of personality given above, which are
useful in classifying characters. The results of the
survey were analyzed statistically with the SPSS
(Statistical Package for the Social Sciences)
software with the help of two professional
statisticians. From this analysis we obtained feature
relations and rules about users’ preferences and
tendencies, such as “Observers dislike Romance
movies” and “Helpers like Romance movies”.
We tested each rule’s effects on the collaborative
and content-based recommendation algorithms.
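The way a rule acts on a prediction can be illustrated with a minimal sketch. The names `PreferenceRule` and `adjust_prediction`, the additive shift `delta`, and the rating bounds are all hypothetical choices made for illustration only. They are not the framework's actual mechanism for combining rules with the recommendation algorithms.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class PreferenceRule:
    """Hypothetical encoding of a rule such as 'Observers dislike Romance movies'."""
    personality: str  # Enneagram type, e.g. "Observer"
    genre: str        # item feature the rule refers to
    likes: bool       # True for "like", False for "dislike"


def adjust_prediction(base: float, personality: str, genres: Set[str],
                      rules: Iterable[PreferenceRule],
                      delta: float = 0.5, lo: float = 1.0, hi: float = 5.0) -> float:
    """Shift a base prediction by +/- delta for every matching rule.

    Assumption: an additive shift clipped to [lo, hi]; the bounds and
    delta are illustrative defaults, not values taken from the paper.
    """
    adjusted = base
    for rule in rules:
        if rule.personality == personality and rule.genre in genres:
            adjusted += delta if rule.likes else -delta
    return max(lo, min(hi, adjusted))


# The two example rules quoted in the text.
rules = [
    PreferenceRule("Observer", "Romance", likes=False),
    PreferenceRule("Helper", "Romance", likes=True),
]

print(adjust_prediction(3.5, "Observer", {"Romance"}, rules))         # 3.0
print(adjust_prediction(3.5, "Helper", {"Romance", "Comedy"}, rules)) # 4.0
```

Keeping the rules as plain data, separate from the function that applies them, makes it straightforward to switch individual rules on and off when measuring their effect.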
5.2 Metrics
In order to determine the prediction quality of our
knowledge-based approach, which extends
collaborative and content-based algorithms, the Mean
Absolute Error (MAE) metric (Sarwar et al., 2001)
was used. The MAE is computed by first summing
the absolute errors of the N corresponding
rating-prediction pairs and then averaging the sum. A
smaller value of MAE indicates better accuracy.
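In symbols, with p_i the predicted rating and r_i the actual rating of pair i, MAE = (1/N) * sum over i of |p_i - r_i|. The computation can be sketched as follows. The function name and the sample values are illustrative only and are not taken from the experiments.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference over N rating-prediction pairs."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one rating-prediction pair is required")
    # Sum the absolute errors first, then average over the N pairs.
    total = sum(abs(p - r) for p, r in zip(predicted, actual))
    return total / len(predicted)


# Made-up example: absolute errors are 1.0, 0.5, 0.0 and 1.0.
predictions = [4.0, 3.5, 2.0, 5.0]
ratings = [5, 3, 2, 4]
print(mean_absolute_error(predictions, ratings))  # 0.625
```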
5.3 Data Sets, Common Vocabulary
Adapters and Data Preprocessing
In order to test our approach we developed common
vocabulary adapters for the movie, music and book
domains using the available datasets. In this
work, we present the dataset for the movie domain.
We used a popular dataset, the MovieLens
dataset by the GroupLens Research group. The data
set contains 1682 movies, 943 users and 100,000
ratings (on a 1–5 scale), where each user has rated at
least 20 movies. We matched each movie’s information
with the IMDb dataset to extract extra features.
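As a minimal sketch of reading such rating data, the snippet below parses tab-separated records of the form "user id, item id, rating, timestamp", which is the layout used by the ratings file of the MovieLens 100K release. The helper names `parse_ratings` and `ratings_per_user` and the sample lines are hypothetical and made up for illustration. They are not part of the framework's adapters.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int      # integer on the 1-5 scale
    timestamp: int


def parse_ratings(lines: Iterable[str]) -> List[Rating]:
    """Parse tab-separated 'user item rating timestamp' records."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id, rating, timestamp = line.split("\t")
        ratings.append(Rating(int(user_id), int(item_id), int(rating), int(timestamp)))
    return ratings


def ratings_per_user(ratings: Iterable[Rating]) -> Dict[int, int]:
    """Count ratings per user, e.g. to check the 'at least 20' property."""
    return dict(Counter(r.user_id for r in ratings))


# Made-up sample lines in the same layout (not real dataset rows).
sample = ["1\t10\t4\t880000000", "1\t11\t2\t880000050", "2\t10\t5\t880000100"]
parsed = parse_ratings(sample)
print(ratings_per_user(parsed))  # {1: 2, 2: 1}
```

With the real file, the same function can be fed an open file handle instead of the in-memory list.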
To compare our approach with a state-of-the-art
collaborative algorithm, we chose the
cross-validation technique with the holdout method and
performed the experiments under different
configurations.
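A holdout split can be sketched as follows. The test fraction, the random seed and the function name `holdout_split` are assumptions chosen for illustration, since the specific configurations used in the experiments are not listed here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(records: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly partition records into (train, test) lists.

    Assumption: a single random split; test_fraction and seed are
    illustrative defaults, not the paper's configurations.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up (user, item, rating) triples.
data = [(u, i, 3) for u in range(1, 3) for i in range(1, 6)]
train, test = holdout_split(data, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

Predictions are then made for the held-out ratings using only the training part, and the error is measured on the held-out part. Varying `test_fraction` gives different training/test configurations.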
As our knowledge base rules make use of users’
personality features, some preprocessing is required
in order to determine the active user’s personality
type in these configurations. We used Weka (Waikato
Environment for Knowledge Analysis), a
popular suite of machine learning software, to
classify users via decision trees.
5.4 Evaluation Results
Because of space limitations, we present the
performance of two different knowledge bases against a
state-of-the-art collaborative filtering technique (CF)
(Adomavicius & Tuzhilin, 2005) on the movie data set.
We prepared the knowledge bases with the following
rules:
Knowledge Base 1 (KB1)
“Observers like Animation movies”
“Observers dislike Romance movies”