the first experiment uses Pearson correlation
coefficient (Formula (1)) for the similarity
computation and we call it Correlation-Based RS
(CBRS). The second experiment uses Cosine Vector
similarity measure (Formula (2)) for the similarity
computation and we call it Cosine Vector RS
(CVRS). The third experiment uses mean difference
weights similarity measure (Formula (3)) for the
similarity computation and we call it Difference
Weights RS (DWRS). Finally, the fourth experiment
uses the proposed fuzzy weighted Pearson
correlation coefficient (Formula (4b)) for the
similarity computation and we call it Fuzzy-
Weighted RS (FWRS).
The performance of each CRS is evaluated using
coverage, percentage of the correct predictions
(PCP), and mean absolute error (MAE)
(Adomavicius and Tuzhilin, 2005; Breese et al.,
1998; Herlocker et al., 2004). Coverage is the
measure of the percentage of items for which an RS
can provide predictions. We compute the active user
coverage as the number of items for which the RS
can generate predictions for that user over the total
number of unseen items (Vozalis and Margaritis,
2003; Herlocker et al., 2004). The split coverage
over all the active users is given by:
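$$\text{Coverage} = \frac{\sum_{i=1}^{M} N_i}{\sum_{i=1}^{M} |T_i|} \tag{8}$$
Here, $N_i$ is the total number of predicted items
for user $u_i$, $|T_i|$ is the size of the test ratings
set of $u_i$, and $M$ is the total number of the active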
users. The active user PCP is the percent of the
correctly predicted items by the system for a given
active user to the total number of items in the test
ratings set of that user. The set of correctly predicted
items for a given user and the split PCP over all the
active users are defined by the following formulae:
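$$CP_i = \{\, s_k \mid s_k \in T_i,\ pr_{i,k} = r_{i,k} \,\} \tag{9}$$
$$\text{PCP} = \frac{\sum_{i=1}^{M} |CP_i|}{\sum_{i=1}^{M} |T_i|} \times 100\% \tag{10}$$
Here, $T_i$ is the test ratings set of user $u_i$, and
$pr_{i,k}$ and $r_{i,k}$ are the predicted and true
ratings of $u_i$ for item $s_k$.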
The MAE measures the deviation of predictions
generated by the RS from the true ratings specified
by the active user (Breese et al., 1998; Vozalis and
Margaritis, 2003; Herlocker et al., 2004). The split
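MAE over all the active users ($M$) is:
$$\text{MAE} = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{N_i} \sum_{k=1}^{N_i} \left| pr_{i,k} - r_{i,k} \right| \tag{11}$$
where $N_i$ is the number of predicted items for user
$u_i$, and $pr_{i,k}$ and $r_{i,k}$ are the predicted
and true ratings of $u_i$ for item $s_k$.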
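The three split-level measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_metrics`, the dictionary layout, and the default correctness rule (a prediction counts as correct when it rounds to the true rating) are all hypothetical, since the text only speaks of "correctly predicted" items. Users with no predictions are skipped when averaging the per-user MAE, which is also an assumption.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical layout: user id -> {item id -> rating}.
# A prediction of None means the RS could not predict that item.
TestSets = Dict[str, Dict[str, float]]
Predictions = Dict[str, Dict[str, Optional[float]]]


def split_metrics(
    test_sets: TestSets,
    predictions: Predictions,
    # Assumption: "correct" means the prediction rounds to the true rating.
    is_correct: Callable[[float, float], bool] = lambda p, r: round(p) == r,
) -> Tuple[float, float, Optional[float]]:
    """Return (coverage, PCP in percent, MAE) for one split."""
    total_test = 0        # sum of test set sizes over all active users
    total_predicted = 0   # sum of predicted items over all active users
    total_correct = 0     # sum of correctly predicted items
    user_maes = []        # per-user MAE, only for users with predictions

    for user, test in test_sets.items():
        user_preds = predictions.get(user, {})
        errors = []
        for item, true_rating in test.items():
            pred = user_preds.get(item)
            if pred is None:
                continue  # item not covered for this user
            errors.append(abs(pred - true_rating))
            if is_correct(pred, true_rating):
                total_correct += 1
        total_test += len(test)
        total_predicted += len(errors)
        if errors:
            user_maes.append(sum(errors) / len(errors))

    if total_test == 0:
        raise ValueError("no test ratings in this split")

    coverage = total_predicted / total_test
    pcp = 100.0 * total_correct / total_test
    mae = sum(user_maes) / len(user_maes) if user_maes else None
    return coverage, pcp, mae
```

Passing a different `is_correct` callable swaps in another definition of a correct prediction without touching the rest of the computation.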
A low coverage value indicates that the RS will not
be able to assist the user with many of the items he
has not rated, while a lower MAE corresponds to more
accurate predictions of a given RS. Over all splits,
we compute PCP (coverage) by summing all correct
predictions (predictions) over all active users over
all splits and dividing by the sum of all test set
sizes of all active users over all splits. The MAE
over all splits is the average of all splits’ MAEs. To
obtain the predictions, we have to use a prediction
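formula. The predicted rating, $pr_{i,k}$, is usually
computed as an aggregate of the ratings of $u_i$’s
neighborhood set for the same item $s_k$. The common
prediction formulae are (Adomavicius and Tuzhilin,
2005):
$$pr_{i,k} = \frac{\sum_{u_j \in C_{i,k}} sim(u_i, u_j)\, r_{j,k}}{\sum_{u_j \in C_{i,k}} \left| sim(u_i, u_j) \right|} \tag{12a}$$
$$pr_{i,k} = \bar{r}_i + \frac{\sum_{u_j \in C_{i,k}} sim(u_i, u_j)\, (r_{j,k} - \bar{r}_j)}{\sum_{u_j \in C_{i,k}} \left| sim(u_i, u_j) \right|} \tag{12b}$$
where $C_{i,k}$ denotes the set of neighbors for $u_i$ who
have rated item $s_k$, $sim(u_i, u_j)$ is the similarity
between $u_i$ and $u_j$, and $\bar{r}_i$ is the average rating of
user $u_i$. Formula (12a) scales the contribution of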
each neighbor’s rating by his similarity to the given
active user. On the other hand, because users usually
vary in their use of the rating scale, Formula (12b),
Resnick’s prediction formula, compensates for
rating scale variations by centering the predicted
ratings for a given user around his mean rating.
However, mean ratings for some users are high and
thus the predicted ratings may fall outside the rating
scale’s range [1.0, 5.0]. Thus we use a priority-based
prediction formula where Formula (12b) is used
first. If its predicted rating is out of the rating range,
then we switch to Formula (12a), whose
predicted ratings will not exceed the rating scale
range. The neighborhood set size is varied from
10 to 100 in steps of 10 (10, 20, …, 100).
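The priority-based prediction described above can be sketched as follows. This is an illustrative sketch under assumptions rather than the authors' code: the function names, the `Neighbor` tuple layout, and the use of absolute similarities in the normalizing sum (following the usual form in Adomavicius and Tuzhilin, 2005) are all hypothetical choices. The mean-centered formula (12b) is tried first, and the similarity-weighted average (12a) is used only when the first result leaves the rating range.

```python
from typing import List, Optional, Tuple

# Hypothetical layout for one neighbor who rated the target item:
# (similarity to the active user, neighbor's rating, neighbor's mean rating)
Neighbor = Tuple[float, float, float]


def predict_weighted_sum(neighbors: List[Neighbor]) -> Optional[float]:
    """Similarity-weighted average of neighbor ratings, as in (12a)."""
    norm = sum(abs(sim) for sim, _, _ in neighbors)
    if norm == 0:
        return None  # no usable neighbors, so no prediction
    return sum(sim * rating for sim, rating, _ in neighbors) / norm


def predict_resnick(user_mean: float, neighbors: List[Neighbor]) -> Optional[float]:
    """Mean-centered prediction (Resnick's formula), as in (12b)."""
    norm = sum(abs(sim) for sim, _, _ in neighbors)
    if norm == 0:
        return None
    offset = sum(sim * (rating - mean) for sim, rating, mean in neighbors) / norm
    return user_mean + offset


def predict_priority(
    user_mean: float,
    neighbors: List[Neighbor],
    low: float = 1.0,
    high: float = 5.0,
) -> Optional[float]:
    """Use (12b) first; fall back to (12a) if the result is out of range."""
    pred = predict_resnick(user_mean, neighbors)
    if pred is None:
        return None
    if low <= pred <= high:
        return pred
    return predict_weighted_sum(neighbors)
```

For an active user with a mean rating of 4.5 and two neighbors who both rated the item 5 but have lower means, the mean-centered prediction exceeds 5.0, so `predict_priority` returns the weighted average instead.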
4.1 Analysis of the Results
The results presented in Figures 2, 3, and 4 show the
PCP, coverage, and MAE over all active users over
all splits for the four RSs: CBRS, CVRS,
DWRS, and FWRS. These results show that FWRS
performs better than CBRS, CVRS, and DWRS in
terms of PCP, coverage, and MAE. The higher PCP
of FWRS indicates that a better set of like-minded
users is found and therefore the accuracy of
the RS is enhanced.
ICEIS 2013 - 15th International Conference on Enterprise Information Systems