A New Approach for Collaborative Filtering based on Bayesian
Network Inference
Loc Nguyen
Sunflower Soft Company, Ho Chi Minh City, Vietnam
Keywords: Collaborative Filtering, Bayesian Network.
Abstract: Collaborative filtering (CF) is one of the most popular algorithms for recommendation, in which the items recommended to users are determined by relying on the outcomes of surveying their communities. There are two main CF approaches: memory-based and model-based. The model-based approach is more dominant in real-time response because it takes advantage of an inference mechanism in the recommendation task. However, the problem of incomplete data is still an open research issue, and the inference engine keeps being improved so as to gain high accuracy and high speed. I propose a new model-based CF that applies a Bayesian network (BN) to the inference engine, with the assertion that the BN is an optimal inference model: the BN represents users' purchase patterns, and Bayesian inference is an evidence-based inferring mechanism appropriate to the rating database. Because the quality of the BN relies on the completeness of the training data, it gets low if the training data have many missing values, so I also suggest an average technique to fill in missing values.
1 INTRODUCTION
A recommendation system is a system which recommends to users items selected from a large number of existing items in a database. Items refer to anything that users consider, such as products, services, books, newspapers, etc. The expectation is that the recommended items will likely be the ones that the users would like the most; in other words, such items go along with the users' interests.
In that sense, there are two common kinds of recommendation system: content-based filtering (CBF) and collaborative filtering (CF) (Su & Khoshgoftaar, 2009, pp. 3-13) (Ricci, et al., 2011, pp. 73-139):
- CBF recommends an item to a user if the item is similar in content to other items that he liked most in the past (his ratings for those items are high). Note that each item has contents, which are its properties, and so all items compose a matrix called the item content matrix.
- CF, on the other hand, recommends an item to a user if his neighbors (the other users who are similar to him) are interested in that item. Note that a user's rating on an item expresses his interest in that item. For that reason, all the users' ratings on the items compose a matrix, called the rating matrix.
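To make the rating matrix and the neighbor notion concrete, the following is a minimal Python sketch of memory-based CF. It is not part of the method proposed in this paper; the toy ratings, the cosine similarity choice, and the function names are illustrative assumptions.

```python
import math

# Toy rating matrix: rows are users, columns are items; None marks a missing rating.
ratings = {
    "u1": {"i1": 1, "i2": 3, "i3": None},
    "u2": {"i1": 3, "i2": None, "i3": 5},
    "u3": {"i1": 4, "i2": 2, "i3": 1},
}

def cosine(u, v):
    """Cosine similarity over the items that both users have rated."""
    common = [i for i in u if u[i] is not None and v.get(i) is not None]
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(user, item):
    """Similarity-weighted average of the neighbors' ratings on the item."""
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or row.get(item) is None:
            continue
        s = cosine(ratings[user], row)
        num += s * row[item]
        den += abs(s)
    return num / den if den else None

print(predict("u1", "i3"))  # estimate u1's missing rating on i3 from the neighbors
```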
Both of the above filtering approaches (CBF and CF) have their own strong and weak points. CBF focuses on item contents and the user's personal interests, and it is designed to recommend different items to different users; each user can therefore receive a unique recommendation, which is the strong point of CBF. However, CBF does not tend towards the community as CF does. Because the items that users may like are often "hidden under" the user community, CBF is not capable of discovering such implicit items, which is acknowledged as its common weak point. Moreover, when the number of users becomes very large, CBF may give wrong predictions, whereas the accuracy of CF increases.
If there is a huge amount of content associated with the items, for instance when the items have many properties, then CBF consumes much more system resources and processing time to analyze them, whereas CF, as a
matter of fact, pays no regard to the contents of those items. Instead, CF works only on the users' ratings of the items, which is the strong point of this approach: CF does not encounter problems such as how to analyze the richness of item content. However, this also reflects the weak point of CF, because CF can make unexpected recommendations in some situations, in which items considered suitable to users do not in fact relate to the users' profiles. The problem becomes even more serious when too many items are not rated, which turns the rating matrix into a sparse one containing many missing values. In order to alleviate this weakness of CF, two techniques can be used for improvement:
- The combination of CF and CBF. This technique breaks into two stages: first, CBF is applied to set up a complete rating matrix, and then CF is used to make predictions for recommendations. This technique is useful for improving prediction precision, but it consumes more time because the first stage plays the role of a filtering or pre-processing step, and the content of the items must be fully represented. This technique requires both the item content matrix and the rating matrix.
- Compressing the rating matrix into a representative model, which is then used to predict the missing data for recommendations. This is the model-based approach for CF. Note that CF has two common approaches: memory-based and model-based. The model-based approach applies statistical and machine learning methods to mine the rating matrix, and the result of this mining task is the above-mentioned model.
Although the model-based approach does not give results as precise as the combination approach, it can handle the problems of a huge database and a sparse matrix. Moreover, it can respond to a user's request immediately by making predictions on the representative model through an instant inference mechanism. So this paper focuses on a model-based approach for CF based on Bayesian network inference. There are many other studies that apply Bayesian networks (BN) to CF. The authors of (Miyahara & Pazzani, 2000) propose the Simple
Bayesian Classifier for CF. Suppose rating values
range in the integer interval {1, 2, 3, 4, 5}, there is a set of 5 respective classes {c1, c2, c3, c4, c5}. The
Simple Bayesian Classifier uses the naïve Bayesian classification method (Miyahara & Pazzani, 2000, p. 4) to determine which class a given user belongs to. As mentioned in (Su & Khoshgoftaar, 2009, p. 9), the NB-ELR algorithm is an improvement of the Simple Bayesian Classifier that combines naïve Bayesian classification with extended logistic regression (ELR). ELR is a gradient-ascent, discriminative parameter-learning algorithm that maximizes log conditional likelihood (Su & Khoshgoftaar, 2009, p. 9). The NB-ELR algorithm gains high classification accuracy on both complete and incomplete data. The author of (Langseth, 2009) assumes that there is a linear mapping from the latent space of users and items to the numerical rating scale; such a mapping, which conforms to the full joint distribution over all ratings, constructs a BN. Parameters of the joint distribution are learned from training data and are then used to predict active users' ratings. According to (Campos, et al., 2010), the hybrid recommender model is a BN that includes three main kinds of nodes: feature nodes, item nodes, and user nodes. Each feature node represents an attribute of an item, and active users' ratings depend on these nodes.
In general, other studies focus on classification based on BN, discovering latent variables, and predicting active users' ratings, while this research focuses on using a BN to model users' purchase patterns and on taking advantage of the inference mechanism of the BN. It is a potential approach because it opens a new point of view on the recommendation domain. In section 2 I propose an idea for the model-based CF algorithm based on Bayesian networks. Section 3 describes an enhancement of the method. Section 4 presents the evaluation and its results. Section 5 is the conclusion.
2 A NEW CF ALGORITHM
BASED ON BAYESIAN
NETWORK
The basic idea of model-based CF is to try to find an optimal inference model that can give real-time response. Besides, the sparse matrix and black sheep are considered important problems that need to be solved. I propose a new model-based CF
algorithm based on Bayesian network (Neapolitan, 2003, p. 40) inference so as to gain high accuracy and solve the problem of the sparse matrix. In general, the method aims to build up a Bayesian network (BN) from the rating matrix. Each node of such a BN represents an item and each arc expresses the dependence relationship between two nodes. Whenever the recommendation task is required, the inference mechanism of the BN determines which items are recommended to the user, based on the posterior probabilities of such items. The larger the posterior probability of an item is, the more likely it is that this item is bought by many users; such an item has high frequency and should be recommended to new users. If the rating matrix is sparse, we try to replace missing values with estimated values so that it is easy and efficient to build up the BN from a complete matrix instead of a sparse one. The technique for estimating missing values is discussed later.
The new algorithm includes 4 steps:
1. Transposing the user-based matrix to the item-based matrix.
2. Filling in missing values.
3. Learning the BN from the item-based matrix.
4. Performing the recommendation task.
Steps 1, 2, and 3 are offline processes, so they do not affect the real-time response in step 4. These steps are described in the following sub-sections 2.1, 2.2, 2.3, and 2.4; a skeleton of the whole pipeline is sketched below.
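The sketch below is a hypothetical outline of the four steps as an offline/online pipeline; the function names are mine, not the paper's implementation, and their bodies are sketched in the corresponding sub-sections.

```python
# Hypothetical skeleton of the whole method (names are illustrative, not the paper's code).
def transpose(user_based):        # step 1, offline: see section 2.1
    ...

def fill_missing(item_based):     # step 2, offline: see section 2.2
    ...

def learn_bn(complete_matrix):    # step 3, offline: see section 2.3
    ...

def recommend(bn, rated_items):   # step 4, online: see section 2.4
    ...

# Offline phase, run once per rating-matrix snapshot:
#   bn = learn_bn(fill_missing(transpose(user_based_matrix)))
# Online phase, run per request, so only inference affects response time:
#   suggestions = recommend(bn, active_user_rated_items)
```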
2.1 Transposing User-based Matrix to
Item-based Matrix
User-based matrix is the original format of the rating matrix, in which each row contains the ratings that a concrete user gives to the items. In the item-based matrix, by contrast, each row contains the ratings that a concrete item receives from the users. Transposing the user-based matrix into the item-based matrix in this step is a simple but very important pre-processing step because the BN is constituted of item nodes. In a real context, the number of customers is unbounded and grows much faster than the number of items. Using the item-based matrix keeps the size of the BN small, so that the speed of inference in the recommendation task (see step 4) is improved. Table 1 is an example of transposing a user-based matrix into an item-based matrix; question marks (?) indicate missing values.

Table 1: Transposing user-based matrix to item-based matrix.

User-based matrix:
         item 1    item 2    item 3
user 1   r11 = 1   r12 = 3   r13 = ?
user 2   r21 = 3   r22 = ?   r23 = 5
user 3   r31 = 4   r32 = 2   r33 = 1
user 4   r41 = ?   r42 = ?   r43 = 3

Item-based (transposed) matrix:
         user 1    user 2    user 3    user 4
item 1   r11 = 1   r21 = 3   r31 = 4   r41 = ?
item 2   r12 = 3   r22 = ?   r32 = 2   r42 = ?
item 3   r13 = ?   r23 = 5   r33 = 1   r43 = 3
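As a small illustration of step 1, the sketch below transposes the user-based matrix of Table 1 into the item-based matrix, using a nested dictionary and None for the missing ratings; the data layout is an assumption, not the paper's implementation.

```python
# User-based matrix from Table 1; None stands for a missing rating ("?").
user_based = {
    "user1": {"item1": 1, "item2": 3, "item3": None},
    "user2": {"item1": 3, "item2": None, "item3": 5},
    "user3": {"item1": 4, "item2": 2, "item3": 1},
    "user4": {"item1": None, "item2": None, "item3": 3},
}

def transpose(matrix):
    """Swap rows and columns: transposed[item][user] = matrix[user][item]."""
    transposed = {}
    for user, row in matrix.items():
        for item, rating in row.items():
            transposed.setdefault(item, {})[user] = rating
    return transposed

item_based = transpose(user_based)
print(item_based["item1"])  # {'user1': 1, 'user2': 3, 'user3': 4, 'user4': None}
```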
2.2 Filling in Missing Values
The BN learned from a complete rating matrix is more adequate than the one learned from a sparse matrix. Some methods can learn a BN from incomplete data, while other methods require complete data. When complete data are required, the simplest way to complete the data is to replace missing values with average values; an average value is an estimate of the missing value. The replacement is an iterative procedure, considered as an estimation process (see the sketch at the end of this sub-section):
- Replacement is done over many iterations. Replacing missing values with average values in the next iteration is based on the values estimated in the previous iteration.
- The average value is calculated as the mean of the user vector. If the user vector is empty, then the mean of the item vector becomes the estimate of the average value.
By applying this average method, we obtain the completely estimated item-based rating matrix shown in table 2.
Table 2: Completely estimated item-based rating matrix.

         user 1    user 2    user 3    user 4
item 1   r11 = 1   r21 = 3   r31 = 4   r41 = 3
item 2   r12 = 3   r22 = 4   r32 = 2   r42 = 3
item 3   r13 = 2   r23 = 5   r33 = 1   r43 = 3
This average technique is fast but not very accurate because the replaced values do not reflect the real ratings that users would give to an item. Learning methods that can handle incomplete data when constructing the BN are preferable, but they go beyond this research.
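The following sketch implements the averaging rules above on the toy item-based matrix from section 2.1; it reproduces Table 2 and is only an illustration of the technique, not the exact implementation used in the paper.

```python
def fill_missing(item_based, iterations=2):
    """Replace missing ratings (None) with the mean of the user's ratings,
    falling back to the item's mean when the user has no ratings at all.
    Originally missing cells are re-estimated at each iteration, so later
    estimates use the values filled in during the previous iteration."""
    filled = {item: dict(row) for item, row in item_based.items()}
    missing = [(item, user) for item, row in filled.items()
               for user, r in row.items() if r is None]
    for _ in range(iterations):
        # Mean rating per user (a column of the item-based matrix), over current values.
        per_user = {}
        for row in filled.values():
            for user, r in row.items():
                if r is not None:
                    per_user.setdefault(user, []).append(r)
        user_mean = {u: sum(v) / len(v) for u, v in per_user.items()}
        for item, user in missing:
            known = [r for r in filled[item].values() if r is not None]
            item_mean = sum(known) / len(known) if known else None
            filled[item][user] = user_mean.get(user, item_mean)
    return filled

complete = fill_missing(item_based)   # `item_based` comes from the previous sketch
print(complete["item3"])  # {'user1': 2.0, 'user2': 5, 'user3': 1, 'user4': 3}
```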
2.3 Learning BN from Item-based
Matrix
Machine learning techniques are used to learn the BN from the item-based matrix shown in tables 1 and 2. This research applies the K2 learning algorithm built into the Elvira system (Serafín, et al., 2003) to construct the BN.
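The paper itself uses Elvira's K2 implementation. For readers without Elvira, the sketch below shows a comparable structure-learning step with pgmpy; the class names (HillClimbSearch, K2Score, BayesianNetwork) are assumptions that may differ across pgmpy versions, and this is a hill-climbing search guided by the K2 score rather than the exact ordering-based K2 algorithm.

```python
# Illustrative alternative to the Elvira/K2 setup used in the paper.
import pandas as pd
from pgmpy.estimators import HillClimbSearch, K2Score, MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork

def learn_bn(data: pd.DataFrame) -> BayesianNetwork:
    """data: one training case per row (users) and one variable per column (items)."""
    dag = HillClimbSearch(data).estimate(scoring_method=K2Score(data))
    bn = BayesianNetwork(dag.edges())
    bn.fit(data, estimator=MaximumLikelihoodEstimator)  # learn conditional probability tables
    return bn

# `complete` is the filled item-based matrix from section 2.2; pgmpy expects cases as
# rows, so it is converted back to a users x items table before learning.
bn = learn_bn(pd.DataFrame(complete).astype(int))
print(bn.edges())
```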
2.4 Performing Recommendation Task
The recommendation task is performed by evidence-based inference in the BN: it first determines the posterior probabilities (PoP) of the nodes in the network and then recommends to users the nodes that have high PoP. The target BN, learned from the item-based matrix, is considered the user's purchase pattern, and her/his existing rated items are considered evidences. This method has two advantages:
- Using a BN that is itself a purchase pattern can discover user interests and predict her/his future purchase trend, so the quality of recommendation is improved.
- Evidence-based inference in a BN is a solid and powerful deduction mechanism, which decreases the mean squared error when estimating missing ratings.
The target BN in this example is very small, with only three nodes, so the complexity of computation is insignificant and does not affect the real-time response of the recommendation system; but when the BN gets huge, it becomes a serious problem that needs to be resolved. The Pearl algorithm (Neapolitan, 2003, pp. 126-156) is the classical method to solve this problem by propagating messages over the network in two different directions. This research applies an inference algorithm, the variable elimination method of propagation built into the Elvira system (Serafín, et al., 2003), to calculate the posterior probabilities; a sketch of this step is given below. Besides, the next section mentions an enhancement technique that alleviates this computational complexity.
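Continuing the pgmpy-based sketch (again an assumption standing in for Elvira's propagation), the function below ranks unrated items by their posterior probability of receiving a high rating, given the active user's rated items as evidence.

```python
from pgmpy.inference import VariableElimination

def recommend(bn, evidence, high=4, top_n=3):
    """Return unrated items sorted by P(rating >= high | evidence)."""
    infer = VariableElimination(bn)
    scores = {}
    for item in bn.nodes():
        if item in evidence:
            continue                                  # never re-recommend rated items
        posterior = infer.query(variables=[item], evidence=evidence)
        states = posterior.state_names[item]
        scores[item] = sum(p for s, p in zip(states, posterior.values) if s >= high)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Example: an active user who rated item1 with 3 (uses `bn` from the previous sketch).
print(recommend(bn, evidence={"item1": 3}))
```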
3 AN ENHANCEMENT –
CLUSTERED BAYESIAN
NETWORKS
As discussed, although the number of items is limited and does not grow as fast as the number of users, it is still large. Obviously, a BN consists of connected nodes, but it may contain some incoherent or unconnected nodes because it is learned from large data. Such incoherent nodes make the inference mechanism less efficient. This is the problem of incoherence among item nodes, which needs to be solved.
Suppose that there are many items in a supermarket and they are divided into categories (groups). Clothes items (T-shirts, trousers, jeans, pullovers, etc.) in the same category (the clothes category) are related to one another, so they are connected nodes in the BN and naturally compose a sub-group of nodes. Other items not related to the clothes category, however, become incoherent nodes that are unconnected to the clothes nodes. Inference on the whole BN, including incoherent groups of nodes, is less precise. This issue is solved by learning one BN for each category; thus we build up many individual BNs, and each BN represents a group of related items. In other words, a large, whole BN is decomposed into small, individual BNs so that nodes in the same individual BN are more coherent. For example, three individual BNs are built up, corresponding to three categories in the supermarket: clothes, furniture, and electrical goods. If the training data set (rating matrix) does not explicitly specify categories, we apply a clustering technique to discover groups of items. So the improvement in building up the BN includes two steps (a sketch follows after this list):
1. Applying clustering methods such as k-means, k-medoids, etc. to group the items. The items can also be classified into groups (categories) manually, in which case each item can belong to more than one group.
2. For each group (category) of items:
   a. The training data set is pruned: for each row of the rating matrix, the columns that do not correspond to items in this group are removed.
   b. A BN is learned from the pruned training data set. Such a BN is called an individual BN.
Note that a node can belong to more than one individual BN. This is a drawback, but it occurs naturally in a commercial context, where an item can be classified into more than one category (group).
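The sketch below illustrates the two steps above, assuming scikit-learn's KMeans for the clustering and a pandas DataFrame (users x items) for the training data; note that k-means produces disjoint groups, whereas manual categories may overlap, and learn_bn refers to the structure-learning sketch of section 2.3.

```python
from sklearn.cluster import KMeans

def group_items(complete_item_matrix, k=3):
    """Step 1: cluster items by their (filled) rating vectors; returns {group_id: [items]}."""
    items = sorted(complete_item_matrix)
    users = sorted(next(iter(complete_item_matrix.values())))
    vectors = [[complete_item_matrix[i][u] for u in users] for i in items]
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    groups = {}
    for item, label in zip(items, labels):
        groups.setdefault(label, []).append(item)
    return groups

def prune_columns(data, group_item_list):
    """Step 2a: keep only the columns (items) belonging to one group; `data` is users x items."""
    return data[group_item_list]

# groups = group_items(complete, k=3)
# individual_bns = {g: learn_bn(prune_columns(data, items)) for g, items in groups.items()}
```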
So every time the recommendation task is required (in step 4 of our method), the inference process is executed on an individual BN instead of the whole BN as before. The speed is improved because the number of nodes in an individual BN is much smaller than in the whole, large BN. But another issue is raised: given an active user, how do we choose the right individual BN on which to perform the inference task? If we browse over all individual BNs and all their nodes to find the BN that contains the most items (nodes) of the active user, this consumes a lot of time and computer resources, so another approach is required. This is still open research, but I also suggest a solution, the so-called mapping table (MT) technique.
The basic idea of the MT technique is to create a mapping table (MT) at the same time as learning the BNs. Each row of the MT is a key-value pair: the key is
the node's name, and the value is a bit set indicating which individual BNs the node belongs to. Each bit of this bit set represents one individual BN; in other words, it indicates whether or not that individual BN contains the node specified by the key. Suppose there are 3 individual BNs, BN0, BN1, and BN2, and 8 nodes A, B, C, D, E, F, G, and H. The example MT is described in table 3.
Table 3: Mapping table.

Node   Bit set (BN0 BN1 BN2)
A      100
B      100
C      010
D      010
E      001
F      001
G      011
H      011
This MT is interpreted as follows: nodes A and B belong to BN0, nodes C and D belong to BN1, nodes E and F belong to BN2, and nodes G and H belong to both BN1 and BN2.
Given an active user and her/his rated items, the number of rated items (nodes) contained in each individual BN is counted, and the individual BN with the highest count is chosen as the right one on which the inference task is executed. For instance, given nodes E, F, G, and H on which an active user rates, we have:
- The count for BN0 is t0 = 0 because BN0 does not contain any node rated by the active user.
- The count for BN1 is t1 = 2 because BN1 contains G and H.
- The count for BN2 is t2 = 4 because BN2 contains E, F, G, and H.
Because t2 is maximal, BN2 is the right individual BN.
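A minimal sketch of the MT technique follows: the bit sets of Table 3 are encoded as sets of BN indices, and the individual BN that covers the most of the active user's rated items is selected (the data layout and function names are illustrative).

```python
# Mapping table: each node maps to the set of individual BNs containing it
# (equivalent to the bit sets of Table 3).
mapping_table = {
    "A": {0}, "B": {0},
    "C": {1}, "D": {1},
    "E": {2}, "F": {2},
    "G": {1, 2}, "H": {1, 2},
}

def choose_bn(rated_items, mt, num_bns=3):
    """Count, for every individual BN, how many rated items it contains,
    and return the index of the BN with the highest count."""
    counts = [0] * num_bns
    for item in rated_items:
        for bn_index in mt.get(item, ()):
            counts[bn_index] += 1
    return max(range(num_bns), key=lambda i: counts[i]), counts

best, counts = choose_bn({"E", "F", "G", "H"}, mapping_table)
print(best, counts)   # 2 [0, 2, 4] -> BN2 is chosen, matching the example above
```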
4 EVALUATION
The MovieLens database (GroupLens, 1998), including 100,000 ratings of 943 users on 1,682 movies, is used for evaluation. The database is divided into 5 folds; each fold includes a training set covering 80% of the whole database and a testing set covering 20%. The training set and testing set in the same fold are disjoint.
The system setting includes: Pentium(R) Dual-Core CPU E5700 @ 3.00GHz processor, 2GB RAM (1GB available), Microsoft Windows 7 Ultimate 2009 32-bit, and Java 7 HotSpot(TM) Client VM. The proposed BN method is compared to four other methods: Green Fall (a model-based CF using the frequent-itemset mining technique), the neighbor item-based method, the neighbor user-based method, and SVD (Ricci, et al., 2011, pp. 151-152). Note that the BN method is enhanced by the clustered individual BNs described in section 3.
There are 7 metrics (Herlocker, et al., 2004, pp. 19-39) used in this evaluation: MAE, MSE, precision, recall, F1, ARHR, and time. The time metric is measured in seconds. MAE and MSE are predictive accuracy metrics that measure how close the predicted value is to the rating value; the smaller MAE and MSE are, the higher the accuracy is. Precision, recall, and F1 are quality metrics that measure the quality of the recommendation list, that is, how well the recommendation list reflects the user's preferences. ARHR is also a quality metric; it indicates how well the recommendation list matches the user's rating list according to the rating ordering. The larger a quality metric is, the better the algorithm is.
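For reference, the sketch below computes the main accuracy and quality metrics on toy data; the formula choices follow Herlocker et al. (2004), the numbers are illustrative rather than those of Table 4, and ARHR would additionally weight each hit by the reciprocal of its rank in the recommendation list.

```python
def mae_mse(predicted, actual):
    """Predictive accuracy: mean absolute error and mean squared error."""
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse

def precision_recall_f1(recommended, relevant):
    """Quality of the recommendation list against the items the user actually likes."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(mae_mse([3.4, 2.1, 4.8], [4, 2, 5]))                    # (0.3, 0.136...)
print(precision_recall_f1(["i1", "i2", "i3"], ["i2", "i5"]))  # (0.333..., 0.5, 0.4)
```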
The evaluation result is shown in table 4 as
follows:
Table 4: Evaluation result.

            BN method   Green Fall   Item-based   User-based   SVD
MAE         0.6127      0.7241       0.5222       0.9319       0.5363
MSE         0.9023      1.1640       0.6675       2.1664       1.1734
Precision   0.1430      0.1328       0.0245       0.0014       0.0041
Recall      0.0552      0.0523       0.0092       0.0005       0.0015
F1          0.0785      0.0739       0.0131       0.0008       0.0021
ARHR        0.0504      0.0442       0.0043       0.0005       0.0016
Time (s)    1.8780      0.0050       9.3706       8.3831       0.0176
The proposed BN method is much more effective than the other methods, achieving high quality on the precision, recall, F1, and ARHR metrics. Its accuracy on MAE and MSE is comparable to the item-based, user-based, and SVD methods and better than Green Fall. It consumes more time than Green Fall and SVD but less time than the item-based and user-based methods.
In general, the BN method is good at recommendation quality and Green Fall is good at real-time response. Another drawback of the BN method is that it requires a lot of computer resources to learn from data; thus, building up the BN takes much more time than mining frequent itemsets.
5 CONCLUSION
The essence of our method is to learn Bayesian
network (BN) from the item-based rating matrix. After that, BN inference is applied to recommend items to users. Therefore, the complexity of the proposed algorithm depends on both the learning task and the inference task. However, only the inference task matters in a real-time application because the learning task is done in offline mode.
In the evaluation, our method is compared with other memory-based and model-based methods such as Green Fall, item-based, user-based, and SVD. The result shows that the BN method is very effective, gaining high quality on the precision, recall, F1, and ARHR metrics. However, learning the BN remains a big problem, and this research is still open.
ACKNOWLEDGEMENTS
I would like to acknowledge Madam Do, Phung T. M. and Sir Vu, Dong N., who gave me valuable comments and advice that helped me improve this research.
REFERENCES
Campos, L. M. d., Fernández-Luna, J. M., Huete, J. F. &
Rueda-Morales, M. A., 2010. Combining content-
based and collaborative recommendations: A hybrid
approach based on Bayesian networks. International
Journal of Approximate Reasoning, September, 51(7),
pp. 785-799.
GroupLens, 1998. MovieLens datasets. [Online]
Available at: http://grouplens.org/datasets/movielens/
[Accessed 3 August 2012].
Herlocker, J. L., Konstan, J. A., Terveen, L. G. & Riedl, J.
T., 2004. Evaluating Collaborative Filtering
Recommender Systems. ACM Transactions on
Information Systems (TOIS), 22(1), pp. 5-53.
Langseth, H., 2009. Bayesian Networks for Collaborative
Filtering. s.l., Tapir Akademisk Forlag, pp. 67-78.
Miyahara, K. & Pazzani, M. J., 2000. Collaborative
Filtering with the Simple Bayesian Classifier. In: R.
Mizoguchi & J. Slaney, eds. PRICAI 2000 Topics in
Artificial Intelligence. s.l.:Springer Berlin Heidelberg,
pp. 679-689.
Neapolitan, R. E., 2003. Learning Bayesian Networks.
Upper Saddle River(New Jersey): Prentice Hall.
Ricci, F., Rokach, L., Shapira, B. & Kantor, P. B., 2011.
Recommender Systems Handbook. s.l.:Springer New
York Dordrecht Heidelberg London.
Serafín, C. M., Carmelo, R. T., Pedro, M. L. & Francisco,
V. J. D., 2003. Elvira system. 0.11 ed. s.l.:National
University of Distance Education.
Su, X. & Khoshgoftaar, T. M., 2009. A Survey of
Collaborative Filtering Techniques. Advances in
Artificial Intelligence, Volume 2009.