CAPTURING USER’S PREFERENCES USING A GENETIC

ALGORITHM

Determining Essential and Dispensable Item Attributes

S. Valero, E. Argente and V. Botti

DSIC, Universidad Polit´ecnica de Valencia, Camino de Vera s/n, Valencia, Spain

Keywords:

Soft computing, Genetic algorithms, Recommender systems, Proﬁle acquisition, Capturing preferences.

Abstract:

Determining the most desired product attributes would be crucial for companies that want to offer their clients

those products which best ﬁt their preferences. In this work, a genetic approach is employed for establishing

the appropriate attribute weights of movies, determining which movie attributes are essential or dispensable

for users in their selection process. The obtained weights are employed to predict user’s ratings for a test set

of movies, proving that the obtained parameters really describe their preferences.

1 INTRODUCTION

Setting the most desired product attributes would be

very interesting for any companies which want to of-

fer its clients such items that best ﬁt these desired fea-

tures. In this way, it would be interesting to know

which item attributes are essential or superﬂuous for

users in their selection process when buying prod-

ucts or paying for services. This problem could be

seen as a combinatorial problem, in which attribute

based proﬁles of items are used as start point to set the

appropriate attribute weights that make these prod-

ucts attractive for users. Thus, a soft computing ap-

proach based on genetic algorithms could be used

to set the optimum combination of the item attribute

weights that best ﬁt the user’s preferences. Some Soft

Computing approaches have previously been applied

in user’s preference learning, such as (Guan et al.,

2002; Shibata et al., 2002). These works capture

users’ preferences but do not consider some start-

ing problems caused by missing information of new

users. In this way, a genetic algorithm (GA) capa-

ble of establishing the appropriate weights for item

attributes that deﬁne the users’ preferences, but need-

ing few starting data is presented in this work. Con-

cretely, this GA determines the suitable combination

of attributes that makes an speciﬁc item attractive for

users. These combinations are later used to predict

users’ rating movies, making recommendations about

unknown movies. Moreover, our approach needs few

users’ starting ratings (only 3 or 5) to obtain good rec-

ommendations (right rates closer to 75%). For this

purpose, the Movilens database will be used, which

contains ratings of movies made by MovieLens web

site

users in 2000 (Herlocker et al., 1999).

2 GENETIC APPROACH

The proposed approach for capturing users’ favorite

movie attributes is based on a real-coded GA (Gold-

berg, 1991). The developed codiﬁcation problem al-

lows studying diverse kinds of problems and estab-

lishing rules at different levels that the obtained chro-

mosomes must satisfy (Valero et al., 2009). In this

case, each possible solution of the problem (chromo-

some) contains a descriptive movie attribute weight

(release data) and genre attributeweights (19 possibil-

ities: action, adventure, animation, children, comedy,

crime, etc.).

Figure 1 shows the steps followed by the devel-

oped GA. First, the GA obtains a starting generation,

selecting the most diverse one from several random

generations. Then, the GA proposes new solutions

employing crossover and mutation operators. The

mutation operator modiﬁes some genes in a haphaz-

ard way, jumping randomly anywhere within the al-

lowed gene domain. Regarding the crossover oper-

ator, it has been adapted from the ones proposed in

(Ortiz et al., 2001), which is based on conﬁdence in-

tervals. The new operator takes into account the rules

deﬁned in the developed codiﬁcation. This GA ap-

http://movielens.umn.edu

599

Valero S., Argente E. and Botti V. (2010).

CAPTURING USER’S PREFERENCES USING A GENETIC ALGORITHM - Determining Essential and Dispensable Item Attributes.

In Proceedings of the 2nd International Conference on Agents and Artiﬁcial Intelligence - Artiﬁcial Intelligence, pages 599-602

DOI: 10.5220/0002727305990602

 SciTePress

proach has been successfully employed in combina-

torial catalysis for high-throughput experimental de-

sign (Valero et al., 2004b; Valero et al., 2004a; Valero

et al., 2009).

Converge

criterion

Random

Gene ra tion 0

Problem Codification

Setting up

GA parameters

Starting Generation

Setting Up

GA Operators

Mutation Cros s over

Gene ra tion i

Fina l

Genera tion

Yes

i=0

i=i+1

Figure 1: Steps of the developed genetic approach.

As mentioned above, the GA has been used to es-

tablish the appropriate attribute weights for movies

that deﬁne the preferences of a set of one or more

users. These obtained weights were employed to pre-

dict users’ ratings for a test set of movies, in order

to prove that the obtained parameters really described

their preferences. Absolute Mean Error (MAE) and

Mean Square Error (MSE) were used as metrics ((Sar-

war et al., 1998; Shardnand and Maes, 1995; Her-

locker et al., 1999)). The objectivefunction optimized

by the proposed GA was:

MAE =

∑

i=1

− R

(1)

where M indicates the number of users’ movie rat-

ings; P

represents the estimated user rating for the

movie i; R

is the user’s rating for the movie i. Func-

tion 2 describes the way in which P

is obtained:

∑

j= 1

(2)

where w

is the adjusted weight for the attribute j, q

is the value of the attribute j for the movie to be rated

and

∑

j= 1

= 1. Thus, the ﬁtness of each possible

solution or chromosome was

MAE

. Finally, the devel-

oped GA proposed those attribute weights that best ﬁt

users’ interests.

3 RESULTS

A user contained in the MovieLens database was se-

lected taking into account the amount of available in-

formation about him. This user was employed to set

up the GA parameters. Then, the GA was used for

capturing the preferences of a cluster of users. These

weights were later used to estimate both cluster rat-

ings and ratings made on some other movies by simi-

lar users.

3.1 GA Setting Up Process

One user (identiﬁer 276) with a lot of ratings (518)

was selected, hence well informed data was used for

the GA setting up process. The available information

of user 276 was divided into training (90%) and test-

ing (10%) data sets. Training data was used by the

GA for ﬁtness evaluation (function 1), whereas the

testing data set was employed to predict the user’s

ratings. The testing data set contains information

about recently rated movies. First of all, param-

eters concerning population size and crossover op-

erator were set: α value, parents ratio and popu-

lation size. A battery of tests was performed us-

ing all possible combinations of the values: pop-

ulation size= {100, 200, 300, 400, 500}; α value =

{0.6, 0.7, 0.8, 0.9}; parents ratio= {10%, 20%, 30%}.

Mutation probability was set to 0; thus, the muta-

tion operator did not act in these tests. The pro-

posed GA reached 25 optimization cycles for each

parameter combination (convergence criterion). At

the end of each test, the best chromosome (attribute

weights with best ﬁtness) was selected in order to pre-

dict the user’s ratings on movies in the testing data set.

Also, the number of right movie recommendations

was computed. A right recommedation was computed

when the predicted rating for a movie indicated that

the user would really like that movie and the stored

rating at the database was also favorable (rating ≥ 0.5

in both cases). Similarly, a right recommendation was

also computed when the predicted rating for a movie

shown a user dislike for the movie and its stored rat-

ing was also desfavorable (rating< 0.5 in both cases).

Besides right recommendations, MAE and MSE were

also considered to select the most suitable values for

the studied parameters.

Table 1: Best results obtained during the optimization of the

crossover and population parameters.

Parameters MAE MSE Right Recom.

500-0.7-10 0.2885 0.1250 69.23%

500-0.6-10 0.3029 0.1358 63.46%

100-0.6-10 0.3413 0.1550 50.00%

400-0.7-30 0.3413 0.1550 50.00%

400-0.9-30 0.3413 0.1550 50.00%

Best results obtained during the reported study are

shown in Table 1. The selected combination of pa-

rameters was: population size=500; α value=0.7 and

parents ratio=10. The GA (using them) proposed a set

of attribute weights which reached a 69.23% rate of

right recommendations and the least MAE and MSE

at the test phase.

Secondly, mutation parameter values (mutation

probability and number of genes to be mutated) were

ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence

600

Table 2: Best results obtained during the optimization of the

mutation parameters.

Parameters MAE MSE Right Recom.

15% - 2 0,2644 0,0925 78,85%

15% - 3 0,2645 0,0974 76, 92%

15% - 1 0,2836 0,1166 69, 23%

5% - 3 0,2836 0,1166 69, 23%

10% - 3 0,2788 0,1179 69, 23%

set following the above-mentioned process. The

values studied for these parameters were: mutation

probability= {5%, 10%, 15%}; number of genes to be

mutated= {1, 2, 3}. Crossover and population param-

eter values were those previously selected ones. Ob-

tained results are shown in Table 2.

The GA approach was ﬁnally parametrized as fol-

lows: population size = 500; α value = 0.7; parents

ratio = 10; mutation probability=15% and 2 genes

to be mutated. Using these parameters, the GA pro-

posed a set of attribute weights which performed a

rate of 78, 85% of right recommendations. These at-

tribute weigths indicated that the essential movie at-

tributes for user 276 were the release date (19.32%)

and action (23.51%), comedy (19.93%) and drama

(20.73%) movie genres. In the same way, the dispens-

able movie attributes (0%) were animation, musical,

mystery, romance, science ﬁction, war western and

unknoun movie genres.

3.2 Employing Clusters Preferences

In this section the proposed GA is used to capture the

preferences of system users which are similar to a ref-

erence user, employing few ratings of this reference

user (i.e. 3 movie ratings). This approach can be used

by recommender systems to offer acceptable sugges-

tions when few starting data for a user is available.

Clusters of similar users were obtained using

Pearson correlation as a proximity measure and

center-based as a neighborhood scheme (Sarwar et al.,

2000). All the data about clusters movie rating was di-

vided into training and testing data sets. Then, the GA

was employed for getting the appropriate movie at-

tribute weights which deﬁne the preferences of these

clusters (using training data for ﬁtness evaluation).

These optimized weights were later used to estimate

reference users’ ratings on the 20 movies recently

rated by them. In this study, several experiments were

carried out using two different reference users: user

276 (518 ratings), previously employed, and user 710

(86 ratings), selected in a haphazard way.

For each reference user, his ﬁrst rated movies

where considered to obtain his cluster of similar users.

Each cluster is composed of users that have seen the

same ﬁlms and have evaluated them in a similar way

with regard to the reference user. More speciﬁcally,

clusters were formed considering three (3M) or ﬁve

(5M) reference users’ movie ratings and neighbor-

hoods of 10, 15 or 20 users (k). Six clusters were

computed for each user, but since for user 710 all clus-

ters obtained with 3M were equal to those obtained

with 5M, then less tests were performed for this user.

The optimized movie attribute weights proposed

by the GA for each computed cluster were later used

to estimate the cluster ratings on the movies in the

testing data set. The results obtained in these estima-

tions are shown in Table 3.

These appropriate movie attribute weights that de-

ﬁne the cluster preferences were also used to predict

the ratings on the 20 movies recently ranked by simi-

lar users. This study simulated the way in which real

users act in Recommender Systems. In such environ-

ments, users enter few starting information (3M or

5M in this case) but expect good recommendations.

Table 4 describes the achieved results. It should be

pointed out the excellent results which were obtained

for similar users different from the reference ones

(even 100%). These users were computed as similar

to the reference ones during the different cluster for-

mations (they also rate the same three or ﬁve movies),

but they were not included in those clusters.

Table 3: Best right movie recommendations rate (in %)

computed using cluster preferences achieved by the GA at

the testing phase.

Cluster 10k 15k 20k

276-3M 59.40 59.90 55.40

276-5M 63.96 59.34 65.16

710-3M/5M 61.09 51.80 57.68

Table 4: Right movie recommendations rate (in %) com-

puted using cluster preferences to estimate similar users’

ratings on 20 recently rated movies.

Cluster User Right User Right

276-3M-10k 276 75.00 251 80.00

276-3M-15k 276 75.00 54 90.00

276-3M-20k 276 75.00 554 83.33

276-5M-10k 276 65.00 382 60.00

276-5M-15k 276 75.00 906 100.00

276-5M-20k 276 80.00 879 90.00

710-3M/5M-10k 710 70.00 62 80.00

710-3M/5M-15k 710 70.00 533 75

710-3M/5M-20k 710 70.00 492 70

Finally, the best results for reference users were

obtained with the cluster formed for user 276 with 5

starting ratings and 20 neighbors (276-5M-20k). In

this case, the obtained attribute weights indicated that

the essential movie attributes for users similar to 276

CAPTURING USER'S PREFERENCES USING A GENETIC ALGORITHM - Determining Essential and Dispensable

Item Attributes

601

were: release date (20%), action (18.46%), comedy

(18.52%) and drama (27.65%) movie genres. The dis-

pensable movie attributes (0%) were animation, chil-

drens, crime, documentary, fantasy, ﬁlm noir, musi-

cal, mystery, war and western movie genres.

4 CONCLUSIONS

This work shows the application of a genetic algo-

rithm for capturing user preferences in Recommender

Systems. More speciﬁcally, the genetic algorithm al-

lows setting an appropriate weight for each attribute

of a product or service, thus deﬁning the preferences

of one user or a set of users over these attributes.

These obtained attribute weights permit to determine

which product or service attributes are more or less

valuables for users in their selection process when

buying products or paying for services.

All this ranking information given by the GA

could be employed by companies which want to of-

fer their clients those products which best ﬁt their de-

sired attributes. Also, as it has been shown in this

paper, the obtained weights could be employed to pre-

dict user ratings for a test set of movies with a highly

good rate of right predictions. Moreover, the GA re-

sults could be used for offering good recommenda-

tions even when there is not enough informationof the

current user. In this way, the GA could be employed

for capturing the preferences of the users of the sys-

tem that are similar to the current user, and then for

applying the learned weights in order to provide suit-

able recommendations to the current user. This aspect

is especially useful when new users enter into a Rec-

ommender System, since the genetic approach per-

mits offering acceptable recommendations even when

few starting data for a user is available.

ACKNOWLEDGEMENTS

Thanks to GroupLens Research Group to allow the

use of MovieLens data for researching purposes.

This work is partially supported by CONSOLIDER-

INGENIO 2010 under grant CSD2007-00022 and

TIN2008-04446/TIN project, which is co-funded by

the Spanish government and FEDER funds.

REFERENCES

Goldberg, D. (1991). Real-coded genetic algorithms, virtual

alphabets, and blocking. Complex Systems, 5:139–

157.

Guan, S., Ngoo, C., and Zhu, F. (2002). Handy broker: an

intelligent product-brokering agent for m-commerce

applications with user preference tracking. Electron.

Commer. Res. Appl., 1(Issues 3-4):314–330.

Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl,

J. (1999). An algorithmic framework for performing

collaborative ﬁltering. In ACM SIGIR ’99 Proceed-

ings, pages 230–237, New York, NY, USA. ACM.

Ortiz, D., Hervas, C., and Mu˜noz, J. (2001). Genetic algo-

rithm with crossover based on conﬁdence interval as

an alternative to traditional nonlinear regression meth-

ods. In ESANN’2001 Proceedings, pages 193–198,

Bruges,Belgium.

Sarwar, B., Karypis, G., Konstan, J., and Riedl, J.

(2000). Analysis of recommendation algorithms for e-

commerce. In ACM EC ’00 Proceedings, pages 158–

167, New York, NY, USA. ACM Press.

Sarwar, B. M., Konstan, J. A., Borchers, A., Herlocker,

J., Miller, B., and Riedl, J. (1998). Using ﬁltering

agents to improve prediction quality in the grouplens

research collaborative ﬁltering system. In CSCW ’98

Proceedings, pages 345–354, New York, NY, USA.

ACM Press.

Shardnand, U. and Maes, P. (1995). Social information ﬁle-

tering: Algorithms for automating “word of mouth”.

In ACM CHI’95 Proceedings, pages 210–217.

Shibata, H., Hoshiai, T., Kubota, M., and Teramoto, M. (6-

7 Nov. 2002). Agent technology recommending per-

sonalized information and its evaluation. In 2nd In-

ternational Workshop on Autonomous Decentralized

System, 2002, pages 176–183.

Valero, S., Argente, E., Botti, V., Serra, J., and Corma, A.

(2004a). A soft computing technique applied to in-

dustrial catalysis. In ECAI2004 Proceedings, pages

765–769. IOS Press.

Valero, S., Argente, E., Botti, V., Serra, J., and Corma, A.

(2004b). Soft computing techniques applied to cat-

alytic reactions. LNAI, 3040:550–559.

Valero, S., Argente, E., Botti, V., Serra, J., Serna, P., Mo-

liner, M., and Corma, A. (2009). Doe framework for

catalyst development based on soft computing tech-

niques. Comput. Chem. Eng., 33:225–238.

ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence

602