Is Products Recommendation Good? An Experiment on User Satisfaction
Jaime Wojciechowski, Rafael Romualdo Wandresen, Rafaela Mantovani Fontana,
João Eugênio Marynowski and Alexander Robert Kutzke
Professional and Technological Education Department, Federal University of Paraná,
R. Dr. Alcides Vieira Arcoverde 1225, Curitiba, Brazil
Keywords:
Recommendation Systems, E-commerces, Collaborative Filtering, Content-based Filtering.
Abstract:
Recommendation systems may use different algorithms to present relevant information to users. In e-commerce contexts, these systems are essential to provide users with a customized experience. Several studies have evaluated different recommendation algorithms for accuracy, but only a few evaluate algorithms from the user satisfaction viewpoint. We present a study that aims to identify how different recommendation algorithms trigger different perceptions of satisfaction in users. Our research approach was an experiment using products and sales data from a real small retailer. Users expressed their perceived satisfaction with three different algorithms. The results show that the proposed algorithms did not trigger different perceptions of satisfaction in users, offering clues for improving small retailers' websites.
1 INTRODUCTION
Recommendation systems are widely used in on-line
environments. From news to products, filtering
information is essential to provide users with a cus-
tomized experience (Liang et al., 2006; Thongpapanl
and Ashraf, 2011). In e-commerce, product recom-
mendation is part of the process needed to increase
customer loyalty and sales performance (Srinivasan
et al., 2002; Thongpapanl and Ashraf, 2011).
Several product recommendation algorithms have been proposed and improved in recent years. Although they have been extensively evaluated for accuracy (Bobadilla et al., 2013), accurate recommendations do not guarantee user satisfaction (Herlocker et al., 2004). Evaluating user-related aspects of recommendation algorithms helps to understand whether the recommendations are useful and help users complete their tasks (Knijnenburg et al., 2012).
Few studies compare real user satisfaction with the recommendations of different algorithms (Knijnenburg et al., 2012). This study contributes to the field by answering the following research question: how do different recommendation algorithms trigger different perceptions of satisfaction in users? We compared users' perceptions of three different algorithms, and our results show that user satisfaction did not differ between them.
This paper is organized as follows: Section 2
presents a brief conceptual explanation for recom-
mendation algorithms; Section 3 presents related
work; and Section 4 explains how this study was con-
ducted. Sections 5 and 6 present our results and dis-
cuss findings. Finally, Section 7 concludes and sug-
gests future work.
2 RECOMMENDATION SYSTEMS
Recommendation systems are those responsible for
filtering information, mainly in online environments,
to help users find the information they need (Isinkaye
et al., 2015). One well-known example of a recommendation system is the Amazon e-commerce website, which suggests to the user products that were also bought by people with similar interests (Li and Karahanna, 2015).
These systems implement a three-step computational process to identify the items to be presented to users: 1) user information collection, to understand the user's interests; 2) learning, which filters and explores related items; and 3) the recommendation per se, which predicts the items the user might prefer.
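As a rough illustration only (not taken from the paper, and using hypothetical function and field names), this three-step process could be sketched as follows:

```python
# Minimal sketch of the three-step recommendation process described above.
# All names and the scoring rule are hypothetical; they only show the structure.

def collect_user_information(user_id, interaction_log):
    """Step 1: gather what is known about the user (views, purchases, ratings)."""
    return [event for event in interaction_log if event["user"] == user_id]

def learn_related_items(user_events, catalog):
    """Step 2: filter and explore items related to the user's history."""
    seen = {event["item"] for event in user_events}
    return [item for item in catalog if item["id"] not in seen]

def recommend(candidates, top_n=5):
    """Step 3: predict which candidate items the user might prefer."""
    ranked = sorted(candidates, key=lambda item: item.get("popularity", 0), reverse=True)
    return ranked[:top_n]
```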
Isinkaye et al. (2015) describe that recommendation systems can use different strategies to identify
the items to present to users: filtering techniques such as collaborative filtering, content-based filtering and hybrid approaches. Whichever way the information is selected, recommendation systems “try to balance accuracy, novelty, dispersity and stability in recommendations” (Bobadilla et al., 2013, p. 109).
Content-based recommendation systems “analyze item descriptions to identify items that are of particular interest to the user” (Pazzani and Billsus, 2007, p. 325). Systems that use this technique implement different types of algorithms, such as statistical analysis or machine learning techniques, to find similarities between items and generate useful recommendations (Isinkaye et al., 2015).
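A minimal sketch of such an item-similarity computation, assuming plain-text item descriptions and using TF-IDF with cosine similarity (our choice of technique; the paper does not specify one):

```python
# Content-based sketch: items with similar descriptions are considered similar.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "white office desk 120cm mdf",    # item 0
    "black office desk 140cm wood",   # item 1
    "ergonomic office chair black",   # item 2
]

tfidf = TfidfVectorizer().fit_transform(descriptions)
similarity = cosine_similarity(tfidf)  # item-by-item similarity matrix

def similar_items(item_index, top_n=2):
    """Indices of the most similar items, excluding the item itself."""
    scores = [(i, s) for i, s in enumerate(similarity[item_index]) if i != item_index]
    return [i for i, _ in sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]]

print(similar_items(0))  # the other desk ranks above the chair
```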
To understand user interests, recommendation systems build user profiles based on the choices the user made in the past (Bobadilla et al., 2013). This profile may include a model of the user's preferences, predicting the probability that the user becomes interested in a specific item, or a history of the user's interactions with the recommendation system, storing the items the user has viewed (Pazzani and Billsus, 2007).
Another strategy to identify the items to present
to users is collaborative filtering: “the process of fil-
tering or evaluating items using the opinions of other
people” (Schafer et al., 2007). Users give ratings
for a set of items (Bobadilla et al., 2013), and the system builds predictions or recommendations from a database of these user-item preferences (Isinkaye et al., 2015). The ratings may be gathered explicitly, when users state their opinions, or implicitly, when users' preferences are inferred from their actions (Schafer et al., 2007).
According to Schafer et al. (2007), collaborative and content-based filtering differ in that the former is based on the assumption that people with similar tastes rate items similarly, whereas the latter assumes that items with similar features are rated similarly.
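To make the contrast concrete, the following is a minimal user-based collaborative filtering sketch (an illustration of the general idea, not the implementation used in this study): the predicted rating for an item is a similarity-weighted average of the ratings other users gave to it.

```python
# User-based collaborative filtering sketch: people with similar tastes are
# assumed to rate items similarly (pure Python, illustrative data only).
from math import sqrt

ratings = {                      # user -> {item: rating}
    "ana":   {"desk": 5, "chair": 3, "bookcase": 4},
    "bruno": {"desk": 4, "chair": 2, "bookcase": 5},
    "carla": {"desk": 1, "chair": 5},
}

def similarity(u, v):
    """Cosine similarity between two users over their co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    num = sum(ratings[u][i] * ratings[v][i] for i in common)
    den = sqrt(sum(ratings[u][i] ** 2 for i in common)) * \
          sqrt(sum(ratings[v][i] ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    pairs = [(similarity(user, v), r[item])
             for v, r in ratings.items() if v != user and item in r]
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total if total else None

print(predict("carla", "bookcase"))  # rating predicted from ana's and bruno's tastes
```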
Hybrid approaches are possible and improve recommendation results by combining collaborative filtering and content-based techniques (Isinkaye et al., 2015; Li and Karahanna, 2015). Isinkaye et al. (2015) suggest combining them by implementing the algorithms separately and then merging their results; by embedding content-based filtering in a collaborative approach, or vice-versa; or by creating a unified recommendation system with both approaches.
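The first of these combinations, merging separately computed scores, can be as simple as a weighted sum; the weight below is an arbitrary value for illustration only:

```python
# Hybrid sketch: combine separately computed content-based and collaborative
# scores with a weighted sum (alpha is arbitrary here).
def hybrid_score(content_score, collaborative_score, alpha=0.5):
    """alpha close to 1.0 favors content-based, close to 0.0 favors collaborative."""
    return alpha * content_score + (1 - alpha) * collaborative_score

# An item scored 0.8 by content similarity and 0.4 by collaborative filtering:
print(hybrid_score(0.8, 0.4, alpha=0.7))  # 0.68
```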
3 RELATED WORK
Evaluating recommendation system results is an important step in providing useful recommendations to users. Identifying how well the recommended products correspond to users' needs makes it possible to gather data to improve the overall recommendation process (Li and Karahanna, 2015).
Quality and evaluation metrics have been used in recommendation systems research to assess the quality of techniques, methods, and algorithms for predictions and recommendations (Bobadilla et al., 2013). According to Isinkaye et al. (2015), metrics are classified into accuracy and coverage: accuracy measures the proportion of correct recommendations over all possible recommendations, while coverage measures the fraction of the search space for which the system is able to build recommendations.
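As a simplified illustration of these two metric families (a precision-style reading of accuracy; these are our own toy definitions, not the ones used in the cited works):

```python
# Toy sketch of the two metric families described above.
def accuracy(recommended, relevant):
    """Fraction of recommended items that were actually relevant to the user."""
    recommended, relevant = set(recommended), set(relevant)
    return len(recommended & relevant) / len(recommended) if recommended else 0.0

def coverage(recommendable_items, catalog):
    """Fraction of the item space for which recommendations can be built."""
    return len(set(recommendable_items)) / len(set(catalog)) if catalog else 0.0

print(accuracy(["desk", "chair", "lamp"], ["desk", "bookcase"]))            # ~0.33
print(coverage(["desk", "chair"], ["desk", "chair", "bookcase", "lamp"]))   # 0.5
```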
Herlocker et al. (2004) state that evaluating recommendation systems is difficult for a number of reasons. First, different algorithms might be better suited to different data types. Second, evaluation objectives might differ (e.g., the importance of accuracy is giving way to the relevance of customer decision support). Lastly, when comparing different algorithms, it is difficult to define measurements that provide an effective comparison.
Some researchers state that user satisfaction is the bottom-line success measure of recommendation systems (Herlocker et al., 2004), and algorithm accuracy is only part of the user experience (Knijnenburg et al., 2012). According to Xiao and Benbasat (2007), trust, usefulness and usability are the main elements that support customer decision-making.
Several studies relate user satisfaction with the use
of recommendation systems (Jiang et al., 2010), with
consumer participation (Dabholkar and Sheng, 2012),
and with personalized content (Liang et al., 2006),
among others. Yet, few studies investigate how different algorithms generate different user perceptions (Knijnenburg et al., 2012). This study contributes to this field by comparing user satisfaction among three different algorithms, as explained in the next section.
4 RESEARCH APPROACH
This study aims to identify how different recommendation algorithms trigger different perceptions of satisfaction in users. To accomplish that, we applied an experiment as the research method, in November and December 2016. Montgomery (2009, p. 1) defines an experiment as a test “in which purposeful changes are made to input variables of a process or system so that we may observe and identify the reasons for changes that may be observed in the output response”. Our “input variables” are the product
recommendation algorithms and the “output response” is user satisfaction.
We used three different product recommendation algorithms in the context of a furniture e-commerce website. The three algorithms shared the same private database, from a real retailer website, with 270,000 records of products and sales data:
Algorithm 1: implements a simple database query that finds products in the same category with similar size and price; there is no computational intelligence embedded (a sketch of such a query is given after this list).
Algorithm 2: implements the Slope One predictor for collaborative filtering, following the description by Lemire and Maclachlan (2005); a minimal sketch of this predictor is given below. We built a model from all the sales performed in 2016. Given an item as input, this algorithm recommends other items that were previously bought together with it, regardless of category or other attributes.
Algorithm 3: implements another collaborative filtering algorithm, this time using Apache Mahout (Apache, 2016). We implemented an API that takes as input a file relating users, products, and ratings.
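The query behind Algorithm 1 could look roughly like the sketch below; the table names, column names and thresholds are hypothetical, since the retailer's schema is not published.

```python
# Hypothetical sketch of Algorithm 1: same category, similar size and price.
import sqlite3

def similar_products(conn, product_id, price_tolerance=0.2, limit=5):
    query = """
        SELECT p.id, p.name, p.price
        FROM products p
        JOIN products ref ON ref.id = :pid
        WHERE p.category = ref.category
          AND p.id <> ref.id
          AND p.price BETWEEN ref.price * (1 - :tol) AND ref.price * (1 + :tol)
          AND ABS(p.width - ref.width) <= 20   -- "similar size", in centimeters
        ORDER BY ABS(p.price - ref.price)
        LIMIT :lim
    """
    return conn.execute(query, {"pid": product_id, "tol": price_tolerance,
                                "lim": limit}).fetchall()
```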
Algorithm 1 is restrictive and focuses on showing similar products. Algorithms 2 and 3, on the other hand, focus on showing different products, recommending products of other types and categories based on sales to other users. As we used an extensive real sales database, the collaborative filtering algorithms worked on a relevant history of products sold.
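For reference, here is a minimal Weighted Slope One sketch in the spirit of Lemire and Maclachlan (2005); it is an illustration only, not the code used in the experiment.

```python
# Weighted Slope One sketch: predictions are built from average rating
# deviations between pairs of items (illustrative data only).
from collections import defaultdict

def train(ratings):
    """ratings: {user: {item: rating}} -> (average deviations, pair counts)."""
    dev = defaultdict(lambda: defaultdict(float))
    cnt = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    dev[i][j] += ri - rj
                    cnt[i][j] += 1
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= cnt[i][j]
    return dev, cnt

def predict(ratings, dev, cnt, user, item):
    """Weighted average of (user's rating on j + average deviation of item over j)."""
    terms = [(dev[item][j] + rj, cnt[item][j])
             for j, rj in ratings[user].items() if cnt[item].get(j)]
    total = sum(c for _, c in terms)
    return sum(v * c for v, c in terms) / total if total else None

ratings = {"u1": {"desk": 5, "chair": 3}, "u2": {"desk": 4, "chair": 2, "sofa": 4}}
dev, cnt = train(ratings)
print(predict(ratings, dev, cnt, "u1", "sofa"))  # 5.0
```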
Our data analysis focused on testing three null hypotheses:
H01: There is no significant difference in user sat-
isfaction by comparing the recommendations re-
ceived by Algorithm 1 with the ones received by
Algorithm 2;
H02: There is no significant difference in user sat-
isfaction by comparing the recommendations re-
ceived by Algorithm 1 with the ones received by
Algorithm 3;
H03: There is no significant difference in user sat-
isfaction by comparing the recommendations re-
ceived by Algorithm 2 with the ones received by
Algorithm 3.
According to the classification given by Herlocker et al. (2004), our evaluation of the recommendation systems is explicit, a laboratory study, based on the outcome, and a short-term analysis, as explained in the next subsections.
4.1 An Explicit Evaluation
We asked users to explicitly express their perception of the system. We provided them with a questionnaire about their satisfaction with the recommendations received. The questions, based on Liang et al. (2006), asked users:
whether the system finds the furniture the user wants to view;
whether the system filters out the furniture the user does not want;
whether the system captures the right category (the one that is of interest to the user);
whether the system captures the user's interests;
whether the system finds interesting furniture efficiently;
their overall satisfaction.
We also asked users, in an open-ended question, to give us their free impressions of the recommendations received.
4.2 A Laboratory Study
We performed a controlled experiment, asking undergraduate students to search for products and to evaluate the recommendations provided. Students were chosen because they represent a typical Internet shopping public (Dabholkar and Sheng, 2012).
The students were randomly divided into three groups, each of which accessed a website implementing one of the recommendation algorithms. They were given a scenario in which they were starting their software developer careers and had rented a workroom. They were to assume they had enough money to buy one desk, one chair, and one bookcase. We asked them to search once for each of these products and to evaluate the products recommended (with no time restriction).
The procedure was as follows: participants entered a text-based search on the website. Once the search was performed, the site presented the matching products. The participant clicked on the chosen product and, on another page, viewed this product together with a list of recommended items. The list differed depending on the product and on the algorithm being used. Users evaluated this list of recommended products by answering questions about their satisfaction with the recommendation.
4.3 An Outcome Evaluation
As stated by Herlocker et al. (2004), we evaluated the effect of users' actions on the recommendation system. Based on the recommendations received, participants answered each question in the questionnaire on a 5-point Likert scale.
Based on the users' responses, we applied the ANOVA statistical test to verify whether there was a difference in the mean responses, comparing the evaluations of the three algorithms. When a difference was found, the Tukey test was applied to identify which algorithm differed, and the best one was inferred from the higher mean. For the open-ended question, we applied Thematic Network Analysis to code, group and analyze the answer texts (Attride-Stirling, 2001).
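A sketch of this quantitative pipeline, assuming the Likert answers were tabulated per algorithm (the numbers below are made up, and SciPy/statsmodels are our choice of tooling, not necessarily the one used in the study):

```python
# One-way ANOVA across the three algorithms, followed by Tukey's HSD
# when a significant difference is found (illustrative data only).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

alg1 = [4, 5, 3, 4, 4, 3]   # 5-point Likert answers for Algorithm 1 (made up)
alg2 = [3, 4, 3, 4, 3, 4]   # Algorithm 2
alg3 = [2, 3, 2, 3, 2, 3]   # Algorithm 3

f_stat, p_value = stats.f_oneway(alg1, alg2, alg3)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

if p_value < 0.05:          # post-hoc test only if the means differ overall
    scores = np.array(alg1 + alg2 + alg3)
    groups = ["alg1"] * len(alg1) + ["alg2"] * len(alg2) + ["alg3"] * len(alg3)
    print(pairwise_tukeyhsd(scores, groups))
```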
4.4 A Short-term Analysis
Our analysis is based on an evaluation performed im-
mediately after users received recommendations. We
did not include any post-sales analysis.
5 RESULTS
We applied the experiment described in Section 4 to 68 undergraduate students. Their ages ranged from 18 to 41; 86% were men and 14% were women. To understand their experience with e-commerce websites, we asked them how many times they had shopped on the Internet in the last six months: 39.7% had shopped zero to two times; 32.2%, three to six times; and 27.9%, more than six times. We thus consider that our respondents had enough experience with Internet shopping to provide their opinion.
Tables 1, 2 and 3 present descriptive statistics for the responses obtained for the products recommended by Algorithms 1, 2, and 3, respectively. Next, we analyze the answers to each question by comparing the algorithms using the ANOVA and Tukey statistical tests. We then present our qualitative results and the hypotheses analysis.
5.1 Analysis of Questions
5.1.1 The System Finds Furniture the User
Wants to View
We asked users whether the system showed furniture they wanted to view. The mean evaluation for Algorithms 1, 2, and 3 was 3.60, 3.38, and 2.70, respectively.
The ANOVA test indicated a statistically significant difference among the means of the three algorithms (F(2, 65) = 3.75, p < 0.05). This means that the algorithms triggered different reactions from users regarding finding furniture they wanted to view.
We then conducted the Tukey test and generated a box plot of the answers, which provide data to evaluate the statistically significant differences between means, pair by pair.
Algorithms 1 and 2 were not significantly different according to the Tukey test (p > 0.1), while Algorithms 1 and 3 were significantly different (p < 0.05). This is confirmed by the box plot of the answers to Question 1 (Figure 1).
Figure 1: Box Plot of the users’ answers regarding Question
1 and Algorithms 1, 2, and 3.
In this case, we can infer that Algorithm 2 is as good as Algorithm 1, but Algorithm 3 is not as good as Algorithm 1.
5.1.2 The System Filters Out Furniture the User
Does Not Want
Regarding whether the system filters out furniture that the user does not want, the means for Algorithms 1, 2, and 3 were 2.82, 2.71, and 2.83, respectively.
The ANOVA test showed that the three algorithms were statistically similar (F(2, 65) = 0.07, p > 0.1).
This result indicates that users did not perceive any difference among the three algorithms regarding their ability to filter out the products that users do not want.
5.1.3 The System Captures the Right Category
For the question that evaluated whether the system captures the category of interest to the user, the mean of the answers was 4.08 for Algorithm 1, 3.85 for Algorithm 2, and 2.79 for Algorithm 3.
Table 1: Descriptive statistics of user satisfaction considering Algorithm 1.
Question Median Mean Std. Dev.
The system finds the furniture the user wants to view 4 3.60 1.19
The system filters out the furniture the user does not want 3 2.82 1.15
The system captures the right category 4 4.08 1.04
The system captures the user's interests 4 3.52 0.99
The system finds interesting furniture efficiently 4 3.56 0.94
Overall satisfaction 3 3.10 1.15
Table 2: Descriptive statistics of user satisfaction considering Algorithm 2.
Question Median Mean Std. Dev.
The system finds the furniture the user wants to view 3 3.38 1.07
The system filters out the furniture the user does not want 3 2.71 1.18
The system captures the right category 4 3.85 1.06
The system captures the user's interests 3 3.23 1.17
The system finds interesting furniture efficiently 4 3.52 1.07
Overall satisfaction 3 3.28 1.14
Table 3: Descriptive statistics of user satisfaction considering Algorithm 3.
Question Median Mean Std. Dev.
The system finds the furniture the user wants to view 2 2.70 1.07
The system filters out the furniture the user does not want 2.50 2.83 1.18
The system captures the right category 3 2.79 1.06
The system captures the user's interests 3 2.70 1.17
The system finds interesting furniture efficiently 3 2.79 1.07
Overall satisfaction 2 2.50 1.14
The ANOVA test showed a statistically significant difference among the means of the answers for the three algorithms (F(2, 65) = 7.89, p < 0.001). We then conducted the Tukey test and generated the box plot of the answers (Figure 2).
Figure 2: Box Plot of the users’ answers regarding Question
3 and Algorithms 1, 2, and 3.
Algorithms 1 and 2 presented no significant difference according to the Tukey test (p > 0.1). However, Algorithms 2 and 3 were considered different (p < 0.05), and Algorithms 1 and 3 also presented statistically different means (p < 0.05). This is confirmed by the box plot in Figure 2, where Algorithms 1 and 2 are grouped as a, separate from Algorithm 3, which is in group b.
We hence conclude that Algorithm 1 performed as well as Algorithm 2 in capturing the right category, but Algorithm 3 is not as good as Algorithms 1 and 2.
5.1.4 The System Captures the User’s Interests
Analyzing the responses regarding the system's ability to capture users' interests, the mean of the responses was 3.52 for Algorithm 1, 3.23 for Algorithm 2, and 2.70 for Algorithm 3.
The ANOVA test showed no statistically significant difference among the algorithms (F(2, 65) = 3.01, p > 0.05). We thus consider that the three algorithms generate a similar perception among customers regarding capturing users' interests.
5.1.5 The System Finds Interesting Furniture
Efficiently
We also analyzed whether the system efficiently finds the products that interest users (Question 5). The mean for Algorithm 1 was 3.56, for Algorithm 2, 3.52, and for Algorithm 3, 2.79.
The ANOVA test showed a significant difference among the means of the answers (F(2, 65) = 3.65, p < 0.05). The Tukey test showed no significant difference between Algorithms 1 and 2 (p > 0.1) or between
Algorithms 2 and 3 (p > 0.05). However, there was a significant difference between Algorithms 1 and 3 (p < 0.05). The box plot in Figure 3 confirms this.
Figure 3: Box Plot of the users’ answers regarding Question
5 and Algorithms 1, 2, and 3.
We hence conclude that Algorithm 2 was as good as Algorithm 1 in finding interesting furniture efficiently, but Algorithm 3 is not as good as Algorithm 1.
5.1.6 Overall Satisfaction
Regarding the responses for overall satisfaction, Algorithm 1 presented a mean of 3.17, Algorithm 2 a mean of 3.28, and Algorithm 3 a mean of 2.50.
The ANOVA test showed no significant difference among the means of the answers (F(2, 65) = 2.90, p > 0.05). We thus consider that there is no difference among the algorithms regarding the users' overall satisfaction.
5.1.7 Open-ended Question Analysis
The quantitative analysis of the responses can be complemented and explained by the qualitative analysis of the open-ended question. For each algorithm, we created a thematic network showing the codes obtained directly from the answers. The numbers in the rounded-corner rectangles indicate how many times each theme appeared in the responses.
Figure 4 shows the thematic network for Algorithm 1. We observed that users enjoyed receiving recommendations of similar products (with the same colors, prices or from the same category). On the other hand, users complained that products were too similar or identical to the one they had searched for. Algorithm 1 also presented connection problems and, sometimes, did not recommend products. Users also complained that recommendations were good for some products and bad for others (inconsistency). The “no sense” theme represents users who pointed out that the recommended products had nothing to do with their needs.
Figure 4: Thematic network of answers to open-ended ques-
tion and Algorithm 1.
When analyzing the positive and negative feedback for Algorithm 2 (Figure 5), we observed that users also enjoyed receiving recommendations of similar products. Negative feedback was mainly related to products that made no sense to users. Interestingly, users complained about the product search as if it were the product recommendation. Users also pointed out recommendations that were inconsistent, as for Algorithm 1.
Figure 5: Thematic network of answers to open-ended ques-
tion and Algorithm 2.
Algorithm 3 received positive feedback on similar products and some positive feedback on the recommendation of complementary products (Figure 6). Most users' complaints concerned recommended products that made no sense. There were also issues with search, as for Algorithm 2, as well as themes related to inconsistency and too much information being shown. Users also complained about recommendations that were not related to the category of the product they had searched for.
5.2 Hypotheses Analysis
Aiming at comparing three different recommendation algorithms regarding user satisfaction, we proposed three null hypotheses (see Section 4).
Figure 6: Thematic network of answers to open-ended ques-
tion and Algorithm 3.
In this section, we combine quantitative and qualitative results to accept or reject these hypotheses. Table 4 summarizes whether the algorithms were found similar for each satisfaction aspect measured.
Our null hypothesis H01 stated that there is no significant difference in user satisfaction between the recommendations received from Algorithm 1 and those from Algorithm 2. We could not reject this hypothesis since, for all the questions, there was no significant difference between the answers regarding Algorithms 1 and 2. Our qualitative results show that the main positive feedback for both algorithms was receiving similar products as recommendations.
Our null hypothesis H02 stated that there is no significant difference in user satisfaction between the recommendations received from Algorithm 1 and those from Algorithm 3. The summarized results in Table 4 show that user satisfaction was indeed different in three aspects: finding products the user wants to view, capturing the right category, and finding interesting products efficiently. This evidence rejects hypothesis H02.
Qualitative data show that Algorithm 1 performed better by presenting similar products and products related to the category, whereas users complained about the diversity of Algorithm 3's recommendations.
For the null hypothesis H03, which stated that there is no significant difference in user satisfaction between the recommendations received from Algorithm 2 and those from Algorithm 3, the conclusion is the same as for H02: there are differences between Algorithms 2 and 3, so the third null hypothesis is also rejected.
The open-ended answers showed, however, that both Algorithms 2 and 3 had problems presenting products that made no sense to users, as well as issues with the product search and with excessive information.
6 DISCUSSION
The objective of this study was to identify how different recommendation algorithms trigger different perceptions of satisfaction in users. We performed an experiment with 68 undergraduate students, simulating a three-item shopping transaction and asking users to evaluate the recommendations received. Users were randomly assigned to one of three algorithms. Their responses were statistically described and analyzed using ANOVA and Tukey tests. All three algorithms used the same database, with products and sales data from a small retailer's website.
We observed that, regarding overall satisfaction, there was no difference in user satisfaction among the three algorithms. When we tested each question, we observed that Algorithm 1 performed better at finding the furniture the user wants to view, capturing the right category, and finding interesting furniture efficiently. The open-ended question clearly shows that users seem to be more satisfied when similar products are recommended; that is, our respondents did not value variety.
Knijnenburg et al. (2012, p. 450) state that “researchers who do compare the user experience effects of several algorithms find surprising results”, and this was our case. Although more elaborate algorithms were compared to a simple one (without computational intelligence), there was no effect on user satisfaction. Our results confirm that, beyond the algorithm implemented, there are other subjective aspects comprising user satisfaction (Xiao and Benbasat, 2007; Knijnenburg et al., 2012).
The literature points out that cold start is an issue to be dealt with on small retailers' websites; in such contexts, content-based recommendations are viable solutions (Kaminskas et al., 2015). Our study supports this finding by showing that our participants appreciated receiving recommendations of similar products.
We also observed that a considerable number of participants, despite the instructions to evaluate the product recommendations, also evaluated the product search. To us, this is evidence that customers may see search and recommendation as related functionalities; developers should invest in improving both to increase user satisfaction.
This study was performed in a context-specific
simulated environment, with undergraduate students,
which is a threat to validity. Although our participants
present a profile similar to that of Internet shoppers,
our results should be confirmed with real consumers.
Table 4: Summary of the similarity of user satisfaction perception among algorithms.
Question Algorithms considered similar
The system finds the furniture the user wants to view 1 and 2
The system filters out the furniture the user does not want 1, 2, and 3
The system captures the right category 1 and 2
The system captures the user's interests 1, 2, and 3
The system finds interesting furniture efficiently 1 and 2
Overall satisfaction 1, 2, and 3
Nevertheless, this study still contributes to the literature on recommender system evaluations that go beyond algorithmic accuracy, as advocated by Konstan and Riedl (2012).
7 CONCLUSIONS
We presented the results of an experiment that aimed to identify how different recommendation algorithms trigger different perceptions of satisfaction in users. We tested three algorithms using the database of a real small furniture retailer.
Our results pointed to no significant difference in overall user satisfaction among the compared algorithms. The algorithms were found to be generally similar, although some differences were observed in specific aspects; in those cases, the algorithm that showed similar products performed better.
Future studies should include a content-based algorithm in the experiment, to be compared with the collaborative filtering algorithms. We also plan to reduce the threat to validity by evaluating satisfaction with online users in their real context.
REFERENCES
Apache (2016). Apache Mahout: Scalable Machine Learn-
ing and Data Mining. http://mahout.apache.org/.
Attride-Stirling, J. (2001). Thematic networks: an analytic tool for qualitative research. Qualitative Research, 1(3):385–405.
Bobadilla, J., Ortega, F., Hernando, A., and Gutiérrez, A.
(2013). Recommender systems survey. Knowledge-
Based Systems, 46:109–132.
Dabholkar, P. A. and Sheng, X. (2012). Consumer partici-
pation in using online recommendation agents: effects
on satisfaction, trust, and purchase intentions. Service
Industries Journal, 32(9):1433–1449.
Herlocker, J., Konstan, J. A., Terveen, L. G., and Riedl, J. T.
(2004). Evaluating collaborative filtering recommender
systems. ACM Transactions on Information Systems,
22(1):5–53.
Isinkaye, F. O., Folajimi, Y. O., and Ojokoh, B. A. (2015). Recommendation systems: Principles, methods and evaluation. Egyptian Informatics Journal, 16(3):261–273.
Jiang, Y., Shang, J., and Liu, Y. (2010). Maximizing cus-
tomer satisfaction through an online recommendation
system: A novel associative classification model. De-
cision Support Systems, 48(3):470–479.
Kaminskas, M., Bridge, D., Foping, F., and Roche, D.
(2015). Product recommendation for small-scale re-
tailers. Lecture Notes in Business Information Pro-
cessing, 239:17–29.
Knijnenburg, B. P., Willemsen, M. C., Gantner, Z., Soncu, H., and Newell, C. (2012). Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):441–504.
Konstan, J. and Riedl, J. (2012). Recommender systems:
From algorithms to user experience. User Modeling
and User-Adapted Interaction, 22:101–123.
Lemire, D. and Maclachlan, A. (2005). Slope One Predictors for Online Rating-Based Collaborative Filtering. In Proceedings of the SIAM International Conference on Data Mining (SDM'05), pages 471–475.
Li, S. S. and Karahanna, E. (2015). Online Recommendation Systems in a B2C E-Commerce Context: A Review and Future Directions. Journal of the Association for Information Systems, 16(2):72–107.
Liang, T.-P., Lai, H.-J., and Ku, Y.-C. (2006). Personalized
Content Recommendation and User Satisfaction: The-
oretical Synthesis and Empirical Findings. Journal of
Management Information Systems, 23(3):45–70.
Montgomery, D. C. (2009). Design and Analysis of Exper-
iments. John Wiley and Sons Inc., New Jersey, 7th
edition.
Pazzani, M. J. and Billsus, D. (2007). Content-Based Rec-
ommendation Systems. The Adaptive Web, 4321:325–
341.
Schafer, J., Frankowski, D., Herlocker, J., and Sen, S.
(2007). Collaborative Filtering Recommender Sys-
tems. The Adaptive Web, pages 291–324.
Srinivasan, S. S., Anderson, R., and Ponnavolu, K. (2002).
Customer loyalty in e-commerce: An exploration of
its antecedents and consequences. Journal of Retail-
ing, 78(1):41–50.
Thongpapanl, N. and Ashraf, A. R. (2011). Enhancing on-
line performance through website content and person-
alization. Journal of Computer Information Systems,
Fall 2011:3–13.
Xiao, B. and Benbasat, I. (2007). E-Commerce Product Recommendation Agents: Use, Characteristics, and Impact. MIS Quarterly, 31(1):137–209.