tourism, there is a large share of users who come to buy a
product without searching (usually for a single popular
event) and never come back (or at least we cannot
identify their return via cookies). Moreover, purchasing
such a product does not require registration, so
we do not get any information about these
customers.
Our focus is on users who search for a
more expensive product, return several times, and open
the details of several offers (we can assume that they
behave similarly on competing web shops). These
users form a rather small fraction of portal visitors
(let us call them the target group), and only a
very small fraction of them purchases a product.
Moreover, in our domain a purchase is not an
everyday event; it usually occurs only once or twice a
year per customer (hence it is quite important for the
customer to make a good choice).
1.3 The Goal and Contributions
From the above, we can summarize:
- We do not have any information about the
content of purchased objects; we only have
information about user behavior
- Our target group in this research consists of users
who visit/display several objects
- Purchases are the only preference indicator
- We would like to improve recommendations for
our target group
The goal of this paper is to check whether mere data
on users’ (implicit) behavior are sufficient to draw any
business-relevant conclusions about user
preferences.
We show that our methods improve the
quality of recommendations based solely on user
behavior data.
The main contributions of the paper are:
- Models, methods and experimental tools for
learning user preferences from behavioral data
- Experiments on real production data with order-
sensitive metrics showing an improvement in
recommendation
- A report on the collection of time-dependent user
behavior data for future research
2 DATA, MODEL, METHODS
In this section we describe our application domain
(which influences the formal model) and the problem
formulation.
To avoid disclosing business-relevant data of our
data source, all results in this paper are reported only
as relative values of the measured phenomena
(normalized to the maximal value). Offline experiments
were performed on the unnormalized real production
data.
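Relativizing to the maximal value amounts to dividing every measurement by the largest one, so the maximum becomes 1.0 and absolute counts stay hidden. The sketch below is a minimal illustration under that reading; the function name `relativize` and the input numbers are hypothetical, and the measurements are assumed to be non-negative.

```python
def relativize(values):
    """Scale non-negative measurements to portions of their maximum.

    The largest value maps to 1.0; all others become fractions of it,
    so absolute business figures are not disclosed.
    """
    peak = max(values)
    if peak == 0:
        # Avoid division by zero: an all-zero series stays all zeros.
        return [0.0 for _ in values]
    return [v / peak for v in values]


# Hypothetical absolute counts, for illustration only.
print(relativize([40, 10, 20]))  # [1.0, 0.25, 0.5]
```

Because only ratios survive, relative comparisons between methods are preserved while the absolute scale of the portal's traffic is not revealed.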
2.1 Implicit Factors Describing User Behavior
In our situation, as described above, we have users
identified by cookies. We have two options: to require
either explicit or implicit feedback.
Explicit feedback forces users into additional activities
beyond their normal search behavior (Kelly and
Teevan, 2003). New browser technologies make it
possible to follow natural user interaction with the
system and to collect implicit feedback. Data
collected on the client side can be (asynchronously)
stored on the server side. Kelly and Teevan (2003)
argue that, because large quantities of implicit data can be
gathered at no extra cost to the user, they are an
attractive alternative.
Table 1: Example entries of the dataset; column names
are abbreviated as follows: userID = uID, objectID = OID,
purchase = Pur, pageView = Page, scroll = scr,
timeOnPage = timeOP, mouseMoves = moMo,
openFromList = opFL.
uID   OID   Pur  Page  scr  timeOP  moMo  opFL
Id1   56    1    2     0    77      100   0
Id2   164   1    3     28   414     900   0
Id3   74    0    1     3    2       0     0
Id4   1990  0    1     0    160     20    1
In our system, we follow only users from our
target group. We collect data in the following structure
($F_i$ are called implicit factors):
userID, objectID, purchase, $F_1$ = pageView, $F_2$ = scroll,
$F_3$ = timeOnPage, $F_4$ = mouseMoves, $F_5$ = openFromList
Data are collected incrementally; that is, after a
certain period (depending on the attribute) the corresponding
database entry is increased accordingly. We collect data per
user and object (see the example in Table 1).
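The incremental, per-(user, object) collection can be sketched as a table of counters keyed by (userID, objectID), where each periodic client report increases one factor. This is a minimal in-memory illustration, not the authors' implementation: `BehaviorRecord`, `store` and `increment` are hypothetical names, a dictionary stands in for the database, and the unit of timeOnPage is not specified in the text.

```python
from collections import defaultdict
from dataclasses import dataclass, fields


@dataclass
class BehaviorRecord:
    """Implicit factors accumulated for one (userID, objectID) pair."""
    purchase: int = 0
    pageView: int = 0      # F1
    scroll: int = 0        # F2
    timeOnPage: int = 0    # F3 (unit not specified in the text)
    mouseMoves: int = 0    # F4
    openFromList: int = 0  # F5


# Names of the counters that may be incremented.
FACTORS = {f.name for f in fields(BehaviorRecord)}

# Hypothetical stand-in for the database table keyed by (userID, objectID);
# a missing key yields a fresh all-zero record.
store = defaultdict(BehaviorRecord)


def increment(user_id, object_id, factor, amount=1):
    """Add `amount` to one counter of the (user, object) entry."""
    if factor not in FACTORS:
        raise ValueError(f"unknown factor: {factor}")
    record = store[(user_id, object_id)]
    setattr(record, factor, getattr(record, factor) + amount)


# Replaying updates that would produce the Id3 row of Table 1.
increment("Id3", 74, "pageView")
increment("Id3", 74, "scroll", 3)
increment("Id3", 74, "timeOnPage", 2)
print(store[("Id3", 74)])
# BehaviorRecord(purchase=0, pageView=1, scroll=3, timeOnPage=2, mouseMoves=0, openFromList=0)
```

Keeping only running totals per pair matches the structure of Table 1, where each row is one user-object combination rather than a log of individual events.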
The dependence between the number of page views and
purchases is illustrated in Figure 1.
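One simple way to quantify such a dependence is to group (user, object) pairs by their page-view count and compute, for each count, the fraction that ended in a purchase. The sketch below does this for the four rows of Table 1. It is only an illustration: the exact quantity plotted in Figure 1 is an assumption here, the function name is hypothetical, and four rows are far too few to support any conclusion.

```python
from collections import defaultdict

# The four example rows of Table 1 as (pageView, purchase) pairs.
rows = [(2, 1), (3, 1), (1, 0), (1, 0)]


def purchase_rate_by_pageviews(pairs):
    """Return {pageView count: fraction of pairs that ended in a purchase}."""
    totals = defaultdict(int)
    bought = defaultdict(int)
    for views, purchased in pairs:
        totals[views] += 1
        bought[views] += purchased
    return {v: bought[v] / totals[v] for v in sorted(totals)}


print(purchase_rate_by_pageviews(rows))  # {1: 0.0, 2: 1.0, 3: 1.0}
```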
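In general, a point in the data cube (representing user
behavior) has the form
$$(b^{1}_{ui}, \dots, b^{5}_{ui}) \in D_F \tag{1}$$
where $b^{j}_{ui}$ is the value of implicit factor $F_j$ for
user $u$ and object $i$.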
Because these are explanatory variables, we try
to show that purchase is a dependent variable.
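A first, very rough check of whether purchase depends on the implicit factors is to correlate each factor $F_j$ with the purchase indicator. The sketch below computes the Pearson correlation coefficient on the rows of Table 1. Correlation is a swapped-in illustration, not necessarily the method used in this paper; it does not establish the direction of dependence, and four rows cannot support a conclusion.

```python
import math

# Table 1 as columns: purchase is the dependent variable, F1..F5 the explanatory ones.
purchase = [1, 1, 0, 0]
factors = {
    "pageView":     [2, 3, 1, 1],
    "scroll":       [0, 28, 3, 0],
    "timeOnPage":   [77, 414, 2, 160],
    "mouseMoves":   [100, 900, 0, 20],
    "openFromList": [0, 0, 0, 1],
}


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan  # correlation is undefined without variation
    return cov / (sx * sy)


for name, column in factors.items():
    print(f"{name:>12}: r = {pearson(column, purchase):+.3f}")
```

On these four rows, pageView yields r of about +0.90, which merely illustrates the kind of relation Figure 1 is meant to show.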