Markov Models.
The Dual-strategy User Interest Prediction
Method (DUIPM) is then the method that integrates
the k-th Markov Model and inter-transaction
association rules for low values of k.
To address this trade-off between accuracy and
computational complexity, we use the frequency
pruned Markov Model to pre-process the data.
Referring to (Khalil et al., 2008), the accuracy of
the 1st to 4th FPMM is compared on four different
databases: D1, D2, D3 and D4. Results are shown in
Figure 1 and Table 1. As seen from Figure 1, the 2nd
FPMM is more accurate than the 1st.
Figure 1: The contrast of the 1st, 2nd, 3rd and 4th FPMM
based on accuracy (in percentages).
Table 1 shows that the 2nd FPMM covers the
data much better than the 1st FPMM and is closer to
the 3rd and 4th. Therefore, we choose the 2nd FPMM
for our dual strategy.
Table 1: The data coverage of the 1st - 4th FPMM.
      1-FP    2-FP    3-FP    4-FP
D1     745    9162   14977   17034
D2     502    6032   18121   22954
D3     623    5290   11218   13697
D4     807    7961   19032   23541
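To make the choice of a second-order model more concrete, the following is a minimal sketch of a second-order Markov predictor with frequency pruning of rare states. It assumes that "the 2nd FPMM" denotes a second-order Markov model whose infrequent states are dropped. The function names, the toy sessions and the threshold are illustrative assumptions and are not taken from the paper or from (Khalil et al., 2008).

```python
from collections import Counter, defaultdict


def build_second_order_model(sessions):
    """Count how often each page follows each pair of consecutive pages."""
    model = defaultdict(Counter)
    for session in sessions:
        for first, second, nxt in zip(session, session[1:], session[2:]):
            model[(first, second)][nxt] += 1
    return model


def prune_rare_states(model, min_count):
    """Keep only states (page pairs) observed at least min_count times."""
    return {
        state: successors
        for state, successors in model.items()
        if sum(successors.values()) >= min_count
    }


def predict_next(model, recent_pages):
    """Return the most frequent successor of the last two pages, or None."""
    state = tuple(recent_pages[-2:])
    successors = model.get(state)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Toy sessions, invented for illustration only.
toy_sessions = [
    ["A", "C", "D", "H", "M"],
    ["F", "D", "H", "N"],
    ["E", "D", "H", "M"],
]

model = build_second_order_model(toy_sessions)
pruned_model = prune_rare_states(model, min_count=2)  # hypothetical threshold

print(predict_next(model, ["C", "D", "H"]))
print(sorted(pruned_model))
```

In this sketch a state that was pruned, or never observed, simply yields no prediction. This is the coverage side of the trade-off that Table 1 quantifies.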
3.1 Data Pre-processing
Data from the web log cannot be used directly; part
of the data is redundant, and part is not relevant for
the computations to follow. Thus, pre-processing of
the data is a necessary step in order to increase the
efficiency of the algorithms. Some examples of data
that need to be eliminated include redundant data,
error logs, and graphical, video and audio files.
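The elimination step described above can be sketched as a simple filter over raw log lines. This is only an illustration under stated assumptions: the log is assumed to be in Common Log Format, "error logs" are taken to mean requests with an HTTP status of 400 or higher, "redundant data" is taken to mean exact repeats of the same host, timestamp and URL, and the list of media extensions is hypothetical. None of these details are specified in the text.

```python
import re

# Assumed Common Log Format: host ident user [time] "method url proto" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical list of graphical, video and audio file extensions.
MEDIA_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".mp3", ".wav", ".avi", ".mpg")


def clean_log(lines):
    """Return (host, url) pairs for page requests that survive all filters."""
    kept = []
    seen = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # line does not follow the assumed format
        host, url = match["host"], match["url"]
        if int(match["status"]) >= 400:
            continue  # error entry
        path = url.split("?", 1)[0].lower()
        if path.endswith(MEDIA_EXTENSIONS):
            continue  # graphical, video or audio file
        key = (host, match["time"], url)
        if key in seen:
            continue  # redundant (duplicate) entry
        seen.add(key)
        kept.append((host, url))
    return kept


# Invented sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [01/Jan/2011:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [01/Jan/2011:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [01/Jan/2011:10:00:01 +0000] "GET /logo.gif HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2011:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 0',
    '10.0.0.2 - - [01/Jan/2011:10:00:09 +0000] "GET /news.html HTTP/1.1" 200 2048',
]

cleaned = clean_log(sample)
print(cleaned)
```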
We now use an example involving four users and
their surfing sessions in order to show how we
construct our Dual-Strategy Database. We show
only the elimination of data by FPMM (redundant
data and low frequency data). Table 2 shows the
original database, which includes the surfing
sessions from four users. The items in each session
are all web pages that a specific user has visited.
Table 3 shows explicitly which pages were visited
by the users, in which order, and how much time was
spent on each session. Table 4 provides the frequency
of every web page. By the definition of FPMM, the
items whose frequency is less than some minimum
frequency value are pruned.
Table 2: The original database.
A,G,T,A,C,S,G,J,R,A,D,H,M,D,J
F,D,H,N,I,J,E,A,C,D,H,M,I,J,G,M
A,F,I,J,E,C,D,H,N,I,J,G,D,H,N,C,I,J,G,A,N
F,L,S,D,H,N,J,Q,E,I,P,C,I,O,A,D,H,M
A,C,G,A,D,H,M,C,F,C,G,R,I,P,H,O,J
A,I,J,B,A,E,C,T,D,H,M,I,Q,G
A,F,I,B,A,E,D,H,N,P,I,Q,F,J,D,H,N,G,C
F,D,H,M,I,J,E,H,F,I,J,E,D,H,M,A,G,N
F,D,H,N,J,A,D,A,E,D,J,R,H,N,G,C,F,G
A,C,D,E,G,C,A,F,N,H,M
Table 3: Surfing sessions for four users.
User  Session                                      Time
      A,F,I,J,E,C,D,H,N,I,J,G,D,H,N,C,I,J,G,A,N    150s
      F,D,H,N,I,J,E,A,C,D,H,M,I,J,G,M              300s
      F,D,H,M,I,J,E,H,F,I,J,E,D,H,M,A,G,N          120s
      A,C,D,E,G,C,A,F,N,H,M                        260s
      A,C,G,A,D,H,M,C,F,C,G,R,I,P,H,O,J            20s
      A,G,T,A,C,S,G,J,R,A,D,H,M,D,J                10s
      A,F,I,A,E,D,H,N,I,F,J,D,H,N,G,C              40s
      A,I,J,A,E,C,D,H,M,I,G                        50s
      F,D,H,N,J,A,D,A,E,D,J,H,N,G,C,F,G            30s
      F,D,H,N,J,E,I,C,I,A,D,M                      10s
Table 4: The frequency of each page.
Page    A   B   C   D   E   F   G   H   I
Freq.  18   2  13  18   9  11  13  18  14
Page    J   L   M   N   O   P   Q   R   S   T
Freq.  15   1   9  11   2   3   3   3   2   2
Assuming that the minimum frequency value is
set to 4, the web pages whose frequency is below 4,
namely B, L, O, P, Q, R, S and T, are eliminated
from the database.
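The pruning step of this example can be reproduced with a short sketch. The sessions are copied from Table 2 and the threshold of 4 is the one used in the text. The function names are illustrative assumptions, not part of the paper.

```python
from collections import Counter

# Sessions copied from Table 2.
RAW_SESSIONS = [
    "A,G,T,A,C,S,G,J,R,A,D,H,M,D,J",
    "F,D,H,N,I,J,E,A,C,D,H,M,I,J,G,M",
    "A,F,I,J,E,C,D,H,N,I,J,G,D,H,N,C,I,J,G,A,N",
    "F,L,S,D,H,N,J,Q,E,I,P,C,I,O,A,D,H,M",
    "A,C,G,A,D,H,M,C,F,C,G,R,I,P,H,O,J",
    "A,I,J,B,A,E,C,T,D,H,M,I,Q,G",
    "A,F,I,B,A,E,D,H,N,P,I,Q,F,J,D,H,N,G,C",
    "F,D,H,M,I,J,E,H,F,I,J,E,D,H,M,A,G,N",
    "F,D,H,N,J,A,D,A,E,D,J,R,H,N,G,C,F,G",
    "A,C,D,E,G,C,A,F,N,H,M",
]


def page_frequencies(sessions):
    """Count how many times each page occurs over all sessions."""
    counts = Counter()
    for session in sessions:
        counts.update(session)
    return counts


def prune_sessions(sessions, min_frequency):
    """Remove every page whose total frequency is below min_frequency."""
    counts = page_frequencies(sessions)
    kept_pages = {page for page, n in counts.items() if n >= min_frequency}
    pruned = [[page for page in session if page in kept_pages]
              for session in sessions]
    removed_pages = set(counts) - kept_pages
    return pruned, removed_pages


sessions = [row.split(",") for row in RAW_SESSIONS]
frequencies = page_frequencies(sessions)
pruned_sessions, removed_pages = prune_sessions(sessions, min_frequency=4)

print(sorted(removed_pages))
print(",".join(pruned_sessions[5]))
```

On this data the counts agree with Table 4, and the removed pages are exactly B, L, O, P, Q, R, S and T.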
When a user visits a web site for the first
time, if the parameters from the web log satisfy
( , , ) < , we use database
strategy 1 to create the database for predicting the
user's interest. However, if the parameters from the
web log satisfy ( , , ) > , we use
database strategy 2 to create the database. Thus, this
process of building the database is named Dual-
KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval