
 
Markov Models. 
The Dual-strategy User Interest Prediction
Method (DUIPM) is then the method that integrates
the k-th Markov Model and inter-transaction
association rules for low values of k.
To address this trade-off between accuracy
and computational complexity, we use the
frequency pruned Markov Model (FPMM) to pre-process the
data. Following Khalil et al. (2008), we compare the
accuracy of the 1st to 4th FPMM on four different
databases: D1, D2, D3 and D4. Results are shown in
Figure 1 and Table 1. As seen from Figure 1, the 2nd
FPMM is more accurate than the 1st.
 
Figure 1: Comparison of the 1st, 2nd, 3rd and 4th FPMM
based on accuracy (in percentages).
Table 1 shows that the 2nd FPMM covers the
data much better than the 1st FPMM and is closer to
the 3rd and 4th. Therefore, we choose the 2nd FPMM
for our dual strategy.
Table 1: The data coverage of the 1st-4th FPMM.
      1-FP   2-FP   3-FP   4-FP
D1     745   9162  14977  17034
D2     502   6032  18121  22954
D3     623   5290  11218  13697
D4     807   7961  19032  23541
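To make the idea of a frequency pruned Markov Model of a given order more concrete, the following is a minimal sketch rather than the implementation used in the experiments. It assumes that a model of order k maps each context of k consecutive pages to counts of the page that follows, and that pruning removes every context observed fewer than a minimum number of times. The function names (build_markov, frequency_prune, predict) and the toy sessions are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_markov(sessions, order):
    """Map each context of `order` consecutive pages to a Counter of next pages."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - order):
            context = tuple(session[i:i + order])
            model[context][session[i + order]] += 1
    return model


def frequency_prune(model, min_count):
    """Keep only contexts seen at least `min_count` times (assumed pruning rule)."""
    return {
        context: nexts
        for context, nexts in model.items()
        if sum(nexts.values()) >= min_count
    }


def predict(model, recent_pages, order):
    """Return the most frequent next page for the last `order` pages, or None."""
    nexts = model.get(tuple(recent_pages[-order:]))
    if not nexts:
        return None
    return nexts.most_common(1)[0][0]


# Toy sessions (hypothetical, not taken from the paper's databases).
toy_sessions = [list("ADHM"), list("FDHN"), list("CDHM")]
second_order = frequency_prune(build_markov(toy_sessions, order=2), min_count=2)

print(sorted(second_order))                        # [('D', 'H')]
print(predict(second_order, ["A", "D", "H"], 2))   # M
print(predict(second_order, ["F", "D"], 2))        # None
```

In this sketch a higher order yields more specific contexts and therefore more states to store, while pruning discards the rarely seen ones; this is the accuracy versus complexity trade-off discussed above.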
3.1 Data Pre-processing 
Data from the web log cannot be used directly; part 
of the data is redundant, and part is not relevant for 
the computations to follow. Thus, pre-processing of 
the data is a necessary step in order to increase the 
efficiency of the algorithms. Some examples of data 
that need to be eliminated include redundant data, 
error logs, and graphical, video and audio files.  
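As an illustration of this cleaning step, the sketch below filters a list of web-log records. It is only a sketch under stated assumptions: the record fields (user, url, status, time), the list of media file extensions, the treatment of HTTP status codes of 400 and above as error entries, and the interpretation of "redundant" as an exact duplicate record are all hypothetical choices, not details given in the text.

```python
import posixpath
from urllib.parse import urlparse

# Assumed list of graphical, video and audio file extensions.
MEDIA_EXTENSIONS = {
    ".gif", ".jpg", ".jpeg", ".png", ".ico",
    ".mp3", ".wav", ".mp4", ".avi",
}


def is_relevant(entry):
    """Reject error entries and requests for media files."""
    if entry["status"] >= 400:  # assumed: 4xx/5xx responses are error logs
        return False
    path = urlparse(entry["url"]).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in MEDIA_EXTENSIONS


def clean_log(entries):
    """Return the relevant entries in order, dropping exact duplicates."""
    seen = set()
    cleaned = []
    for entry in entries:
        if not is_relevant(entry):
            continue
        key = (entry["user"], entry["url"], entry["time"])
        if key in seen:  # assumed meaning of "redundant data"
            continue
        seen.add(key)
        cleaned.append(entry)
    return cleaned


# Hypothetical log records used only to exercise the functions.
sample_log = [
    {"user": "u1", "url": "/index.html", "status": 200, "time": 1},
    {"user": "u1", "url": "/img/logo.GIF", "status": 200, "time": 2},
    {"user": "u1", "url": "/missing.html", "status": 404, "time": 3},
    {"user": "u1", "url": "/index.html", "status": 200, "time": 1},
    {"user": "u2", "url": "/clip.mp4?x=1", "status": 200, "time": 4},
    {"user": "u2", "url": "/news.html", "status": 200, "time": 5},
]

print([entry["url"] for entry in clean_log(sample_log)])
# ['/index.html', '/news.html']
```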
We now use an example involving four users and
their surfing sessions to show how we
construct our Dual-Strategy Database. We show
only the elimination of data by FPMM (redundant
data and low-frequency data). Table 2 shows the
original database, which contains the surfing
sessions of the four users. The items in each session
are the web pages that a specific user has visited.
Table 3 shows explicitly which pages each user
visited, in which order, and how much time was
spent on each session. Table 4 gives the frequency
of every web page. By the definition of FPMM,
items whose frequency is less than a given minimum
frequency value are pruned.
Table 2: The original database.
A,G,T,A,C,S,G,J,R,A,D,H,M,D,J
F,D,H,N,I,J,E,A,C,D,H,M,I,J,G,M
A,F,I,J,E,C,D,H,N,I,J,G,D,H,N,C,I,J,G,A,N
F,L,S,D,H,N,J,Q,E,I,P,C,I,O,A,D,H,M
A,C,G,A,D,H,M,C,F,C,G,R,I,P,H,O,J
A,I,J,B,A,E,C,T,D,H,M,I,Q,G
A,F,I,B,A,E,D,H,N,P,I,Q,F,J,D,H,N,G,C
F,D,H,M,I,J,E,H,F,I,J,E,D,H,M,A,G,N
F,D,H,N,J,A,D,A,E,D,J,R,H,N,G,C,F,G
A,C,D,E,G,C,A,F,N,H,M
Table 3: Surfing sessions for four users.
User  Session                                     Time
      A,F,I,J,E,C,D,H,N,I,J,G,D,H,N,C,I,J,G,A,N   150s
      F,D,H,N,I,J,E,A,C,D,H,M,I,J,G,M             300s
      F,D,H,M,I,J,E,H,F,I,J,E,D,H,M,A,G,N         120s
      A,C,D,E,G,C,A,F,N,H,M                       260s
      A,C,G,A,D,H,M,C,F,C,G,R,I,P,H,O,J           20s
      A,G,T,A,C,S,G,J,R,A,D,H,M,D,J               10s
      A,F,I,A,E,D,H,N,I,F,J,D,H,N,G,C             40s
      A,I,J,A,E,C,D,H,M,I,G                       50s
      F,D,H,N,J,A,D,A,E,D,J,H,N,G,C,F,G           30s
      F,D,H,N,J,E,I,C,I,A,D,M                     10s
Table 4: The frequency of each page.
Page    A   B   C   D   E   F   G   H   I
Freq.  18   2  13  18   9  11  13  18  14

Page    J   L   M   N   O   P   Q   R   S   T
Freq.  15   1   9  11   2   3   3   3   2   2
Assuming that the minimum frequency value is
set to 4, web pages B, L, O, P, Q, R, S and T are
eliminated from the database.
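The pruning step can be checked with a short sketch. The frequencies below are copied from Table 4 and the threshold of 4 is the one stated above; the helper names (low_frequency_pages, prune_sessions) are hypothetical, and the rule "prune pages whose frequency is below the minimum" follows the definition of FPMM given earlier in this section.

```python
# Page frequencies copied from Table 4.
TABLE4_FREQ = {
    "A": 18, "B": 2, "C": 13, "D": 18, "E": 9, "F": 11, "G": 13,
    "H": 18, "I": 14, "J": 15, "L": 1, "M": 9, "N": 11, "O": 2,
    "P": 3, "Q": 3, "R": 3, "S": 2, "T": 2,
}


def low_frequency_pages(freq, min_freq):
    """Return the pages whose frequency is below `min_freq`, sorted by name."""
    return sorted(page for page, count in freq.items() if count < min_freq)


def prune_sessions(sessions, pruned_pages):
    """Remove the pruned pages from every session, keeping the visit order."""
    pruned = set(pruned_pages)
    return [[page for page in session if page not in pruned]
            for session in sessions]


pruned = low_frequency_pages(TABLE4_FREQ, min_freq=4)
print(pruned)
# ['B', 'L', 'O', 'P', 'Q', 'R', 'S', 'T']

# First session of Table 2, with the low-frequency pages removed.
first_session = "A,G,T,A,C,S,G,J,R,A,D,H,M,D,J".split(",")
print(",".join(prune_sessions([first_session], pruned)[0]))
# A,G,A,C,G,J,A,D,H,M,D,J
```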
When a user visits a web site for the first
time, if the parameters from the web log satisfy
(…, …, …) < …, we use database
strategy 1 to create the database for predicting the
user's interest. However, if the parameters from the web
log satisfy (…, …, …) > …, we use
database strategy 2 to create the database. Thus, this
process of building the database is named Dual-
KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval