groups (Bezdek, J., 1981). To show the viability of our approach, we applied upper similarity approximation to cluster clickstream transactions.
In this paper, we present an agglomerative
clustering approach using upper similarity
approximation for mining clickstream data.
A clickstream is the sequence of URLs browsed by a user within a particular website in one session. Groups of users with similar interests and motivations for visiting a website can be discovered by clustering the users' clickstreams on that site. A user session is the clickstream of page views for a single user on the website. We consider each user session a clickstream transaction, which contains the sequence of URLs (or hyperlinks) visited by that user on the site.
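The notion of a clickstream transaction can be sketched in Python. The sketch assumes a hypothetical log format of (user_id, session_id, timestamp, url) records in which sessions are already identified. How sessions are delimited in real web logs is not specified here, and `build_transactions` is an illustrative name rather than part of the described method.

```python
from collections import defaultdict
from collections.abc import Iterable

# Hypothetical log record: (user_id, session_id, timestamp, url).
# Assumption: session identifiers are already present in the log.
LogRecord = tuple[str, str, int, str]


def build_transactions(records: Iterable[LogRecord]) -> dict[tuple[str, str], list[str]]:
    """Group page views into clickstream transactions, one per user session.

    Each transaction is the time-ordered sequence of URLs viewed in that session.
    """
    sessions: dict[tuple[str, str], list[tuple[int, str]]] = defaultdict(list)
    for user_id, session_id, timestamp, url in records:
        sessions[(user_id, session_id)].append((timestamp, url))
    # Order each session's page views by timestamp and keep only the URLs.
    return {
        key: [url for _, url in sorted(views)]
        for key, views in sessions.items()
    }
```

For example, three records for user u1 in session s1, logged out of order at times 3, 1 and 7, yield the single transaction `['/home', '/products', '/cart']`.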
Considerable research has been done in the area of Web Usage Mining (Cooley, R., 2000; Spiliopoulou, 1999; Manco, G. et al., 2003) that directly or indirectly addresses the issues involved in the extraction of web navigational patterns (Spiliopoulou, M. and Faulstich, L. C., 1999), ordering relationships (Mannila, H. and Meek, C., 2000), prediction of web surfing behavior (Pitkow, J. and Pirolli, P., 1999), and clustering of web usage sessions (Fu et al., 2000) based on web logs, possibly supplemented by web content or structure information. Perkowitz and Etzioni (Perkowitz and Etzioni, 2000) proposed optimizing the structure of web sites based on co-occurrence patterns of pages within the usage data for the site. Spiliopoulou and Cooley (Spiliopoulou, 1999; Cooley, R., 2000) applied data mining techniques to extract usage patterns from web logs for the purpose of deriving market intelligence.
Well-developed mining techniques cannot be applied directly to web data because web logs are unstructured in nature. Clustering in web mining faces several additional challenges (Joshi, A. and Krishnapuram, R., 1998). The specific problem of web usage clustering has been studied over the past few years. In (De and Radha Krishna, 2002), automatic personalization of a web site from user transactions using fuzzy proximity relations is presented. In (De and Radha Krishna, 2004), a clustering algorithm that uses rough approximation is presented to cluster web transactions from web access logs. Web clusters tend to have fuzzy boundaries: an object is likely to be a candidate for more than one cluster. To deal with the special challenges found in web usage data, a non-conventional clustering approach based on rough set theory was presented in (Hogo, M. et al., 2004). Pawan Lingras (Lingras, P., 2003) has used rough set theory for clustering in web mining.
The rest of the paper is organized as follows. Section 2 describes the basics of rough set theory. In Section 3, we present an approach for grouping clickstreams using upper similarity approximation. Experimental results are presented in Section 4, and we conclude in Section 5.
2 ROUGH SET THEORY
Zdzisław Pawlak introduced rough set theory (Pawlak, 1982) to deal with uncertainty and vagueness. Rough set theory has become popular among researchers around the world because of its fundamental importance to artificial intelligence and the cognitive sciences. This section provides a brief summary of the concepts of rough set theory. Rough set theory is built on the assumption that with every object of the universe of discourse we associate some information in the form of data and knowledge.
Let U denote a universe and let R ⊆ U × U be an equivalence relation on U. The pair A = (U, R) is called an approximation space. The equivalence relation R partitions the set U into disjoint subsets. Such a partition of the universe is denoted by

$$U/R = \{E_1, E_2, E_3, \ldots, E_n\},$$

where each $E_i$ is an equivalence class of R. If two elements u, v ∈ U belong to the same equivalence class E ∈ U/R, we say that u and v are indistinguishable. The equivalence classes of R are called the elementary or atomic sets in the approximation space A = (U, R).
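One common way to obtain an equivalence relation in practice is the indiscernibility relation. Under it, two elements are related exactly when they have identical values on a chosen set of attributes. The following Python sketch computes the partition U/R under that assumption. The attribute-tuple encoding and the names `partition` and `indistinguishable` are illustrative.

```python
from collections import defaultdict
from collections.abc import Hashable, Mapping


def partition(universe: Mapping[Hashable, tuple]) -> list[frozenset]:
    """Compute U/R for the indiscernibility relation R.

    Assumption: u R v holds exactly when u and v have identical attribute
    tuples, which makes R reflexive, symmetric and transitive.
    """
    classes: dict[tuple, set] = defaultdict(set)
    for element, attributes in universe.items():
        # Elements with the same attribute tuple fall into the same class.
        classes[attributes].add(element)
    return [frozenset(members) for members in classes.values()]


def indistinguishable(universe: Mapping[Hashable, tuple], u: Hashable, v: Hashable) -> bool:
    """u and v are indistinguishable iff they lie in the same equivalence class."""
    return universe[u] == universe[v]
```

Because every element is placed in exactly one class, the returned classes are pairwise disjoint and their union is U.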
Within the same equivalence class, it is not possible to differentiate the elements. Hence, one may not get a precise representation of an arbitrary set X ⊆ U in terms of the elementary sets in A. Instead, X can be represented by its lower and upper approximations, where $[x]_R$ denotes the equivalence class of R that contains x.

The lower approximation $\underline{A}(X)$ is the union of all the elementary sets that are subsets of X:

$$\underline{A}(X) = \{\, x \in U : [x]_R \subseteq X \,\}$$

The upper approximation $\overline{A}(X)$ is the union of all the elementary sets that have a non-empty intersection with X:

$$\overline{A}(X) = \{\, x \in U : [x]_R \cap X \neq \emptyset \,\}$$
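Given the elementary sets of U/R, both approximations can be computed directly from their definitions. The Python sketch below takes the equivalence classes as a list of frozensets. The function names are illustrative.

```python
from collections.abc import Iterable


def lower_approximation(classes: Iterable[frozenset], X: set) -> set:
    """Union of all elementary sets that are subsets of X."""
    result: set = set()
    for E in classes:
        if E <= X:  # E is a subset of X
            result |= E
    return result


def upper_approximation(classes: Iterable[frozenset], X: set) -> set:
    """Union of all elementary sets that have a non-empty intersection with X."""
    result: set = set()
    for E in classes:
        if E & X:  # E and X share at least one element
            result |= E
    return result
```

With classes {1, 2}, {3, 4}, {5, 6} and X = {1, 2, 3}, the lower approximation is {1, 2} and the upper approximation is {1, 2, 3, 4}. In general, the lower approximation is contained in X and X is contained in the upper approximation.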
The pair $(\underline{A}(X), \overline{A}(X))$ is the representation of the ordinary set X in the approximation space A = (U, R), or simply the rough set of X. Fig. 1 illustrates the rough set approximation.
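As a small worked example, take six elements partitioned into three elementary sets and a target set X that cuts across one of them. The sets are made up for illustration.

```latex
U = \{x_1, \ldots, x_6\}, \quad
U/R = \{\{x_1, x_2\}, \{x_3, x_4, x_5\}, \{x_6\}\}, \quad
X = \{x_1, x_2, x_3, x_6\}

\underline{A}(X) = \{x_1, x_2\} \cup \{x_6\} = \{x_1, x_2, x_6\}

\overline{A}(X) = \{x_1, x_2\} \cup \{x_3, x_4, x_5\} \cup \{x_6\} = U
```

Because x_3 cannot be distinguished from x_4 and x_5, X is not a union of elementary sets. Its rough set is therefore the pair ({x_1, x_2, x_6}, U).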
ICEIS 2005 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS