COMPLEX USER BEHAVIORAL NETWORKS
AT ENTERPRISE INFORMATION SYSTEMS
Peter Géczy, Noriaki Izumi, Shotaro Akaho and Kôiti Hasida
National Institute of Advanced Industrial Science and Technology (AIST)
Keywords:
Complex networks, web behavior, behavior segmentation, navigation space, knowledge workers, enterprise
systems, information services, data mining.
Abstract:
We analyze human behavior on a large-scale enterprise information system. Employing a novel framework that
efficiently captures complex spatiotemporal dimensions of human dynamics in electronic spaces, we present
findings about knowledge workers’ behavior on an enterprise intranet portal. The browsing behavior of
knowledge workers resembles a complex network with significant concentration on navigational starters. The
common browsing strategy relies on knowledge of the starting navigation point and recollection of the
traversal pathway to the target. The complex traversal network topology has a small number of behavioral
hubs that concentrate and disseminate the browsing pathways. The human browsing network topology, however,
does not match the link topology of the web environment. Knowledge workers generally underutilize the
available resources, have focused interests, and exhibit little exploratory behavior.
1 INTRODUCTION
Elucidation of human dynamics in electronic environments is of central importance in
personalization technologies (Baraglia and Silvestri, 2007), recommender systems
(Adomavicius and Tuzhilin, 2005), and collaborative filtering engines (Jin et al., 2006).
The corporate sector has been exploring customer web behavior primarily for commercial
purposes (Park and Fader, 2004), (Moe, 2003) and search ranking (Agichtein et al., 2006).
Little attention has been devoted to the study of user behavior in enterprise internal
information environments. This study presents some of the few available results on
knowledge worker behavior on a large enterprise intranet portal.
It has been reported that individual human actions in web environments follow non-Poisson
statistics characterized by long tails (Dezso et al., 2006), (Vazquez et al., 2006). The
long tail attributes of human dynamics (Barabasi, 2005) are equivalent to those observed in
complex networks (Newman, 2003), (Newman et al., 2005), (Caldarelli, 2007). A common
property of complex networks is that the vertex connectivities follow a long tail
distribution. A long-tailed power law has been detected in the temporal characteristics of
human information access on the web (Dezso et al., 2006). Similar results have been reported
from workload studies of search engines and server systems (Bedue et al., 2006), (Schroeder
and Harchol-Balter, 2006). The long tails of human interactions have been modeled by
power-law distributions (Vazquez et al., 2006), (Vazquez, 2005), lognormal and Pareto
distributions (Downey, 2005), or the Zipf distribution (Leskovec et al., 2005).
This work focuses on the frequency rather than the temporal characteristics of human
dynamics in electronic environments, and targets traversal networks of knowledge worker
intranet browsing behavior. Applying a novel analytic and exploratory framework, we present
behavioral findings from a large enterprise intranet portal.
2 CONCEPT PRESENTATION
User browsing interactions in web environments are reasonably represented by the
clickstream sequences. The clickstream sequences of page transitions are segmented into
sessions and subsequences. The sessions outline tasks of various complexities, undertaken
by the users, that are further divided into the subtasks represented by the subsequences.
Segmentation is done according to the users’ temporal activity characteristics. Consider
the sequence of the form $\{(p_i, d_i)\}_i$, where $p_i$ denotes the visited page URL$_i$
and $d_i$ denotes a delay between the consecutive views $p_i \to p_{i+1}$. User browsing
activity $\{(p_i, d_i)\}_i$ is divided into subelements according to the periods of
inactivity $d_i$ satisfying certain criteria.

Géczy P., Izumi N., Akaho S. and Hasida K. (2008).
COMPLEX USER BEHAVIORAL NETWORKS AT ENTERPRISE INFORMATION SYSTEMS.
In Proceedings of the Tenth International Conference on Enterprise Information Systems - HCI, pages 233-239
DOI: 10.5220/0001700502330239
Copyright © SciTePress
Definition 1. (Session, Subsequence, Train)
Let $\{(p_i, d_i)\}_i$ be a sequence of pages $p_i$ with delays $d_i$ between consecutive
transitions $p_i \to p_{i+1}$.
Browsing session is a sequence $B = \{(p_i, d_i)\}_i$ where each $d_i \leq T_B$. Length of
the browsing session is $|B|$. Browsing session is often referred to simply as a session.
Subsequence of an individual browsing session $B$ is a sequence $S = \{(p_i, dp_i)\}_i$
where each delay $dp_i \leq T_S$, and $\{(p_i, dp_i)\}_i \subseteq B$. The length of
subsequence is $|S|$.
A browsing session $B = \{(S_i, ds_i)\}_i$ thus consists of a train of subsequences $S_i$
separated by inactivity delays $ds_i$.
An important issue is determining appropriate values of $T_B$ and $T_S$ that segment the
user activity into sessions and subsequences. Earlier research (Catledge and Pitkow, 1995)
indicated that student browsing sessions last on average 25.5 minutes. However, we adopt
the average maximum attention span of 1 hour as the value for $T_B$. If the user’s browsing
activity is followed by a period of inactivity greater than 1 hour, that activity is
considered a single session, and the following activity comprises the next session.
Value of $T_S$ is determined dynamically and computed as an average delay in a browsing
session: $T_S = \frac{1}{N}\sum_{i=1}^{N} d_i$. If the delays between page views are short,
it is useful to bound the value of $T_S$ from below. This is preferable in environments
with frame-based and/or script generated pages where numerous logs are recorded in a rapid
transition. Since our situation contained both cases, we adjusted the value of $T_S$ by
bounding it from below by 30 seconds:

$$T_S = \max\left(30, \frac{1}{N}\sum_{i=1}^{N} d_i\right). \qquad (1)$$
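The segmentation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the processing pipeline used in the study: the function names, the sample clickstream, and the choice to leave the session-closing delay out of the average are all hypothetical. Delays are in seconds, the session threshold is 1 hour, and the subsequence threshold follows Eq. (1).

```python
from statistics import mean

T_B = 3600.0    # session threshold: 1 hour of inactivity, in seconds
T_S_MIN = 30.0  # lower bound on the subsequence threshold, Eq. (1)


def split_on_delay(events, threshold):
    """Close a segment after every (page, delay) whose delay exceeds threshold."""
    parts, current = [], []
    for page, delay in events:
        current.append((page, delay))
        if delay > threshold:
            parts.append(current)
            current = []
    if current:
        parts.append(current)
    return parts


def subsequence_threshold(session):
    """T_S = max(30, mean delay). Assumption: the delay that closes the
    session is an inter-session gap and is left out of the mean."""
    inner = [delay for _, delay in session[:-1]]
    return max(T_S_MIN, mean(inner)) if inner else T_S_MIN


def segment(events):
    """Return sessions, each given as a list of subsequences."""
    sessions = split_on_delay(events, T_B)
    return [split_on_delay(s, subsequence_threshold(s)) for s in sessions]


# Hypothetical clickstream of (page, delay in seconds until the next view).
clicks = [("/home", 5), ("/search", 10), ("/doc/a", 400),
          ("/home", 8), ("/doc/b", 7200), ("/news", 0)]
sessions = segment(clicks)
```

In this toy input the 7200-second gap closes the first session, and the 400-second delay exceeds that session's dynamic threshold, so the first session splits into two subsequences.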
Using these primitives we define navigation space
and subspace as follows.
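Definition 2. (Navigation Space and Subspace)
Navigation space is a triplet $\mathcal{G} = (\mathcal{P}, \mathcal{B}, \mathcal{S})$ where
$\mathcal{P}$ is a set of points (e.g. URLs), $\mathcal{B}$ is a set of browsing sessions,
and $\mathcal{S}$ is a set of subsequences.
Navigation subspace of $\mathcal{G}$ is a space $\mathcal{A} = (\mathcal{D}, \mathcal{H}, \mathcal{K})$
where $\mathcal{D} \subseteq \mathcal{P}$, $\mathcal{H} \subseteq \mathcal{B}$, and
$\mathcal{K} \subseteq \mathcal{S}$; denoted as $\mathcal{A} \subseteq \mathcal{G}$.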
Separation of subspaces within a navigation space
reflects the nature of detected or defined sequences.
For example, a human navigation space consists of
human generated sequences, and a machine naviga-
tion space may contain only the machine generated
sequences. Different spaces may have distinctly dif-
ferent characteristics.
An important aspect of human browsing behavior is the identification of the starting and
attracting points in the navigation space, as well as the single user actions.
Definition 3. (Starter, Attractor, Singleton)
Let $\mathcal{G} = (\mathcal{P}, \mathcal{B}, \mathcal{S})$ be a navigation space,
$B = \{(S_i, ds_i)\}_i^M$, $B \in \mathcal{B}$, be a browsing session, and
$S = \{(p_k, dp_k)\}_k^N$, $S \in \mathcal{S}$, be a subsequence.
Starter is the first point of an element of subsequence or session with length greater
than 1, that is, $p_1 \in \mathcal{P}$ such that there exist $B \in \mathcal{B}$ or
$S \in \mathcal{S}$ where $|B| > 1$ or $|S| > 1$ and $(p_1, d_1) \in B$ or $(p_1, dp_1) \in S$.
Attractor is the last point of an element of subsequence or session with length greater
than 1, that is, $p_N \in \mathcal{P}$ or $p_M \in \mathcal{P}$ such that there exist
$B \in \mathcal{B}$ or $S \in \mathcal{S}$ where $|B| > 1$ or $|S| > 1$ and
$(p_M, d_M) \in B$ or $(p_N, dp_N) \in S$.
Singleton is a point $p \in \mathcal{P}$ such that there exist $B \in \mathcal{B}$ or
$S \in \mathcal{S}$ where $|B| = 1$ or $|S| = 1$ and $(p, d) \in B$ or $(p, dp) \in S$.
The starters refer to the initial navigation points of
users, whereas the attractors denote the users’ targets.
The singletons relate to the single user actions such
as use of hotlists (e.g. history or bookmarks) (Thakor
et al., 2004).
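The classification of points into starters, attractors, and singletons can be illustrated with a short Python sketch. It assumes each session or subsequence is already available as a list of page identifiers, and it treats both kinds of sequences uniformly. The function name and the sample data are hypothetical.

```python
from collections import Counter


def classify_points(sequences):
    """Count starters, attractors, and singletons over a list of page sequences."""
    starters, attractors, singletons = Counter(), Counter(), Counter()
    for seq in sequences:
        if len(seq) > 1:
            starters[seq[0]] += 1     # first point of a sequence longer than 1
            attractors[seq[-1]] += 1  # last point of a sequence longer than 1
        elif len(seq) == 1:
            singletons[seq[0]] += 1   # the only point of a length-1 sequence
    return starters, attractors, singletons


# Hypothetical subsequences, each a list of visited pages.
subsequences = [["/home", "/search", "/doc/a"], ["/home", "/doc/b"], ["/news"]]
starters, attractors, singletons = classify_points(subsequences)
```

Keeping occurrence counts rather than plain sets makes it possible to study both the unique points and their frequencies later on.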
A page traversal network may contain points that are accessed only occasionally and also
points that concentrate traffic: hubs. Hubs have a larger spectrum of incoming and outgoing
navigational choices. To quantify the variety of navigational pathways that lead into and
out of a point, we define the in and out degrees.
Definition 4. (In and Out Degrees)
Let $p_i \in \mathcal{P}$ be a point in a navigation space
$\mathcal{G} = (\mathcal{P}, \mathcal{B}, \mathcal{S})$ such that there exists
$B \in \mathcal{B}$ where $|B| > 1$ and $(p_i, d_i) \in B$.
In degree of a point $p_i$ is the cardinality of a set of all preceding points $p_{i-1}$ in
sessions; $p_{i-1} \to p_i$, denoted as:
$In(p_i) = |\{p_{i-1} \mid (p_{i-1}, d_{i-1}) \in B \wedge (p_i, d_i) \in B\}|$.
Out degree of a point $p_i$ is the cardinality of a set of all following points $p_{i+1}$
in sessions; $p_i \to p_{i+1}$, denoted as:
$Out(p_i) = |\{p_{i+1} \mid (p_{i+1}, d_{i+1}) \in B \wedge (p_i, d_i) \in B\}|$.
The in degree of a point reflects the variety of choices from which the users access it.
The point’s out degree represents the spectrum of branches from it that users utilize. Note
that the defined in and out degrees delineate browsing behavior characteristics rather than
the number of links pointing to and out of a given point. Some pathways might not be
exploited by the users, or users may choose to utilize hotlists at a given browsing stage.
The human browsing behavior hubs in the navigation space may differ from the link hubs.
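The behavioral in and out degrees can be computed directly from observed page transitions. The Python sketch below is an illustration under assumptions: sessions are given as lists of page identifiers, the function name and sample data are hypothetical, and sessions of length 1 contribute no transitions, which matches the $|B| > 1$ condition.

```python
from collections import defaultdict


def degrees(sessions):
    """Return (in_degree, out_degree) dicts built from observed transitions."""
    preds, succs = defaultdict(set), defaultdict(set)
    for pages in sessions:
        # Each consecutive pair a -> b is one observed traversal.
        for a, b in zip(pages, pages[1:]):
            succs[a].add(b)
            preds[b].add(a)
    points = set(preds) | set(succs)
    in_degree = {p: len(preds[p]) for p in points}
    out_degree = {p: len(succs[p]) for p in points}
    return in_degree, out_degree


# Hypothetical sessions, each a list of visited pages.
page_sessions = [["/home", "/search", "/doc/a"],
                 ["/home", "/doc/b"],
                 ["/search", "/home"]]
in_degree, out_degree = degrees(page_sessions)
```

Because the sets hold distinct neighbors, a transition repeated many times still counts once, so the degrees measure the variety of pathways rather than traffic volume.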
ICEIS 2008 - International Conference on Enterprise Information Systems
3 INFORMATION SYSTEM CASE
STUDY
The information system investigated in this study is the large-scale intranet portal of
The National Institute of Advanced Industrial Science and Technology. The core comprises
six servers connected to a high-speed backbone in a load-balanced configuration.
Accessibility is provided via wide-ranging connectivity options (from high-speed optical to
wireless) accommodating several platforms (including mobile devices). The portal provides
an extensive range of web services and documents vital to the organization (Table 1). The
rich intranet services support business processes for management, accounting and
administration, research cooperation with industry and other institutes, and resource
localization, as well as bulletin boards and networking within the organization. The
institute has a number of branches throughout the country, thus several services and
resources are distributed. The visible web space exceeded 1 GB, and the deep web space was
substantially larger, but difficult to estimate due to the decentralized architecture and
varying back-end data.
Table 1: Case study data information.
Data Volume                   60 GB
Average Daily Volume          54 MB
Number of Servers             6
Number of Log Files           6814
Average File Size             9 MB
Time Period                   3/2005 - 4/2006
Log Records                   315 005 952
Clean Log Records             126 483 295
Unique IP Addresses           22 077
Services                      855
Unique URLs                   3 015 848
  Scripts                     2 855 549
  HTML Documents              35 532
  PDF Documents               33 305
  DOC Documents               4 385
  Others                      87 077
Sessions                      3 454 243
Unique Sessions               2 704 067
Subsequences                  7 335 577
Unique Subsequences           3 547 170
Valid Subsequences            3 156 310
Unique Valid Subsequences     1 644 848
Users                         10 000
The majority of the enterprise portal users were skilled knowledge workers. Significant
traffic on the portal resulted in a large web log data pool. The traffic was both human and
machine generated, thus the data required cleaning. The data preparation, processing,
filtering, and segmentation into sessions and subsequences are described in (Géczy et al.,
2007). The initial data cleaning eliminated most of the machine generated traffic; however,
further filtering was needed after subsequence extraction. It is notable that the data
cleaning and filtering reduced the number of log records by 59.85% and the number of unique
subsequences by 53.6% (Table 1).
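The two reduction figures can be re-derived from the counts in Table 1. The short Python check below is only an arithmetic illustration, and the helper name is hypothetical.

```python
def reduction_pct(before, after):
    """Percentage by which a count shrinks from `before` to `after`."""
    return 100.0 * (before - after) / before


# Counts taken from Table 1.
log_reduction = reduction_pct(315_005_952, 126_483_295)   # log records -> clean log records
subseq_reduction = reduction_pct(3_547_170, 1_644_848)    # unique -> unique valid subsequences

print(f"log records: {log_reduction:.2f}%, unique subsequences: {subseq_reduction:.1f}%")
```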
4 BROWSING BEHAVIOR
ANALYSIS
By analyzing the point characteristics we infer several
relevant observations. The point characteristics of a
navigation space highlight the initial and the terminal
targets of knowledge worker activities, and also the
single-action behaviors. Analysis demonstrates the
applicability and usefulness of the approach.
It is evident that the knowledge worker navigation space is substantially smaller, with
respect to the essential navigation points, than the observed complete navigation space.
The unique valid sets of starters (115 770), attractors (288 075), and singletons (57 894)
are very small in comparison to the set of unique URLs (3 015 848) in the navigation space
(see Table 1 and Table 2). The largest set, unique valid attractors, is only 9.55% of
unique URLs. Unique valid starters and singletons represent only approximately 3.84% and
1.92% of unique URLs, respectively.
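These shares follow directly from the counts reported in Table 1 and Table 2. A brief Python check, with hypothetical variable names:

```python
# Counts taken from Table 1 (unique URLs) and Table 2 (unique valid points).
unique_urls = 3_015_848
unique_valid = {"starters": 115_770, "attractors": 288_075, "singletons": 57_894}

# Share of unique URLs, in percent, rounded to two decimals.
shares = {name: round(100.0 * n / unique_urls, 2) for name, n in unique_valid.items()}
print(shares)
```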
Browsing behavior of knowledge workers resembles a complex network. The topology of the
knowledge worker navigation space clearly corresponds to that of a complex network. A
characteristic feature of complex networks is a long-tailed distribution of the in and out
degrees of the nodes. Histograms of in and out degrees of starters and attractors
distinctly display long tail characteristics, with a small number of high frequency
elements gradually progressing to a large number of low frequency elements (Figures 1 and
2). The network of starting navigation points and the network of users’ targets are both
complex networks. Certain points in the navigation space concentrate the human web traffic
and serve as hubs.
Knowledge workers’ browsing behavior concentrates on the navigational starters. Starters
are the major concentration points of the users’ complex navigational network. They are the
main hubs. There are approximately one hundred primary starter hubs and three hundred
primary attractor hubs. These one hundred primary starters constitute 0.086% of unique
valid starters, and three hundred primary attractors account for 0.1% of unique valid
attractors. Thus the ratio between the primary starter and attractor hubs is approximately
one to three. This one-to-three ratio also holds approximately between the numbers of
unique valid starters (115 770) and attractors (288 075); see Table 2.

Table 2: Statistics for starters, attractors, and singletons.
               Starters    Attractors   Singletons
Total          7 335 577   7 335 577    1 326 954
Valid          2 392 541   2 392 541      763 769
Filtered       4 943 936   4 943 936      563 185
Unique           187 452   1 540 093       58 036
Unique Valid     115 770     288 075       57 894
Figure 1: Histograms and quantiles of starter: a) in degrees,
b) out degrees. Right y-axis contains a quantile scale. X-
axis is in a logarithmic scale.
The initial navigation points primarily disseminate the knowledge worker browsing pathways.
The starters disperse the navigation more than the attractors. This is evident from the
quantification of the in and out degrees of the major starters and attractors. The in and
out degrees of starters range from one to over twenty thousand. The ranges of attractor in
degrees (1 to about 6800) and out degrees (1 to about 3400) are approximately three and six
times narrower, respectively. The top ten starters (approximately 0.0086% of unique valid
starters) have in and out degrees ranging from five thousand to over twenty thousand
(Figure 1). The compound in and out degrees of the top thirty starter hubs (approximately
0.026% of unique valid starters) represented approximately 20% of total starter in and out
degrees.
Knowledge workers are more behaviorally diverse in reaching their targets than in
proceeding to the starting points of the following subtasks. The attractors’ in degree
range is two times greater than the out degree range (refer to Figure 2). Thus the users
employ approximately two times more arriving pathways to the targets than departing ones.
Only approximately the top twenty attractors have in and out degrees greater than one
thousand. Discrepancies between their in degrees are greater than between their out
degrees.
The variability of arriving and departing pathways to and from starters is relatively
balanced. Both the in and out degrees of starters extend to approximately 20 000 (Figure
1). The in and out degree ranges of starters are significantly greater than the attractor
ranges (see Figures 1 and 2). Hence the users have a richer traversal repertoire when
reaching and leaving the initial navigation points than when reaching and leaving the
targets.
Knowledge workers utilized a small spectrum of starting navigation points and targeted a
relatively small number of resources during their browsing. The set of unique valid
starters (115 770), i.e. the initial navigation points of knowledge workers’ (sub-)goals,
was approximately 3.84% of total navigation points (see Tables 1 and 2). Although the set
of unique valid attractors (288 075), i.e. (sub-)goal targets, was approximately three
times larger than the set of initial navigation points, it is still a relatively minor
portion (approximately 9.55% of unique URLs). Knowledge workers initiated their browsing
experiences from a small number of navigation points and aimed at relatively few resources.

Figure 2: Histograms and quantiles of attractor: a) in degrees, b) out degrees. Right
y-axis contains a quantile scale. X-axis is in a logarithmic scale.
Few resources were perceived as valuable enough to be bookmarked. The number of unique
single user actions was minuscule. Single actions, such as use of hotlists (Thakor et al.,
2004), followed by delays greater than 1 hour are represented by the singletons. Unique
valid singletons (57 894) accounted for only 1.92% of navigation points (see Tables 1 and
2). The number of singletons is approximately two times lower than the number of starters
and almost five times lower than the number of attractors (Table 2). If only a small number
of starters and/or attractors were perceived as useful, it is possible that they were
bookmarked and accessed directly in future browsing experiences.
Figure 3: Histograms and quantiles: a) starters, b) attractors, and c) singletons. Right
y-axis contains a quantile scale. X-axis is in a logarithmic scale.

Knowledge workers had focused interests and exhibited minuscule exploratory behavior. A
narrow spectrum of starters, attractors, and singletons was frequently used. The histograms
and quantile characteristics of starters, attractors, and singletons (see Figure 3)
indicate that the higher frequency of occurrences is concentrated in a relatively small
number of elements. Approximately ten starters and singletons, and fifty attractors were
very frequent. About one hundred starters and singletons, and one thousand attractors were
relatively frequent. The quantile analysis in Figure 3 reveals that ten starters (0.0086%
of unique valid starters) and singletons (0.017% of unique valid singletons), and fifty
frequent attractors (0.017% of unique valid attractors) accounted for about 20% of total
occurrences. One hundred starters (0.086% of unique valid starters) and one thousand
attractors (0.35% of unique valid attractors) constituted about 45% and 48% of total
occurrences, respectively. Analogously, one hundred twenty singletons (0.21% of unique
valid singletons) compounded to about 37% of total occurrences.
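Concentration figures of this kind are cumulative shares of the most frequent elements. The Python sketch below shows one way such a share could be computed from occurrence counts. The function name and the toy counts are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def top_k_share(counts, k):
    """Fraction of all occurrences contributed by the k most frequent elements."""
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Hypothetical occurrence counts of starter pages.
starter_counts = Counter({"/home": 50, "/search": 30, "/news": 10, "/a": 5, "/b": 5})
share_top1 = top_k_share(starter_counts, 1)
share_top2 = top_k_share(starter_counts, 2)
```

Evaluating the share for increasing values of k traces a cumulative curve of the kind a quantile scale reports.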
Knowledge workers were generally more familiar with the starting navigation points than
with the targets. A smaller number of starters repeats substantially more frequently than
the corresponding number of attractors. That is, the users knew where to start and were
familiar with the navigational path to the target (instead of just utilizing shortcuts such
as bookmarks). The in and out degrees of frequent starters are also significantly higher
than those of attractors (see Figures 1 and 2). The frequent starters have in and out
degrees between 5000 and 20 000, whereas the frequent attractor in degrees are between 1000
and 6800, and out degrees between 1000 and 3400.
Complex networks of knowledge worker browsing behavior differ from the web topology
constituted by links. Hubs in the web topology are the pages with a large number of
incoming and outgoing links. Behavioral hubs are the navigation points that have large in
and out degrees resulting from the user traversal patterns. We found that the behavioral
hubs in the knowledge worker navigation space did not substantially match the link hubs.
The high out degrees of behavioral hubs (reaching almost 7000) also significantly exceed
the number of links on the served pages at any given time.
5 CONCLUSIONS
We introduced a novel analytic framework for exploration and modeling of human browsing
behavior in electronic environments. It utilizes a temporal segmentation of browsing
activity. The framework was applied to the analysis of knowledge workers’ browsing behavior
on a large enterprise information system, and it revealed several behavioral features.
Knowledge worker browsing behavior concentrated on the navigational starters. The users
remembered the starting point and recalled the navigational path to the target. The
knowledge workers effectively utilized only a small portion of the available resources. A
large number of resources were accessed only occasionally.
The topology of knowledge worker traversal pathways resembles that of complex networks.
However, the behavioral complex network differs from the hypertext link network. The
traversal hubs do not correspond exactly to the link hubs. Significant long tail
characteristics of the essential navigation points have been exposed both in terms of
frequencies and in terms of in and out degrees.
REFERENCES
Adomavicius, G. and Tuzhilin, A. (2005). Toward the
next generation of recommender systems: A survey
of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering,
17:734–749.
Agichtein, E., Brill, E., and Dumais, S. (2006). Improv-
ing web search ranking by incorporating user behav-
ior information. In Proceedings of The 29th SIGIR,
pp. 19–26, Seattle, Washington, USA.
Barabasi, A.-L. (2005). The origin of bursts and heavy tails
in human dynamics. Nature, 435:207–211.
Baraglia, R. and Silvestri, F. (2007). Dynamic personaliza-
tion of web sites without user intervention. Commu-
nications of the ACM, 50:63–67.
Bedue, C., Baeza-Yates, R., Ribeiro-Neto, B., Ziviani, A.,
and Ziviani, N. (2006). Modeling performance-driven
workload characterization of web search systems. In
Proceedings of CIKM, pp. 842–843, Arlington, USA.
Caldarelli, G. (2007). Scale-Free Networks: Complex Webs
in Nature and Technology. Oxford University Press,
Cambridge, UK.
Catledge, L. and Pitkow, J. (1995). Characterizing browsing
strategies in the world wide web. Computer Networks
and ISDN Systems, 27:1065–1073.
Dezso, Z., Almaas, E., Lukacs, A., Racz, B., Szakadat, I.,
and Barabasi, A.-L. (2006). Dynamics of information
access on the web. Physical Review, E73:066132(6).
Downey, A. (2005). Lognormal and pareto distributions in
the internet. Computer Communications, 28:790–801.
Géczy, P., Akaho, S., Izumi, N., and Hasida, K. (2007). Usability
analysis framework based on behavioral segmentation. In Psaila, G.
and Wagner, R., Eds., Electronic Commerce and Web Technologies,
pp. 35–45, Springer-Verlag, Heidelberg.
Jin, R., Si, L., and Zhai, C. (2006). A study of mixture mod-
els for collaborative filtering. Information Retrieval,
9:357–382.
Leskovec, J., Kleinberg, J., and Faloutsos, C. (2005).
Graphs over time: Densification laws, shrinking di-
ameters and possible explanations. In Proceedings of
KDD, pp. 177–187, Chicago, Illinois, USA.
Moe, W. (2003). Buying, searching, or browsing: Differen-
tiating between online shoppers using in-store naviga-
tional clickstream. Journal of Consumer Psychology,
13:29–39.
Newman, M. (2003). The structure and function of complex
networks. SIAM Review, 45:167–256.
Newman, M., Barabasi, A.-L., and Watts, D. (2005).
The Structure and Dynamics of Complex Networks.
Princeton University Press, Princeton, N.J.
Park, Y.-H. and Fader, P. (2004). Modeling browsing behavior
at multiple websites. Marketing Science, 23:280–303.
Schroeder, B. and Harchol-Balter, M. (2006). Web servers
under overload: How scheduling can help. ACM
Transactions on Internet Technology, 6:20–52.
Thakor, M., Borsuk, W., and Kalamas, M. (2004). Hotlists
and web browsing behavior–an empirical investiga-
tion. Journal of Business Research, 57:776–786.
Vazquez, A. (2005). Exact results for the barabasi
model of human dynamics. Physical Review Letters,
95:248701(6).
Vazquez, A., Oliveira, J., Dezso, Z., Goh, K.-I., Kondor,
I., and Barabasi, A.-L. (2006). Modeling bursts and
heavy tails in human dynamics. Physical Review,
E73:036127(19).