encompasses methods for optimally placing
information along the tiers of Hierarchical Storage
Management Systems. The goal is to improve their
overall performance in view of the axioms that not
all information items have the same value, and that
the value of an information resource changes over
time. Several valuation methods have been proposed
in this context, employing a rich variety of criteria
(Chen, 2005). We distinguish two main
categories: those based on the usage of information
resources over time, and those drawing on the
business criticality of information.
Approaches in the former category assume
that the value of information is reflected in its use,
so that usage observed in the past is a suitable
indicator of future usage. (Chen, 2005), for
example, balances the usage of a resource against its
recency, whereas (Turczyk et al., 2009) employs usage
in stochastic models that estimate the probability of
future use. On the whole, the output of these
approaches is a classification of information
resources into intensively and rarely used groups,
rather than a ranking, which makes them unsuitable
for our valuation/ranking task.
The alternative valuation in terms of business
criteria is analyzed in (Moody et al., 2009) on the
principle that information bears all the
characteristics of an asset. The authors examine the
laws that govern its behavior, and deduce the
applicability of accounting models in its evaluation.
Though theoretically well established, such
approaches are rarely put into practice, as they are
human-intensive and time-consuming.
The time-decay model is common practice in the
field of data streams, where data
arrive at high rates and the resources available for
processing them are limited. Data streams therefore
have to be summarized, with the most recent data
considered more relevant and older data
given a lower weight. An indicative
approach that focuses on estimating the highest
degree of stream approximation that does not reduce
the accuracy in answering continuous queries is
presented in (Cohen et al., 2003). The time-decay
model has also been used for improving the review
system of Amazon in (Wang et al., 2008).
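The time-decay idea described above can be illustrated with a minimal sketch. It assumes an exponential decay with a half-life, which is only one common choice; the function names and the half_life parameter are hypothetical and are not taken from the cited works.

```python
import math


def decayed_weight(age, half_life):
    """Weight of an observation that is `age` time units old.

    Assumption: exponential decay, so the weight halves every
    `half_life` time units. Other decay functions are possible.
    """
    return math.exp(-math.log(2) * age / half_life)


def decayed_sum(observations, now, half_life):
    """Summarize (timestamp, value) pairs, discounting each value by its age."""
    return sum(
        value * decayed_weight(now - timestamp, half_life)
        for timestamp, value in observations
    )
```

With a half-life of 10, an observation made 10 time units ago contributes half as much to the summary as one made just now.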
3 VALUATION METHOD
In this section we introduce a novel method that
helps users of social applications in their
information quests. Our approach assigns to all
information resources a value reflecting their
likelihood of being used in the future, and then ranks
them accordingly. A list of the top resources derived
from this ranking enables users to quickly locate and
directly retrieve desired information items. Note
that the size of the list
depends on the application at hand and the volume
of the information space it covers. Note also
that the value of each information resource
is based on the activity of the entire user
base (collaborating team). In other words, no
individual user profiling techniques are involved in
estimating it.
In short, our method adds an intelligent usage-
based browsing dimension to a social application.
Many content management systems are already
equipped with a similar functionality, employing
either an RSS feed or a short list embedded in their
interface. Both tools, however, merely implement
the Least Recently Used (LRU) caching algorithm,
thus ordering resources according to the time of their
last transaction (access or editing). In our opinion,
though, this purely chronological arrangement of
resources is inadequate for predicting their future
use. A comprehensive method should additionally
take into account the degree of usage, as we
demonstrate empirically in Section 4.
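To make the contrast concrete, the following minimal sketch orders items purely by the time of their last transaction and, separately, counts how often each item was used. The function names and the representation of the transaction log as a list of item identifiers in chronological order are assumptions made for illustration only.

```python
from collections import Counter


def recency_order(transactions):
    """Order items by the time of their last transaction, most recent first.

    `transactions` is a chronological list of item identifiers
    (an assumed, simplified log format).
    """
    last_seen = {}
    for position, item in enumerate(transactions):
        last_seen[item] = position  # later transactions overwrite earlier ones
    return sorted(last_seen, key=last_seen.get, reverse=True)


def usage_counts(transactions):
    """Number of transactions per item, ignoring when they happened."""
    return Counter(transactions)


log = ["a", "a", "a", "b", "a", "c"]
print(recency_order(log))                # ['c', 'a', 'b']
print(usage_counts(log).most_common(1))  # [('a', 4)]
```

In this toy log the purely chronological order places c first after a single access, although a is by far the most heavily used item.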
3.1 Problem Formulation
We begin by formalizing the problem we are
tackling as follows: having a collection of
information items, I = {i_1, i_2, ...}, together with their
observed usages over the past N transaction
batches, rank them so that
the average ranking position of the items used within
the next, N+1, transaction batch is minimized.
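The objective stated above can be sketched as a small evaluation function. The function name is hypothetical, positions are taken to be 1-based, each used item is counted once, and items that do not appear in the ranking are ignored; these are assumptions, as the formulation does not fix such details.

```python
def average_rank_position(ranking, used_items):
    """Mean 1-based position in `ranking` of the items used in the next batch.

    Assumptions: each used item counts once, and used items that are
    absent from the ranking (e.g. newly created ones) are skipped.
    """
    position = {item: idx for idx, item in enumerate(ranking, start=1)}
    ranks = [position[item] for item in set(used_items) if item in position]
    if not ranks:
        raise ValueError("no used item appears in the ranking")
    return sum(ranks) / len(ranks)


# Items 'a' and 'c' are used in batch N+1: positions 1 and 3, mean 2.0.
print(average_rank_position(["a", "b", "c", "d"], ["a", "c"]))  # 2.0
```

A lower value means that the items actually used in the next batch sit closer to the top of the list.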
As is evident from the above definition, our
approach to the problem is event-driven: the
ranking is renewed whenever a predefined
number of transactions, termed a transaction batch, is
completed. The reason is that the alternative,
time-driven methodology of periodically updating the
ranking is unreliable, as it completely disregards the
actual traffic of the underlying application. It fails,
therefore, to refresh the ranking on time whenever
there is a traffic overload, and triggers updates even
when a time interval does not include the critical
mass of transactions for re-shuffling. On the other
hand, our approach guarantees that users are
instantly informed about active and new documents,
without even having to wait for them to be indexed.
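The event-driven renewal described above can be sketched as follows. The class name, the callback interface and the batch size are hypothetical; the sketch shows only the triggering logic, not the valuation itself.

```python
class BatchTrigger:
    """Invoke a callback each time a full transaction batch has been recorded."""

    def __init__(self, batch_size, on_batch):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self.on_batch = on_batch  # e.g. a function that re-ranks the items
        self._pending = []

    def record(self, item):
        """Register one transaction; fire the callback when the batch is full."""
        self._pending.append(item)
        if len(self._pending) == self.batch_size:
            batch, self._pending = self._pending, []
            self.on_batch(batch)


batches = []
trigger = BatchTrigger(batch_size=3, on_batch=batches.append)
for item in "abcdefg":
    trigger.record(item)
print(batches)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Because the trigger depends on the number of transactions rather than on elapsed time, updates become more frequent under heavy traffic and stop altogether when the application is idle.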
It is also worth noting at this point that the only
evidence considered when estimating the value of
information items is their past transactions, and their