principled way to do it? Perhaps the easiest answer
to this question is to impose a metric on the
underlying graph and classify each node by adopting
the labels present on its nearest neighbors. There are
a variety of metrics to choose from (e.g., shortest
distance, commute time or electrical resistance,
etc.), but most of these are expensive to compute,
especially for large graphs. Furthermore,
conceptually simple ones like shortest distance have
undesirable properties; for example, they do not take
into account the number of paths between the
labeled and unlabeled nodes. Adsorption provides
an intuitive, iterative manner in which to propagate
labels in a graph.
The first step is setting up the problem in terms
of a graph. For the news story classification task, the
embedding is straightforward: each story is a node
in the graph, and the weights of the edges between
nodes represent the similarity between two news
stories. The similarity is computed via the MIN-
HASH/LSH distance described previously; if there
is a collision via the LSH procedure, then an edge
exists and its weight is non-zero and positive. In
the simplest version of the algorithm, each labeled
story is assigned a single category. The
remaining nodes, those to be labeled, will gather
evidence of belonging to each of the seven classes
as Adsorption is run. At the end of the algorithm, for
each node, the class with the largest accumulated
evidence is assigned to the node (and therefore the
news story). When designing a label propagation
algorithm in this framework, there are several
overarching, intuitive desiderata we would like to
maintain. First, node v should be labeled l only
when there are short paths, with high weight, to
other nodes labeled l. Second, the more short paths
with high weight that exist, the more evidence there
is for l. Third, paths that go through high-degree
nodes may not be as important as those that do not
(intuitively, if a node is similar to many other nodes,
then it being similar to any particular node may not
be as meaningful). Adsorption is able to capture
these desiderata effectively.
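As a concrete illustration, the graph construction and final label assignment described above might be sketched as follows. The names `build_graph`, `assign_label`, and the `similarity` callback are our own hypothetical choices, and the LSH bucketing itself is assumed to be computed elsewhere:

```python
from collections import defaultdict

def build_graph(lsh_buckets, similarity):
    """Build an undirected weighted graph from LSH collisions.

    `lsh_buckets` maps each hash bucket to the story ids that
    collided in it; `similarity` is any nonnegative similarity
    function (standing in for the MinHash estimate).
    """
    weights = defaultdict(float)
    for bucket in lsh_buckets.values():
        for i, a in enumerate(bucket):
            for b in bucket[i + 1:]:
                w = similarity(a, b)
                if w > 0:  # a collision yields a positive-weight edge
                    weights[(a, b)] = weights[(b, a)] = w
    return weights

def assign_label(label_dist):
    """Pick the class with the largest accumulated evidence."""
    return max(label_dist, key=label_dist.get)
```

A story that collides with others in any LSH bucket gains edges to them; after propagation, `assign_label` implements the final argmax over accumulated evidence.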
Next, we present Adsorption in its simplest
form: iterated label passing and averaging. We will
also present an equivalent algorithm, termed
Adsorption-RW, that computes the same values, but
is based on random walks in the graph. Although
not presented in this paper, Adsorption can also be
defined as a system of linear equations in which we
can express the label distribution at each vertex as a
convex combination of the other vertices. Our
presentation follows our prior work (Baluja et al.,
2008), which also includes additional details. These
three interpretations of the Adsorption algorithm
provide insights into the computation and direct us
to important practical findings; a few will be briefly
described in Section 3.3.
Figure 2: Basic adsorption algorithm.
3.1 Adsorption via Averaging
In Adsorption, given a graph where some nodes
carry labels, those nodes forward their labels to
their neighbors, who, in turn, forward them to their
neighbors, and so on, and
all nodes collect the labels they receive. Thus each
node has two roles: forwarding labels and collecting
labels. The crucial detail is the choice of how to
retain a synopsis that will both preserve the essential
parts of this information as well as guarantee a
stable (or convergent) set of label assignments.
Formally, we are given a graph $G = (V, E, w)$, where $V$ denotes the set of vertices (nodes), $E$ denotes the set of edges, and $w : E \rightarrow \mathbb{R}$ denotes a nonnegative weight function on the edges. Let $L$ denote a set of labels, and assume that each node $v$ in a subset $V_L \subseteq V$ carries a probability distribution $L_v$ on the label set $L$. We often refer to $V_L$ as the set of labeled nodes. For the sake of exposition, we will introduce a pre-processing step, where for each vertex $v \in V_L$, we create a "shadow" vertex $\tilde{v}$ with exactly one out-neighbor, namely $v$, connected via an edge $(\tilde{v}, v)$; furthermore, for each $v \in V_L$, we will re-locate the label distribution $L_v$ from $v$ to $\tilde{v}$, leaving $v$ with no label distribution. Let $\tilde{V}$ denote the set of shadow vertices, $\tilde{V} = \{\tilde{v} \mid v \in V_L\}$. From now on, we will assume that at the beginning of the algorithm, only vertices in $\tilde{V}$ have non-vacuous label distributions. See Figure 2 for the full algorithm.
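As an illustration of the pre-processing step, one possible sketch follows; the `add_shadow_vertices` helper and the dict-based graph representation are our own assumptions, not the paper's code:

```python
def add_shadow_vertices(edges, labels):
    """Pre-processing: for every labeled node v, add a shadow
    vertex ~v with a single out-edge (~v, v), move v's label
    distribution to ~v, and leave v itself unlabeled.

    `edges` is a dict {(u, v): weight}; `labels` maps each
    initially labeled node in V_L to its label distribution.
    """
    shadow_labels = {}
    for v, dist in labels.items():
        sv = ('shadow', v)    # stands in for ~v
        edges[(sv, v)] = 1.0  # exactly one out-neighbor: v itself
        shadow_labels[sv] = dict(dist)
    return edges, shadow_labels
```

After this step, only the shadow vertices carry non-vacuous label distributions, matching the assumption made above.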
Some comments on Adsorption: (1) In the
algorithm, we say that convergence has occurred if
the label distribution of none of the nodes changes
in a round. It can be shown that the algorithm
Algorithm Adsorption:
Input: $G = (V, E, w)$, $L$, $V_L$
repeat
  for each $v \in V \cup \tilde{V}$ do:
    Let $L_v \leftarrow \sum_u w(u, v) \, L_u$
  end-for
  Normalize $L_v$ to have unit $L_1$ norm
until convergence
Output: Distributions $\{ L_v \mid v \in V \}$
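The averaging loop of Figure 2 could be sketched as follows. The representation is an assumption of ours (a directed edge dict, with shadow vertices held fixed as label injectors since they have no in-edges), not the authors' implementation:

```python
from collections import defaultdict

def adsorption(edges, shadow_labels, max_rounds=1000):
    """Iterated label passing and averaging (sketch of Figure 2).

    `edges` maps a directed pair (u, v) to a nonnegative weight
    w(u, v); `shadow_labels` maps each shadow vertex to its fixed
    label distribution.
    """
    in_nbrs, nodes = defaultdict(list), set()
    for (u, v), w in edges.items():
        in_nbrs[v].append((u, w))
        nodes.update((u, v))
    dist = {v: dict(shadow_labels.get(v, {})) for v in nodes}
    for _ in range(max_rounds):
        new = {}
        for v in nodes:
            if v in shadow_labels:       # shadow vertices keep injected labels
                new[v] = dist[v]
                continue
            acc = defaultdict(float)     # L_v <- sum_u w(u, v) L_u
            for u, w in in_nbrs[v]:
                for lbl, p in dist[u].items():
                    acc[lbl] += w * p
            total = sum(acc.values())    # normalize to unit L1 norm
            new[v] = {l: p / total for l, p in acc.items()} if total else {}
        changed = any(new[v] != dist[v] for v in nodes)
        dist = new
        if not changed:                  # no distribution changed this round
            break
    return {v: d for v, d in dist.items() if v not in shadow_labels}
```

On a toy graph with a single labeled story, its neighbor inherits the label after a couple of rounds; a production implementation would use a tolerance-based convergence test rather than exact equality.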
TEXT CLASSIFICATION THROUGH TIME - Efficient Label Propagation in Time-Based Graphs