Here, line 1 is executed in parallel, making use of all p nodes at once, and is the source of the algorithm's parallelism. On line 2, Reduce applies the function that is its first argument cumulatively to the sequence that is its second argument, so that it effectively merges $P_1$ with $P_2$, followed by merging that result with $P_3$, etc.
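To make the reduction step concrete, the following Python sketch folds partial decompositions left to right. The merge() shown is a naive stand-in for the paper's merge Algorithms 1-3, not the actual implementation, and the toy data is illustrative only.

import numpy as np
from functools import reduce

def merge(d1, d2, k=200):
    # Merge two truncated decompositions (U1, s1), (U2, s2) of adjacent
    # column blocks: stack the scaled bases U_i * diag(s_i) and
    # re-truncate via a dense SVD of the resulting m x 2k matrix.
    # A naive stand-in for merge Algorithms 1-3, for illustration only.
    (u1, s1), (u2, s2) = d1, d2
    u, s, _ = np.linalg.svd(np.hstack([u1 * s1, u2 * s2]),
                            full_matrices=False)
    return u[:, :k], s[:k]

# Toy stand-ins for the p input jobs.
rng = np.random.default_rng(0)
jobs = [rng.standard_normal((1000, 400)) for _ in range(4)]

# Line 1: each node decomposes its own job (trivially parallel).
partials = [np.linalg.svd(a, full_matrices=False)[:2] for a in jobs]

# Line 2: Reduce folds the partials cumulatively, left to right:
# ((P1 merged with P2) merged with P3), and so on.
U, S = reduce(merge, partials)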
We note that other divide-and-conquer schemes
are also possible, such as one where the merging in
line 2 of Algorithm 5 happens on pairs of decompo-
sitions coming from approximately the same number
of input documents, so that the two sets of singular
values are of comparable magnitude. Doing so could
lead to improved numerical properties, but we have
not investigated this effect yet.
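As an illustration of this alternative scheme, a balanced, tree-style reduction can be sketched as follows (reusing the merge() stand-in above), so that each merge combines decompositions built from roughly the same number of documents:

def tree_merge(partials):
    # Merge pairwise, level by level, instead of left to right, so the
    # two inputs of each merge carry singular values of comparable
    # magnitude. An unpaired leftover is carried to the next level.
    while len(partials) > 1:
        merged = [merge(d1, d2)
                  for d1, d2 in zip(partials[0::2], partials[1::2])]
        if len(partials) % 2:
            merged.append(partials[-1])
        partials = merged
    return partials[0]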
The algorithm is formulated in terms of a (poten-
tially infinite) sequence of jobs, so that when more
jobs arrive, we can continue updating the decomposi-
tion in a natural way. The whole algorithm can in fact
act as a continuous daemon service, providing decom-
position of all the jobs processed so far on demand.
3 EXPERIMENTS
In this section, we describe a set of experiments mea-
suring numerical accuracy of the proposed algorithm.
In all experiments, the decay factor γ is set to 1.0,
that is, there is no discounting in favour of new observations. The number of requested factors is k = 200 in
all cases.
3.1 Setup
We will be comparing four implementations for par-
tial Singular Value Decomposition:
SVDLIBC A direct sparse SVD implementation due to Douglas Rohde (http://tedlab.mit.edu/~dr/SVDLIBC/). SVDLIBC is based on the SVDPACK package by Michael Berry (Berry, 1992). We use the LAS2 routine (Lanczos of the related implicit $A^TA$ or $AA^T$ matrix with selective orthogonalizations) to retrieve only the k dominant singular triplets. The implementation works in-core and therefore doesn't scale.
ZMS Implementation of the incremental one-pass al-
gorithm from (Zha et al., 1998). The right singular
vectors and their updates are completely ignored
so that our implementation of their algorithm also
realizes subspace tracking.
DLSA Our proposed method. We will be evaluating three different versions of merging, Algorithms 1, 2 and 3, calling them DLSA$_1$, DLSA$_2$ and DLSA$_3$ in the results, respectively. Sparse SVD during base case decomposition is realized by an adapted LAS2 routine from SVDLIBC, see above; a comparable SciPy call is sketched after this list.
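The following is not LAS2 itself, but a comparable truncated sparse SVD in SciPy, which likewise works through the implicit $A^TA$/$AA^T$ operator and returns only the k dominant triplets; the matrix shape and density below merely mimic the corpus described in Section 3.1 and are assumptions of this sketch.

import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Random sparse matrix with roughly the corpus's shape and density.
A = sp.random(39022, 3494, density=0.011, format='csr', random_state=0)

u, s, vt = svds(A, k=200)   # the 200 dominant singular triplets
u, s = u[:, ::-1], s[::-1]  # svds returns singular values in ascending order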
With the exception of SVDLIBC, all the other algorithms operate in a streaming fashion (Řehůřek and Sojka, 2010), so that the corpus need not reside in core memory all at once. Although the memory footprint of all algorithms is independent of the size of the corpus, it is still linear in the number of features, O(m). It is assumed that the decomposition $(U_{m\times k}, S_{k\times k})$ fits entirely into core memory.
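A streaming corpus reader might look like the following sketch; the chunk size is an arbitrary illustrative choice, the point being that only one chunk plus the O(m)-sized (U, S) pair is ever held in memory.

import numpy as np

def stream_jobs(doc_vectors, chunk_size=1000):
    # doc_vectors is any iterable of m-dimensional document vectors,
    # e.g. read lazily from disk; documents are grouped into jobs of
    # chunk_size columns, so memory use is independent of corpus size.
    chunk = []
    for vec in doc_vectors:
        chunk.append(vec)
        if len(chunk) == chunk_size:
            yield np.column_stack(chunk)
            chunk = []
    if chunk:                      # flush the final partial job
        yield np.column_stack(chunk)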
For the experiments, we will be using a corpus of
3,494 mathematical articles collected from the digi-
tal libraries of NUMDAM, arXiv and DML-CZ. Af-
ter the standard procedure of pruning out word types
that are too infrequent (hapax legomena, typos, OCR
errors, etc.) or too frequent (stop words), we are
left with 39,022 distinct features. The final matrix
$A_{39,022\times 3,494}$ has 1.5 million non-zero entries. This
corpus was chosen so that it fits into core memory of
a single computer and its decomposition can there-
fore be computed directly. This will allow us to es-
tablish the “ground-truth” decomposition and set an
upper bound on achievable accuracy and speed.
3.2 Accuracy
Figure 1 plots the relative accuracy of singular values found by DLSA, ZMS, SVDLIBC and HEBB algorithms compared to known, "ground-truth" values $S^G$. We measure accuracy of the computed singular values S as $r_i = |s_i - s^G_i| / s^G_i$, for $i = 1, \dots, k$. The ground-truth singular values $S^G$ are computed directly with LAPACK's DGESVD routine.
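In NumPy terms, the measurement amounts to the following; note that numpy.linalg.svd uses LAPACK's divide-and-conquer driver (*gesdd) rather than the DGESVD routine used here for the ground truth, a difference that is immaterial for this check.

import numpy as np

def relative_errors(s, A_dense, k=200):
    # r_i = |s_i - s^G_i| / s^G_i for the k dominant singular values,
    # with the ground truth computed by a direct dense SVD.
    s_ground = np.linalg.svd(A_dense, compute_uv=False)[:k]
    return np.abs(s[:k] - s_ground) / s_ground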
We observe that the largest singular values are
practically always exact, and accuracy quickly de-
grades towards the end of the returned spectrum. This
leads us to the following refinement: When requesting x factors, compute the truncated updates for k > x, such as k = 2x, and discard the extra k − x factors only when the final projection is actually needed. The er-
ror is then below 5%, which is comparable to the ZMS
algorithm (while DLSA is at least an order of magni-
tude faster even without any parallelization).
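In code, this refinement is a thin wrapper; update_fn below is a hypothetical handle to the truncated update at a chosen rank, not part of the algorithms above.

def oversampled_factors(update_fn, x):
    # Run the truncated update at rank k = 2x, then keep only the
    # leading x factors when the projection is finally needed; the
    # discarded k - x factors absorb most of the truncation error.
    U, S = update_fn(k=2 * x)
    return U[:, :x], S[:x]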
4 CONCLUSIONS
We developed and presented a novel single-pass eigen
decomposition method, which runs in constant mem-
ory w.r.t. the number of observations. The method is
embarrassingly parallel, so we also give its distributed
version.