This implies that the product $U_k V_k$ equals the orthogonal projection of $F$ onto the column space of $U_k$. According to Eq. 8, the column space of $U_k$ can be represented by an arbitrary orthonormal basis for the columns of $F V_{k-1}^T$.
It is worth noting that we can compute such a basis as $Q$ via a fast QR decomposition $F V_{k-1}^T = QR$. In this case, the product $U_k V_k$ can be equivalently computed as $U_k V_k = P_Q F = Q Q^T F$. Therefore $U_k$ and $V_k$ in Eq. 8 can be replaced by $Q$ and $Q^T F$ respectively, while the product $U_k V_k$ and the corresponding objective value remain the same. This gives a faster updating procedure
$$
\begin{cases}
U_k = Q, & QR = \mathrm{qr}\!\left(F V_{k-1}^T\right)\\
V_k = Q^T F.
\end{cases}
\tag{10}
$$
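For concreteness, a minimal NumPy sketch of the update in Eq. 10 follows; the names `F` (the $d \times N$ target) and `V_prev` (the previous right factor $V_{k-1}$) are our placeholders, not the paper's notation.

```python
import numpy as np

def qr_update(F, V_prev):
    """One alternating update of Eq. (10): orthonormalize the right
    sketch F V_prev^T via QR, then form the left sketch Q^T F. The
    product Q @ V equals the orthogonal projection of F onto the
    column space of F V_prev^T."""
    Q, _ = np.linalg.qr(F @ V_prev.T)  # U_k = Q, where QR = qr(F V_{k-1}^T)
    V = Q.T @ F                        # V_k = Q^T F
    return Q, V
```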
From this relation, the alternating update can be viewed as a mutually adaptive optimization of the right sketch $F V^T$ and the left sketch $Q^T F$ of $F$. Since the right and left sketches respectively describe the column and row spaces, which largely decide the approximation precision, we can temporarily ignore the QR decomposition in order to see how the column/row space is tracked within this scheme. In fact, Eq. 8 has the same accuracy as the power-scheme randomized SVD method (Halko et al., 2011). Different from the power scheme, we update Eq. 8 with a greedily incremented rank for both $U$ and $V$.
The computation of GBM is dominated by the two matrix multiplications, which take $2dNr_i$ flops. It can be further sped up by imposing sparsity on $U$ and $V$, as described in the next subsection. The overall greedy bilateral solver is summarized in Algorithm 1; a rough sketch of its loop structure follows.
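The sketch below reflects our reading of the text rather than a verbatim transcription of Algorithm 1: the rank-growth step, inner iteration count, and stopping tolerance are illustrative assumptions.

```python
import numpy as np

def greedy_bilateral_sketch(F, rank_step=1, max_rank=50, inner_iters=3, tol=1e-8):
    """Greedy bilateral loop (illustrative sketch): grow the working rank
    by rank_step per outer iteration, refining the factors with the QR
    update of Eq. (10) until the residual is small."""
    d, N = F.shape
    V = np.empty((0, N))
    for r in range(rank_step, max_rank + 1, rank_step):
        V = np.vstack([V, np.random.randn(rank_step, N)])  # greedily add rank
        for _ in range(inner_iters):
            Q, _ = np.linalg.qr(F @ V.T)  # the two products here cost ~2dNr flops
            V = Q.T @ F
        if np.linalg.norm(F - Q @ V) <= tol * np.linalg.norm(F):
            break  # column/row spaces captured to tolerance
    return Q, V
```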
3.0.2 Random Row-Space Projection (RRSP)
Beyond the greedy bilateral method, in this section we outline a scheme based on the approximate SVD algorithm of Sarlos (Sarlos, 2006; Fazel et al., 2008). This method casts the SVD-free algorithm as a direct sensing of the row and column spaces of the target matrix.
Suppose rank$(F) = r$. We again perform two sets of measurements (with an arbitrary random matrix) of $F$. Here, the output of the first set is used as the sensing matrix for the second set, so this method needs access to $F$ and $F^T$ to obtain the two sets of measurements sequentially. The second set of measurements is in fact quadratic in $F$.
We again have several choices for the sensing matrix $P \in \mathbb{R}^{r \times m}$; for example, we can pick $P$ with i.i.d. Gaussian entries. It is also possible to use structured matrices that are faster to apply, for example the SRFT (Subsampled Randomized Fourier Transform) matrix, which consists of randomly selected rows of the product of a discrete Fourier transform matrix and a random diagonal matrix (Woolfe et al., 2008). From the viewpoint of sparsity, the SRFT matrix is a natural choice for the sensing matrix $P$. We consider the following scheme:
• Sensing: make the linear measurements
$$Y_1 = PF, \quad \text{followed by} \quad Y_2 = Y_1 F^T. \tag{11}$$
• Recovery: given the measurements $Y_1, Y_2$, construct
$$\hat{F}^T = Y_1^{\dagger} Y_2. \tag{12}$$
The recovery step can be implemented efficiently using a QR decomposition of $Y_1$, as in the sketch below.
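A minimal sketch of the scheme, assuming a Gaussian sensing matrix (an SRFT would be dropped into the same place); we factor $Y_1^T$, which is equivalent to a QR decomposition of $Y_1$ up to transposition, since $Y_1$ is full row rank with probability one.

```python
import numpy as np

def rrsp(F, r, rng=None):
    """Random row-space projection (Eqs. 11-12), illustrative sketch.
    Recovers Fhat with Fhat^T = Y1^+ Y2, using QR in place of an
    explicit pseudoinverse."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = F.shape
    P = rng.standard_normal((r, m))        # sensing matrix P in R^{r x m}
    Y1 = P @ F                             # first measurement set   (Eq. 11)
    Y2 = Y1 @ F.T                          # second set, quadratic in F
    Q, R = np.linalg.qr(Y1.T)              # Y1 = R^T Q^T, so Y1^+ = Q R^{-T}
    Fhat_T = Q @ np.linalg.solve(R.T, Y2)  # Y1^+ Y2                 (Eq. 12)
    return Fhat_T.T
```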
A geometric interpretation is as follows: using
$$\hat{F}^T = Y_1^{\dagger} Y_1 F^T = (PF)^{\dagger} (PF)\, F^T$$
and noting that $Y_1^{\dagger} Y_1$ is the orthogonal projection matrix onto the row space of $Y_1$, we see that the estimate $\hat{F}$ is given by the projection of each row of $F$ onto the row space of $PF$, which is spanned by random linear combinations of the rows of $F$. That is, each row of $F$ is approximated by its closest vector in the row space of $PF$. Employing a random projection matrix $P$ is of crucial importance in extracting informative spaces from the target matrix and determining the effective rank. The methodology presented in this work is supported by the following lemma.
Lemma 3.1 (Exact Recovery). Suppose the entries of $P$ are i.i.d. Gaussian. If rank$(F) = r$, the scheme described in Eqs. 11 and 12 yields $\hat{F} = F$ with probability one.
Proof. Let $p_i$ denote the $i$th row of $P$. If rank$(F) = r$, the random vectors $F^T p_i$, $i = 1, \dots, r$, are linearly independent with probability one, which implies that the row space of $PF$ equals the row space of $F$ with probability one, and projecting $F$ onto its own row space gives $F$.
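A quick numerical sanity check of the lemma, reusing the `rrsp` sketch above on an exactly rank-$r$ matrix:

```python
rng = np.random.default_rng(1)
F = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))  # rank 5
Fhat = rrsp(F, r=5, rng=rng)
print(np.allclose(Fhat, F))  # True: exact recovery, as Lemma 3.1 predicts
```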
When an SRFT sensing matrix is used, exact recovery is replaced by the following relative-error guarantee; its proof appears in Section 5.2 of (Woolfe et al., 2008).
Lemma 3.2. Suppose $P$ is an SRFT matrix and there are $\alpha, \beta > 1$ such that
$$\frac{\alpha^2 \beta}{(\alpha - 1)^2}\, 2r^2 \le l < m. \tag{13}$$
Then
$$\big\|\hat{F} - F\big\| \le C \sqrt{m}\, \sigma_{r+1}(F) \tag{14}$$
holds with probability at least $1 - \frac{1}{\beta}$. The constant $C$ depends on $\alpha$.
However, when $F$ does not have exactly low-rank structure, the truncated $r$-term SVD of $F$ is approximated by the rank-$r$ truncation $\big(Y_1^{\dagger} Y_2\big)_r$ (up to a transpose, since $\hat{F}^T = Y_1^{\dagger} Y_2$), as sketched below.
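A sketch of this truncation via the SVD of the RRSP estimate; since truncation commutes with transposition, truncating $\hat{F}$ directly realizes $\big(Y_1^{\dagger} Y_2\big)_r$ transposed:

```python
def truncate_rank_r(A, r):
    """Rank-r truncation A_r: keep the r leading singular directions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# e.g. Fhat_r = truncate_rank_r(rrsp(F, r=10), r=10)
```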