user B. If we complete the missing values of the matrix, we can predict how users will rate items that they have not evaluated yet. This prediction technique can be used in recommendation systems.
A technique widely used in recommendation systems is collaborative filtering (CF). CF algorithms fall into three main categories: memory-based, model-based, and hybrid (Su and Khoshgoftaar, 2009). Memory-based CF algorithms calculate similarities between users or items to predict users' preferences. Model-based CF algorithms learn a model in order to make predictions. Hybrid CF algorithms combine several CF techniques.
In memory-based CF algorithms, similarities between users or items are used to make predictions. As similarity measures, the vector cosine similarity and the Pearson correlation are often used. However, when many values are missing, it is difficult to compute similarities between users. In fact, the number of items might be much greater than the number of users, and each user may evaluate only a small number of items. For this reason, many items are evaluated by only a few users, while some users do not submit any evaluations.
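To make this concrete, the following minimal sketch (Python with NumPy; our own illustrative code, not from any cited work) computes the Pearson correlation over co-rated items and uses it for a similarity-weighted prediction. The function names and the toy rating matrix are assumptions made for illustration.

```python
import numpy as np

def pearson_sim(u, v):
    """Pearson correlation between two rating vectors, computed
    only over the items both users have rated (np.nan = missing)."""
    mask = ~np.isnan(u) & ~np.isnan(v)          # co-rated items
    if mask.sum() < 2:
        return 0.0                               # too little overlap
    cu, cv = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.sqrt((cu**2).sum() * (cv**2).sum())
    return 0.0 if denom == 0 else float((cu * cv).sum() / denom)

def predict(R, a, j):
    """Predict user a's rating of item j as user a's mean plus a
    similarity-weighted average of other users' deviations."""
    mu_a = np.nanmean(R[a])
    num = den = 0.0
    for b in range(R.shape[0]):
        if b == a or np.isnan(R[b, j]):
            continue
        s = pearson_sim(R[a], R[b])
        num += s * (R[b, j] - np.nanmean(R[b]))
        den += abs(s)
    return mu_a if den == 0 else mu_a + num / den

# Toy 4-user x 5-item rating matrix; np.nan marks missing entries.
R = np.array([[5, 4, np.nan, 1, np.nan],
              [4, np.nan, 4, 1, 2],
              [1, 2, np.nan, 5, 4],
              [np.nan, 1, 1, 4, np.nan]], dtype=float)
print(predict(R, 0, 2))   # predicted rating of user 0 for item 2
```

Note how sparsity hurts this approach: pearson_sim falls back to zero whenever two users share fewer than two co-rated items, which happens frequently in the sparse settings described above.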
To overcome this weakness of memory-based CF algorithms, model-based CF algorithms have been investigated. Model-based CF approaches use data mining or machine learning algorithms. One model-based CF technique is dimensionality reduction, such as principal component analysis (PCA) or singular value decomposition (SVD). As we mentioned above, high-dimensional data are thought to be expressible by lower-dimensional data because related data lie in a lower-dimensional space.
1.2 Related Work
Low-rank matrix completion and approximation have been studied extensively. In this section, we briefly review representative papers on these techniques.
Let us consider completing a matrix $\boldsymbol{V} \in (\mathbb{R} \cup \{?\})^{m \times n}$ with missing values, where the symbol $?$ indicates that the corresponding value is missing. That is, $V_{ij} = {?}$ indicates that $V_{ij}$ is missing. Candès and Recht (2009) proposed rank minimization to complete the matrix $\boldsymbol{V}$. They considered the following problem:
\[
\begin{array}{ll}
\text{minimize} & \operatorname{rank}(\boldsymbol{X}) \\
\text{subject to} & X_{ij} = V_{ij} \quad ((i, j) \in \Omega),
\end{array}
\tag{2}
\]
where $\Omega$ is the set of indices of observed entries, i.e., $\Omega = \{(i, j) : V_{ij} \in \mathbb{R}\}$. Problem (2) is NP-hard because it contains the $\ell_0$-norm minimization problem. This difficulty essentially arises from the nonconvexity and discontinuity of the rank function. Hence, they introduced nuclear norm minimization as a convex relaxation of Problem (2). Nuclear norm minimization can be recast as a semidefinite optimization problem (SDP). There are many efficient algorithms and high-quality software packages available for solving SDPs, including interior-point methods. However, the computation time for solving an SDP grows rapidly with the instance size, making this approach unsuitable for large instances.
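As an illustration, the convex relaxation can be stated in a few lines with a generic modeling package such as CVXPY. The sketch below is our own, on an assumed toy instance; it is not the solver setup used by Candès and Recht (2009).

```python
import cvxpy as cp
import numpy as np

# Assumed toy instance: a rank-1 matrix with two entries treated as missing.
V = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
missing = {(0, 1), (2, 0)}
omega = [(i, j) for i in range(V.shape[0])
         for j in range(V.shape[1]) if (i, j) not in missing]

# minimize ||X||_*  subject to  X_ij = V_ij for (i, j) in Omega.
X = cp.Variable(V.shape)
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [X[i, j] == V[i, j] for (i, j) in omega])
prob.solve()
print(np.round(X.value, 3))   # completed matrix with missing entries filled in
```

Internally, the nuclear norm is rewritten as a semidefinite constraint, so the cost of the underlying SDP solver grows quickly with the matrix size, as noted above.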
Olsson and Oskarsson (2009) and Gillis and Glineur (2011) studied the following problem to complete $\boldsymbol{V}$:
\[
\begin{array}{ll}
\text{minimize} & \displaystyle \sum_{i=1}^{m} \sum_{j=1}^{n} W_{ij} \left( X_{ij} - V_{ij} \right)^2 \\
\text{subject to} & \operatorname{rank}(\boldsymbol{X}) \le r,
\end{array}
\tag{3}
\]
where the decision variable $\boldsymbol{X} \in \mathbb{R}^{m \times n}$ is a completed matrix of $\boldsymbol{V}$ and $\boldsymbol{W} \in \{0, 1\}^{m \times n}$ is a given weight matrix corresponding to the observations, i.e., $W_{ij} = 1$ for $V_{ij} \in \mathbb{R}$ and $W_{ij} = 0$ otherwise. Using $\Omega$, we obtain the following equivalent formulation of Problem (3):
\[
\begin{array}{ll}
\text{minimize} & \displaystyle \sum_{(i, j) \in \Omega} (X_{ij} - V_{ij})^2 \\
\text{subject to} & \operatorname{rank}(\boldsymbol{X}) \le r.
\end{array}
\tag{4}
\]
Our formulation is similar to this one, and this model is helpful for understanding ours. Olsson and Oskarsson (2009) proposed a heuristic based on an approximate continuous (but nonconvex) formulation of Problem (3). Gillis and Glineur (2011) proved the NP-hardness of Problem (3) and, equivalently, of Problem (4).
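Because Problem (4) is NP-hard, heuristics are used in practice. The sketch below is a simple "hard-impute"-style iteration that alternates a rank-r truncated SVD with re-imposing the observed entries; it is our own illustration under assumed inputs, not the heuristic proposed in the cited papers.

```python
import numpy as np

def hard_impute(V, mask, r, iters=100):
    """Heuristic for Problem (4): alternate a best rank-r approximation
    (truncated SVD) with re-imposing the observed entries.
    V: matrix with arbitrary values at unobserved positions.
    mask: boolean array, True where V_ij is observed."""
    X = np.where(mask, V, V[mask].mean())    # fill missing with observed mean
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]      # best rank-r approximation of X
        X = np.where(mask, V, L)             # keep observed, update missing
    return L                                 # rank-r completed matrix

# Assumed toy usage: a rank-1 matrix with two hidden entries.
V = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
mask = np.ones_like(V, dtype=bool)
mask[0, 1] = mask[2, 0] = False
print(np.round(hard_impute(V, mask, r=1), 3))
```

The objective of Problem (4) is nonincreasing across these iterations, but, as with any local scheme for an NP-hard problem, only a local solution is guaranteed.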
On the other hand, the low-rank matrix approximation problem is easily solved when no values are missing. In fact, when $\Omega$ is the entire set of indices, Problem (4) is equivalent to the following problem:
\[
\begin{array}{ll}
\text{minimize} & \|\boldsymbol{X} - \boldsymbol{V}\|_F^2 \\
\text{subject to} & \operatorname{rank}(\boldsymbol{X}) \le r,
\end{array}
\]
where $\| \cdot \|_F$ denotes the Frobenius norm defined by
\[
\|\boldsymbol{A}\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}
\]
for $\boldsymbol{A} \in \mathbb{R}^{m \times n}$. This problem is nonconvex; however, a global optimal solution is obtained by the truncated SVD of $\boldsymbol{V}$. Specifically, it is well known that an optimal solution of this problem can be written as $\sum_{l=1}^{r} \sigma_l \boldsymbol{p}_l \boldsymbol{q}_l^{\top}$ (Trefethen and Bau, 1997, Theorem 5.9), where $\sigma_l$, $\boldsymbol{p}_l$, and $\boldsymbol{q}_l$ respectively denote the $l$-th largest singular value of $\boldsymbol{V}$ and the corresponding singular vectors. Computing all singular values and singular vectors of $\boldsymbol{V}$ is expensive. More specifically, it requires a computation time in
$O(\min\{m^2 n, m n^2\})$, which can be too heavy for a large matrix.
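For reference, the optimal solution $\sum_{l=1}^{r} \sigma_l \boldsymbol{p}_l \boldsymbol{q}_l^{\top}$ can be formed directly from the SVD. The sketch below (NumPy, on an assumed random matrix) does exactly this.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 4))   # assumed illustrative dense matrix
r = 2

# Full SVD: V = P diag(sigma) Q^T, singular values in descending order.
P, sigma, Qt = np.linalg.svd(V, full_matrices=False)

# Optimal rank-r approximation: sum_{l=1}^{r} sigma_l * p_l q_l^T.
X = sum(sigma[l] * np.outer(P[:, l], Qt[l]) for l in range(r))

print(np.linalg.matrix_rank(X))            # r
print(np.linalg.norm(X - V, "fro") ** 2)   # optimal objective value
```

When only the leading $r$ singular triplets are needed, iterative routines such as scipy.sparse.linalg.svds avoid the full-SVD cost mentioned above.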