In many situations we need to recover a matrix that has low rank or is approximately low rank. In the matrix completion problem, we observe $m$ randomly selected entries of an $n \times n$ matrix $M$ and wish to fill in the missing or unknown entries from these samples. Such problems arise in many areas, such as multi-task learning [3], control [10], machine learning [1, 2], image processing, dimensionality reduction, and recommender systems in e-commerce. A well-known method for reconstructing low-rank matrices is based on convex optimization of the nuclear norm.
Let $M \in \mathbb{R}^{n\times n}$ be an unknown matrix of rank $r$ with $r \ll n$, and suppose that we observe $m$ sampled entries $\{M_{ij} : (i, j) \in \Omega\}$, where $\Omega \subset \{1, 2, \cdots, n\} \times \{1, 2, \cdots, n\}$ is a random subset of cardinality $m$. The authors of [4] showed that most low-rank matrices $M$ can be recovered exactly by solving the optimization problem
provided that the number of samples obeys $m \geq Cn^{6/5} r \log n$ for some positive numerical constant $C$. Here $\| \cdot \|_*$ stands for the nuclear norm of the matrix, i.e., the sum of its singular values. The optimization problem (1.1) is convex and can be recast as a semidefinite program [6, 7]. The more natural formulation, minimizing the rank of $X$ subject to the sampled entries, would recover $M$ whenever $M$ is the only low-rank object fitting the data; it is, however, of little practical use because rank minimization is NP-hard, and all known algorithms that solve it exactly require time doubly exponential in the dimension $n$, in both theory and practice. Since the nuclear ball $\{X : \|X\|_* \leq 1\}$ is the convex hull of the set of rank-one matrices with spectral norm bounded by one, the nuclear norm minimization problem (1.1) serves as a convex relaxation of this rank minimization problem. Solvers for (1.1) based on interior-point methods exist, but on a moderate PC they can only handle matrices of size at most a few hundred by a few hundred.
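For concreteness, the rank minimization problem referred to above can be written in the standard form
$$
\min_{X \in \mathbb{R}^{n\times n}} \ \operatorname{rank}(X) \quad \text{subject to} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega,
$$
and problem (1.1) is obtained by replacing $\operatorname{rank}(X)$ with the nuclear norm $\|X\|_*$.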
Problem (1.1) is extended in [4] as follows
where $X$ is the optimization variable. We can apply a gradient ascent algorithm to this problem with a large parameter $\tau$ and scalar step sizes $\{\delta_k\}_{k\geq1}$. That is, starting with $Y^0=0 \in \mathbb{R}^{n\times n}$, the singular value thresholding iteration is
where $\mathcal{D}_{\tau}(\cdot)$ applies a soft-thresholding rule at level $\tau$ to the singular values of the input matrix. Consider the singular value decomposition (SVD) of a matrix $Z \in \mathbb{R}^{n \times n}$ of rank $r$. That is,
The definition of $\mathcal{D}_{\tau} (Z) $ is given as follows:
The most important property of (2.2) is that the sequence $\{X^k\}$ converges to the solution of the optimization problem (2.1) when $\tau$ is large. This gives the shrinkage iterations with fixed $\tau > 0$ and scalar step sizes $\{\delta_k\}_{k\geq1}$: starting with $Y^0$, we iterate for $k = 1, 2, \cdots$ until the stopping criterion is satisfied.
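For illustration, a minimal MATLAB sketch of one shrinkage iteration is given below, assuming the standard form of (2.2), namely $X^k=\mathcal{D}_\tau(Y^{k-1})$ and $Y^k=Y^{k-1}+\delta\,\mathcal{P}_\Omega(M-X^k)$; the function name svt_step and the 0/1 matrix mask representing $\Omega$ are our own notation, and a full SVD is used here only for clarity.

```matlab
function [X, Y] = svt_step(Y, M, mask, tau, delta)
% One shrinkage iteration of SVT (illustrative sketch).
% mask is the 0/1 sampling pattern of Omega; only sampled entries of M are used.
    [U, S, V] = svd(Y, 'econ');         % SVD of the current iterate
    s = max(diag(S) - tau, 0);          % soft-threshold the singular values at level tau
    X = U * diag(s) * V';               % X^k = D_tau(Y^{k-1})
    Y = Y + delta * (mask .* (M - X));  % Y^k = Y^{k-1} + delta * P_Omega(M - X^k)
end
```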
The parameters in the iterations need to be specified. Let $\tau = 5n$ and $p = m/n^2$. In general, we use the constant step size $\delta = 1.2 p^{-1}$ [4], and set the stopping criterion
Since the initial condition is $Y^0 = 0$, we need a large $\tau$ to ensure that the solution of (2.1) is close to that of (1.1). Now let $k_0$ be the integer determined by the following condition
Because $Y^0=0$, the first several steps need not be computed [4]: it is easy to see that $X^k=0$ and $Y^k=k\delta\mathcal{P}_\Omega(M)$ for $k \leq k_0$. To reduce the computing time, we therefore start the iteration at step $k_0$.
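In MATLAB notation, and assuming the stopping criterion is the usual relative residual on the sampled set, $\|\mathcal{P}_\Omega(X^k - M)\|_F / \|\mathcal{P}_\Omega(M)\|_F < \epsilon$, and that $k_0$ satisfies $\tau/(\delta\|\mathcal{P}_\Omega(M)\|_2) \in (k_0-1, k_0]$ as in [4], the parameter choices and the kick-start can be sketched as follows.

```matlab
tau   = 5 * n;                            % thresholding level
p     = m / n^2;                          % sampling ratio
delta = 1.2 / p;                          % constant step size, as in [4]
PM    = mask .* M;                        % P_Omega(M)
k0    = ceil(tau / (delta * norm(PM)));   % first step at which D_tau(Y^k) can be nonzero
Y     = k0 * delta * PM;                  % start the iteration directly at step k0
% after each step, test the relative residual on the sampled entries
relres = norm(mask .* (X - M), 'fro') / norm(mask .* M, 'fro');
done   = (relres < epsilon);
```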
In SVT, we need to compute $[U^{k-1}, \Sigma^{k-1}, V^{k-1}]_{s_k}$, where $U^{k-1}, \Sigma^{k-1}, V^{k-1}$ are the SVD factors of $Y^{k-1}$ and $s_k$ is the number of leading singular triplets requested from the Lanczos process. The SVT algorithm uses the Lanczos method, via the package PROPACK [9], to compute the singular value decomposition of a huge matrix. The main disadvantage of the classical singular value thresholding algorithm is that at each step it must compute a partial SVD of a large matrix, using a Krylov subspace method such as Lanczos or Arnoldi to obtain the rank-$s_k$ SVD. As we know, the efficiency of Krylov subspace methods depends on the spectrum of the matrix, and only BLAS-2 operations are applied. When the rank of the matrix is not very low, it takes a lot of time to obtain the SVD approximation.
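The predicted rank $s_k$ is usually chosen adaptively. A common strategy, as in [4], is to request a few more singular triplets than the rank found at the previous step (denoted r_prev below) and to enlarge the request until the smallest computed singular value falls below $\tau$; the sketch below uses MATLAB's svds as a stand-in for the PROPACK routine.

```matlab
sk = r_prev + 1;                    % start just above the rank found at step k-1
[U, S, V] = svds(Y, sk);            % partial SVD: sk leading singular triplets
while min(diag(S)) > tau && sk < n  % not all singular values above tau found yet
    sk = min(sk + 5, n);            % request a few more triplets
    [U, S, V] = svds(Y, sk);
end
```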
We use the randomized algorithm [8] instead of the Lanczos method to compute the SVD. The Lanczos method is a Krylov subspace method and can be unstable, while the randomized algorithm is robust and simple to implement, and its performance does not depend on the spectrum of the sampled matrix. Moreover, the randomized algorithm is easy to parallelize.
The idea of the randomized algorithm is to project the matrix onto a smaller one that preserves most of the important information while discarding the less important information. The pseudo-code of the randomized algorithm is given in Algorithm 1 [11]; a MATLAB sketch of the same idea is shown below.
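In this sketch, which is in the spirit of [8, 11], the oversampling parameter p and the number of power iterations q are illustrative choices and not necessarily those of Algorithm 1.

```matlab
function [U, S, V] = rsvd(A, k, p, q)
% Randomized SVD sketch: project A onto a low-dimensional subspace that
% captures most of its range, then take the SVD of the small projected matrix.
    [~, n] = size(A);
    G = randn(n, k + p);             % Gaussian test matrix (target rank k plus oversampling p)
    Y = A * G;                       % sample the range of A (BLAS-3 product)
    for i = 1:q
        Y = A * (A' * Y);            % optional power iterations sharpen the spectrum
    end
    [Q, ~] = qr(Y, 0);               % orthonormal basis of the sampled range
    B = Q' * A;                      % small (k+p) x n matrix
    [Ub, S, V] = svd(B, 'econ');     % SVD of the small matrix
    U = Q * Ub;                      % lift the left factor back to the original space
    U = U(:, 1:k); S = S(1:k, 1:k); V = V(:, 1:k);   % keep the leading k triplets
end
```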
In the SVT iterations, an SVD is needed at each step, and the classical methods for computing this SVD approximation are costly. We therefore replace the classical SVD with the randomized SVD of Algorithm 1 and obtain the R-SVT algorithm (see Algorithm 2). In Step 4 of the pseudo-code of Algorithm 2, RSVD is used, whereas the classical SVT algorithm uses the Lanczos method to find the singular values. At the beginning of the computation we do not know how many singular values are needed, so much time can be spent searching for this number, which can be very slow.
On the other hand, the randomized algorithm preserves only the important information and ignores the less important information, so the relative error of our result can be larger than that of the SVT algorithm. To obtain a small relative error at low cost, we combine the two algorithms and obtain the algorithm R-SVT$^*$ (see Algorithm 3). In the first stage we use SVT based on RSVD until the error is smaller than $\epsilon_1$, for example $0.1$; then we switch to the classical SVT based on PROPACK until the error is smaller than $\epsilon_2$, for example $10^{-4}$. The pseudo-code of the R-SVT$^*$ algorithm is given in Algorithm 3.
The classical method uses PROPACK, based on the Lanczos process, to compute the approximate SVD. In the algorithm R-SVT$^*$, we instead use RSVD to perform SVT and later switch to the classical SVT. The Lanczos procedure needs to access the coefficient matrix many times and uses BLAS-2 operations, whereas in RSVD the large matrix is accessed fewer times and BLAS-3 operations are used. We can therefore expect the randomized algorithm to be much faster than the Lanczos process for SVD approximation. Note that our work is different from that in [5]: here we use a different randomized algorithm, namely the algorithm from [11], and we also apply the strategy of switching to the classical SVT in our algorithm R-SVT$^*$.
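Schematically, with MATLAB's svds standing in for the PROPACK-based SVD of the classical SVT, and with variable names and the rank update rule below being our own illustrative choices (tau, delta, mask, M, Y, maxit, eps1, eps2 are assumed to be set up as above), the switching strategy of R-SVT$^*$ can be sketched as follows.

```matlab
relres = inf;  r = 1;
for iter = 1:maxit
    if relres > eps1                         % stage 1: R-SVT, randomized SVD
        [U, S, V] = rsvd(Y, r + 5, 10, 1);
    else                                     % stage 2: classical SVT (PROPACK-type SVD)
        [U, S, V] = svds(Y, r + 5);
    end
    s = max(diag(S) - tau, 0);               % soft-threshold the singular values
    r = max(nnz(s), 1);                      % track the current rank
    X = U * diag(s) * V';
    Y = Y + delta * (mask .* (M - X));
    relres = norm(mask .* (X - M), 'fro') / norm(mask .* M, 'fro');
    if relres < eps2, break; end             % final tolerance reached
end
```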
In our numerical tests, we implement the R-SVT algorithm in MATLAB, and all the results in this paper are obtained on a computer with a 2.13 GHz CPU and 2 GB of RAM. We first generate an $n \times n$ random matrix, then generate a random index array of length $m$ and use it to sample entries of the matrix; the algorithm is then run on the sampled entries to complete the matrix.
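A typical way to build such test problems, assuming the usual rank-$r$ construction as the product of two Gaussian factors (the exact construction and sizes used in our tests may differ in details), is the following.

```matlab
n = 1000;  r = 50;  m = round(0.3 * n^2);    % illustrative problem sizes
ML = randn(n, r);  MR = randn(n, r);
M  = ML * MR';                               % random n x n matrix of rank r
idx  = randperm(n^2, m);                     % sample m entries uniformly at random
mask = zeros(n, n);  mask(idx) = 1;          % 0/1 sampling pattern of Omega
```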
First, setting the tolerance $\epsilon$ to $0.1$, we compare R-SVT with the PROPACK-based SVT for completing the matrix. In Table 1, matrices of size $500\times 500$, $1000\times 1000$, and $2000\times 2000$ are tested, and we compare the computational time and solution accuracy of the classical SVT and our R-SVT. In Table 1, the notations T, iter, and RE stand for the computational time, the number of outer iterations, and the relative error, respectively. We find that both our R-SVT and the classical SVT achieve final relative errors of almost the same order with almost the same number of iterations. In terms of computational time, our R-SVT is faster than the PROPACK-based SVT, and the difference becomes more pronounced as the matrix grows larger. For example, when the matrix is $2000 \times 2000$ with rank 400, the computational time of SVT is almost five times that of R-SVT.
Second, we set the target relative error to $10^{-4}$. Based on the R-SVT algorithm, we make a small modification: we use R-SVT until the error reaches $0.1$, and then switch to the PROPACK-based SVT until the error is smaller than $10^{-4}$. The computational results are shown in Table 2, and comparing them leads to conclusions similar to those for the first algorithm.
In this paper, we consider randomized SVT for matrix completion problems. As our work was nearing completion, we became aware of the work in [5]; however, we use a different randomized algorithm, namely the algorithm from [11], and we also apply the strategy of switching to the classical SVT in our algorithm R-SVT$^*$. We test the new algorithms on random matrices and draw the following conclusions.
1. The computational time of our randomized SVT algorithm is less than that of the classical SVT algorithm, and this advantage becomes more pronounced as the rank of the matrix grows. Notably, our R-SVT algorithm works well even for matrices whose rank is not very low, which is a significant improvement for matrix completion.
2. When the tolerance is very small, the computational time of R-SVT increases, but this can be overcome by the switch in R-SVT$^*$: when the required tolerance is very small, we switch from the R-SVT algorithm to the SVT algorithm. With this strategy, R-SVT$^*$ retains the advantages of R-SVT.