In many situations we need to recover a matrix that has low rank or is approximately low rank. In the matrix completion problem, we observe $m$ randomly selected entries of an $n \times n$ matrix $M$ and wish to fill in the missing or unknown entries from these samples. Such problems arise in many areas, such as multi-task learning [3], control [10], machine learning [1, 2], image processing, dimensionality reduction, and recommender systems in e-commerce. A well-known method for reconstructing low-rank matrices is based on convex optimization of the nuclear norm.
Let $M \in \mathbb{R}^{n\times n}$ be an unknown matrix of rank $r$ with $r \ll n$, and suppose that we observe $m$ sampled entries $\{M_{ij} : (i, j) \in \Omega\}$, where $\Omega \subset \{1, 2, \cdots, n\} \times \{1, 2, \cdots, n\}$ is a random subset of cardinality $m$. The authors of [4] showed that most low-rank matrices $M$ can be recovered exactly by solving the optimization problem
provided that the number of samples obeys $m \geq Cn^{6/5} r \log n$ for some positive numerical constant $C$. Here $\| \cdot \|_*$ stands for the nuclear norm of the matrix, i.e., the sum of its singular values. The optimization problem (1.1) is convex and can be recast as a semidefinite program [6, 7]. The more natural formulation, minimizing the rank of $X$ subject to the sampled entries, would recover $M$ whenever $M$ is the only low-rank object fitting the data; it is, however, of little practical use because rank minimization is NP-hard, and all known algorithms that solve it exactly require time doubly exponential in the dimension $n$, in both theory and practice. Since the nuclear ball $\{X : \|X\|_* \leq 1\}$ is the convex hull of the set of rank-one matrices with spectral norm bounded by one, the nuclear norm minimization problem (1.1) serves as a convex relaxation of this rank minimization problem. Solvers for (1.1) based on interior-point methods exist, but on a moderate PC they can only handle matrices of size at most a few hundred by a few hundred.
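For concreteness, the rank minimization problem referred to above can be written in the standard form
$$
\min_{X \in \mathbb{R}^{n\times n}} \ \operatorname{rank}(X) \quad \text{subject to} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega,
$$
and problem (1.1) is obtained by replacing $\operatorname{rank}(X)$ with the nuclear norm $\|X\|_*$.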
Problem (1.1) is extended in [4] as follows
where $X$ is the optimization variable. We can apply a gradient ascent algorithm to this problem with a large parameter $\tau$ and scalar step sizes $\{\delta_k\}_{k\geq1}$. That is, starting with $Y^0=0 \in \mathbb{R}^{n\times n}$, the singular value thresholding iteration is
where $\mathcal{D}_{\tau}(\cdot)$ applies a soft-thresholding rule at level $\tau$ to the singular values of the input matrix. Consider the singular value decomposition (SVD) of a matrix $Z \in \mathbb{R}^{n \times n}$ of rank $r$. That is,
The definition of $\mathcal{D}_{\tau} (Z) $ is given as follows:
The most important property of (2.2) is that the sequence $\{X^k\}$ converges to the solution of the optimization problem (2.1) when $\tau$ is large. This gives the shrinkage iterations with fixed $\tau > 0$ and scalar step sizes $\{\delta_k\}_{k\geq1}$: starting with $Y^0$, we iterate for $k = 1, 2, \cdots$ until the stopping criterion is satisfied.
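For illustration, a minimal MATLAB sketch of one shrinkage iteration is given below, assuming the standard form of (2.2), namely $X^k=\mathcal{D}_\tau(Y^{k-1})$ and $Y^k=Y^{k-1}+\delta\,\mathcal{P}_\Omega(M-X^k)$; the function name svt_step and the 0/1 matrix mask representing $\Omega$ are our own notation, and a full SVD is used here only for clarity.

```matlab
function [X, Y] = svt_step(Y, M, mask, tau, delta)
% One shrinkage iteration of SVT (illustrative sketch).
% mask is the 0/1 sampling pattern of Omega; only sampled entries of M are used.
    [U, S, V] = svd(Y, 'econ');         % SVD of the current iterate
    s = max(diag(S) - tau, 0);          % soft-threshold the singular values at level tau
    X = U * diag(s) * V';               % X^k = D_tau(Y^{k-1})
    Y = Y + delta * (mask .* (M - X));  % Y^k = Y^{k-1} + delta * P_Omega(M - X^k)
end
```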
The parameters in the iterations need to be specified. Let $\tau = 5n$ and $p = m/n^2$. In general, we use the constant step size $\delta = 1.2 p^{-1}$ [4], and set the stopping criterion
Since the initial condition is $Y^0 = 0$, we need a large $\tau$ to ensure that the solution of (2.1) is close to that of (1.1). Now let $k_0$ be the integer determined by the following condition
Because $Y^0=0$, the first several steps need not be computed [4]: it is easy to see that $X^k=0$ and $Y^k=k\delta\mathcal{P}_\Omega(M)$ for $k \leq k_0$. To reduce the computing time, we therefore start the iteration at step $k_0$.
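In MATLAB notation, and assuming the stopping criterion is the usual relative residual on the sampled set, $\|\mathcal{P}_\Omega(X^k - M)\|_F / \|\mathcal{P}_\Omega(M)\|_F < \epsilon$, and that $k_0$ satisfies $\tau/(\delta\|\mathcal{P}_\Omega(M)\|_2) \in (k_0-1, k_0]$ as in [4], the parameter choices and the kick-start can be sketched as follows.

```matlab
tau   = 5 * n;                            % thresholding level
p     = m / n^2;                          % sampling ratio
delta = 1.2 / p;                          % constant step size, as in [4]
PM    = mask .* M;                        % P_Omega(M)
k0    = ceil(tau / (delta * norm(PM)));   % first step at which D_tau(Y^k) can be nonzero
Y     = k0 * delta * PM;                  % start the iteration directly at step k0
% after each step, test the relative residual on the sampled entries
relres = norm(mask .* (X - M), 'fro') / norm(mask .* M, 'fro');
done   = (relres < epsilon);
```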
In SVT, we need to compute $[U^{k-1}, \Sigma^{k-1}, V^{k-1}]_{s_k}$, where $U^{k-1}, \Sigma^{k-1}, V^{k-1}$ are the SVD factors of $Y^{k-1}$ and $s_k$ is the number of leading singular triplets requested from the Lanczos process. The SVT algorithm uses the Lanczos method, via the package PROPACK [9], to compute the singular value decomposition of a huge matrix. The main disadvantage of the classical singular value thresholding algorithm is that at each step it must compute a partial SVD of a large matrix, using a Krylov subspace method such as Lanczos or Arnoldi to obtain the rank-$s_k$ SVD. As we know, the efficiency of Krylov subspace methods depends on the spectrum of the matrix, and only BLAS-2 operations are applied. When the rank of the matrix is not very low, it takes a lot of time to obtain the SVD approximation.
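The predicted rank $s_k$ is usually chosen adaptively. A common strategy, as in [4], is to request a few more singular triplets than the rank found at the previous step (denoted r_prev below) and to enlarge the request until the smallest computed singular value falls below $\tau$; the sketch below uses MATLAB's svds as a stand-in for the PROPACK routine.

```matlab
sk = r_prev + 1;                    % start just above the rank found at step k-1
[U, S, V] = svds(Y, sk);            % partial SVD: sk leading singular triplets
while min(diag(S)) > tau && sk < n  % not all singular values above tau found yet
    sk = min(sk + 5, n);            % request a few more triplets
    [U, S, V] = svds(Y, sk);
end
```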
We use the randomized algorithm [8] instead of the Lanczos method to compute the SVD. The Lanczos method is a Krylov subspace method and can be unstable, while the randomized algorithm is robust and simple to implement, and its performance does not depend on the spectrum of the sampled matrix. Moreover, the randomized algorithm is easy to parallelize.
The idea of the randomized algorithm is to project the matrix onto a smaller one that preserves most of the important information while discarding the less important information. The pseudo-code of the randomized algorithm is given in Algorithm 1 [11]; a MATLAB sketch of the same idea is shown below.
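In this sketch, which is in the spirit of [8, 11], the oversampling parameter p and the number of power iterations q are illustrative choices and not necessarily those of Algorithm 1.

```matlab
function [U, S, V] = rsvd(A, k, p, q)
% Randomized SVD sketch: project A onto a low-dimensional subspace that
% captures most of its range, then take the SVD of the small projected matrix.
    [~, n] = size(A);
    G = randn(n, k + p);             % Gaussian test matrix (target rank k plus oversampling p)
    Y = A * G;                       % sample the range of A (BLAS-3 product)
    for i = 1:q
        Y = A * (A' * Y);            % optional power iterations sharpen the spectrum
    end
    [Q, ~] = qr(Y, 0);               % orthonormal basis of the sampled range
    B = Q' * A;                      % small (k+p) x n matrix
    [Ub, S, V] = svd(B, 'econ');     % SVD of the small matrix
    U = Q * Ub;                      % lift the left factor back to the original space
    U = U(:, 1:k); S = S(1:k, 1:k); V = V(:, 1:k);   % keep the leading k triplets
end
```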
In the SVT iterations, an SVD is needed at each step, and the classical methods for computing this SVD approximation are costly. We therefore replace the classical SVD with the randomized SVD of Algorithm 1 and obtain the R-SVT algorithm (see Algorithm 2). In Step 4 of the pseudo-code of Algorithm 2, RSVD is used, whereas the classical SVT algorithm uses the Lanczos method to find the singular values. At the beginning of the computation we do not know how many singular values are needed, so much time can be spent searching for this number, which can be very slow.
On the other hand, the randomized algorithm preserves only the important information and ignores the less important information, so the relative error of our result can be larger than that of the SVT algorithm. To obtain a small relative error at low cost, we combine the two algorithms and obtain the algorithm R-SVT$^*$ (see Algorithm 3). In the first stage we use SVT based on RSVD until the error is smaller than $\epsilon_1$, for example $0.1$; then we switch to the classical SVT based on PROPACK until the error is smaller than $\epsilon_2$, for example $10^{-4}$. The pseudo-code of the R-SVT$^*$ algorithm is given in Algorithm 3.
The classical method uses PROPACK, based on the Lanczos process, to compute the approximate SVD. In the algorithm R-SVT$^*$, we instead use RSVD to perform SVT and later switch to the classical SVT. The Lanczos procedure needs to access the coefficient matrix many times and uses BLAS-2 operations, whereas in RSVD the large matrix is accessed fewer times and BLAS-3 operations are used. We can therefore expect the randomized algorithm to be much faster than the Lanczos process for SVD approximation. Note that our work is different from that in [5]: here we use a different randomized algorithm, namely the algorithm from [11], and we also apply the strategy of switching to the classical SVT in our algorithm R-SVT$^*$.
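Schematically, with MATLAB's svds standing in for the PROPACK-based SVD of the classical SVT, and with variable names and the rank update rule below being our own illustrative choices (tau, delta, mask, M, Y, maxit, eps1, eps2 are assumed to be set up as above), the switching strategy of R-SVT$^*$ can be sketched as follows.

```matlab
relres = inf;  r = 1;
for iter = 1:maxit
    if relres > eps1                         % stage 1: R-SVT, randomized SVD
        [U, S, V] = rsvd(Y, r + 5, 10, 1);
    else                                     % stage 2: classical SVT (PROPACK-type SVD)
        [U, S, V] = svds(Y, r + 5);
    end
    s = max(diag(S) - tau, 0);               % soft-threshold the singular values
    r = max(nnz(s), 1);                      % track the current rank
    X = U * diag(s) * V';
    Y = Y + delta * (mask .* (M - X));
    relres = norm(mask .* (X - M), 'fro') / norm(mask .* M, 'fro');
    if relres < eps2, break; end             % final tolerance reached
end
```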
In our numerical tests, we implement the R-SVT algorithm in MATLAB, and all the results in this paper are obtained on a computer with a 2.13 GHz CPU and 2 GB of RAM. We first generate an $n \times n$ random matrix, then generate a random index array of length $m$ and use it to sample entries of the matrix; the algorithm is then run on the sampled entries to complete the matrix.
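A typical way to build such test problems, assuming the usual rank-$r$ construction as the product of two Gaussian factors (the exact construction and sizes used in our tests may differ in details), is the following.

```matlab
n = 1000;  r = 50;  m = round(0.3 * n^2);    % illustrative problem sizes
ML = randn(n, r);  MR = randn(n, r);
M  = ML * MR';                               % random n x n matrix of rank r
idx  = randperm(n^2, m);                     % sample m entries uniformly at random
mask = zeros(n, n);  mask(idx) = 1;          % 0/1 sampling pattern of Omega
```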
First, setting the tolerance $\epsilon$ to $0.1$, we compare R-SVT with the PROPACK-based SVT for completing the matrix. In Table 1, matrices of size $500\times 500$, $1000\times 1000$, and $2000\times 2000$ are tested, and we compare the computational time and solution accuracy of the classical SVT and our R-SVT. In Table 1, the notations T, iter, and RE stand for the computational time, the number of outer iterations, and the relative error, respectively. We find that both our R-SVT and the classical SVT achieve final relative errors of almost the same order with almost the same number of iterations. In terms of computational time, our R-SVT is faster than the PROPACK-based SVT, and the difference becomes more pronounced as the matrix grows larger. For example, when the matrix is $2000 \times 2000$ with rank 400, the computational time of SVT is almost five times that of R-SVT.
Second, we set the target relative error to $10^{-4}$. Based on the R-SVT algorithm, we make a small modification: we use R-SVT until the error reaches $0.1$, and then switch to the PROPACK-based SVT until the error is smaller than $10^{-4}$. The computational results are shown in Table 2, and comparing them leads to conclusions similar to those for the first algorithm.
In this paper, we consider randomized SVT for matrix completion problems. As our work was nearing completion, we became aware of the work in [5]; however, we use a different randomized algorithm, namely the algorithm from [11], and we also apply the strategy of switching to the classical SVT in our algorithm R-SVT$^*$. We test the new algorithms on random matrices and draw the following conclusions.
1. The computational time of our randomized SVT algorithm is less than that of the classical SVT algorithm, and this advantage becomes more pronounced as the rank of the matrix grows. Notably, our R-SVT algorithm works well even for matrices whose rank is not very low, which is a significant improvement for matrix completion.
2. When the tolerance is very small, the computational time of R-SVT increases, but this can be overcome by the switch in R-SVT$^*$: when the required tolerance is very small, we switch from the R-SVT algorithm to the SVT algorithm. With this strategy, R-SVT$^*$ retains the advantages of R-SVT.