Phase retrieval aims to recover a signal from its magnitude-only measurements, i.e., to find $x$ from
$$ y_i = \lvert \langle a_i, x \rangle \rvert + \epsilon_i, \quad i = 1, 2, \cdots, m, \qquad (1.1) $$
where $x \in \mathbb{F}^n$ is the unknown vector, $a_i \in \mathbb{F}^n$ are the given sampling vectors, which are random Gaussian vectors in this paper, $y_i$ are the observed measurements, $\epsilon_i$ are the noise terms, and $m$ is the number of measurements (the sample size). Here $\mathbb{F}^n$ can be $\mathbb{R}^n$ or $\mathbb{C}^n$, and we consider the real case $\mathbb{F}^n = \mathbb{R}^n$ in this work. The phase retrieval problem arises in many fields such as X-ray crystallography [1], optics [2], microscopy [3] and others; see, e.g., [4]. Due to the loss of phase information, phase retrieval is a nonlinear and ill-posed problem.
When the measurements are overcomplete, i.e., $m > n$, there are many algorithms in the literature. Earlier approaches were mostly based on alternating projections, e.g., the work of Gerchberg and Saxton [5] and Fienup [4]. More recently, convex relaxation methods such as PhaseLift [6] and PhaseCut [7] were proposed. These methods transform the phase retrieval problem into a semidefinite program, which can be computationally expensive. Another convex approach, PhaseMax, which does not lift the dimension of the signal, was proposed in [8]. Meanwhile, there are other works based on solving nonconvex optimization problems via first- and second-order methods, including alternating minimization [9] (or Fienup-type methods), Wirtinger flow [6], Kaczmarz [10], Riemannian optimization [11], Gauss-Newton [12, 13], etc. With a good initialization obtained via spectral methods, the above-mentioned methods come with theoretical guarantees. Progress was made by replacing the carefully designed initialization with a random one in alternating minimization [14, 15], gradient descent [16] and the Kaczmarz method [17, 18] while keeping convergence guarantees with high probability. Moreover, recent analyses in [19, 20] showed that some nonconvex objective functions for phase retrieval have a benign landscape, i.e., no spurious local minima, with high probability. As a consequence, for these objective functions, any algorithm that finds a local minimum is guaranteed to give a successful phase retrieval.
For large-scale problems, the requirement $m > n$ becomes impractical due to the huge measurement and computational cost. In many applications, the true signal $x$ is known to be sparse. The sparse phase retrieval problem can then be solved with a small number of samples, which makes it applicable to large-scale problems. It was proved in [21] that $m = O(s\, \mathrm{log}\, (n/s))$ measurements are sufficient to ensure successful recovery with high probability in the Gaussian model (i.e., the sampling vectors $a_i$ are i.i.d. Gaussian and the target is real). However, the existing computationally tractable algorithms require $O(s^{2}\, \mathrm{log}\, n)$ measurements to reconstruct the sparse signal, for example, the $\ell_{1}$-regularized PhaseLift method [22], sparse AltMin [9], GESPAR [23], thresholded/projected Wirtinger flow [24, 25], SPARTA [26] and so on. Two-stage methods based on PhaseLift and compressed sensing have been introduced in [27, 28]; they achieve successful reconstruction with $O(s\, \mathrm{log}\, n)$ measurements, but only for specially designed sampling matrices that exclude the Gaussian model (1.1). When a good initialization is available, the sample complexity can be improved to $O(s\, \mathrm{log}\, n)$ [25, 29]. However, obtaining such a sparse initialization requires $O(s^2\, \mathrm{log}\, n)$ samples in the existing literature. This gap naturally raises the following challenging question.
Can one recover the $s$-sparse target from the phaseless Gaussian model (1.1) with $O(s\, \mathrm{log}\, n)$ measurements using only a random initialization?
In this paper, we propose a novel algorithm to solve the sparse phase retrieval problem with a very limited number of measurements (numerical examples show that $m = O(s\, \mathrm{log}\, n)$ can be enough). The algorithm is a stochastic version of the alternating minimization method. The idea of alternating minimization is as follows: in each iteration, we first estimate the phase information, then substitute the approximated phase into (1.1) with the sparsity constraint and solve a standard compressed sensing problem to obtain an updated sparse signal. However, since alternating minimization is a local method, it is very sensitive to the initialization, and without enough measurements it is very difficult to compute a good initial guess. To overcome this difficulty, we change the sampling matrix in each iteration via a bootstrap technique; see Algorithm 1 for details. The numerical experiments show that the proposed algorithm needs only $O(s\, \mathrm{log}\, n)$ measurements to recover the true signal with high probability in the Gaussian model, and it works with a random initial guess. The experiments also show that the proposed algorithm is able to recover signals over a wide range of sparsity levels.
The rest of this paper is organized as follows. In Section 2 we introduce the problem setting and the details of the algorithm. Numerical experiments are given in Section 3.
First, we introduce some notation. For any $a, b \in \mathbb{R}^{n}$, we denote $a \odot b = (a_1 b_1, a_2 b_2, \cdots, a_n b_n)$. Moreover, $\lVert x\rVert_0$ is the number of nonzero entries of $x$, and $\lVert x\rVert_2$ is the standard $\ell_2$-norm, i.e., $\lVert x\rVert_2 = \sqrt{\sum\limits_{i = 1}^{n}x_{i}^2}$. The floor function $\lfloor c \rfloor$ gives the greatest integer less than or equal to $c$.
Recalling (1.1), we denote the sampling matrix and the measurement vector by $A = [a_1, a_2, \cdots, a_m]^t \in \mathbb{R}^{m\times n}$ and $y = [y_1, \cdots, y_m]^t \in \mathbb{R}^m$, respectively. Let $x \in \mathbb{R}^n$ be the unknown sparse signal to be recovered. In the noise-free case, the problem can be written as finding $x$ such that
$$ \lvert Ax \rvert = y \quad \text{and} \quad \|x\|_0 \le s. $$
In the noisy case, this can be written as the nonconvex minimization problem
$$ \min\limits_{x,\ \|x\|_0\leq s} \frac{1}{2}\big\| \lvert Ax\rvert - y \big\|^2. $$
Now we propose the $ \rm \underline{sto}chastic\;alte\underline{r}nating \;\underline{m}inimization$ method for $ \rm \underline{sp}arse \;ph\underline{a}se\; \underline{r}etrieval$ (StormSpar) as follows. It starts with a random initial guess $x^{0}$. In the $\ell$-th iteration ($\ell = 1, 2, \cdots$), we first randomly choose some rows of the sampling matrix $A$ to form a new matrix $A^{\ell}$ (a submatrix of $A$), and denote the corresponding entries of $y$ by $y^{\ell}$. Then we compute the phase information of $A^{\ell}x^{\ell -1}$, namely $p^{\ell} = \mathrm{sign}(A^{\ell}x^{\ell -1})$, and solve the standard compressed sensing subproblem
$$ \min\limits_{x,\ \|x\|_0\leq s} \frac{1}{2}\big\|A^{\ell} x - \tilde{y}^{\ell}\big\|^2, \qquad (2.2) $$
where $\tilde{y}^{\ell} = p^{\ell}\odot y^{\ell}$. Problem (2.2) can be solved by many compressed sensing solvers; we use the efficient Hard Thresholding Pursuit (HTP) [30] in our algorithm. For completeness, HTP is given in Algorithm 2. We summarize the StormSpar algorithm in Algorithm 1.
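To make the procedure concrete, the following is a minimal NumPy sketch of the StormSpar iteration described above, paired with a basic HTP inner solver. The function names, the stopping rules, the unit step size inside HTP, and the choice to subsample rows without replacement are illustrative assumptions rather than the exact specification of Algorithms 1 and 2.

```python
import numpy as np

def htp(A, b, s, max_iter=50):
    """Hard Thresholding Pursuit (sketch): approximately solve
    min_z ||A z - b||_2  subject to  ||z||_0 <= s."""
    m, n = A.shape
    x = np.zeros(n)
    support = np.array([], dtype=int)
    for _ in range(max_iter):
        # gradient step, then keep the s largest entries in magnitude
        g = x + A.T @ (b - A @ x)                   # unit step size, an illustrative choice
        new_support = np.sort(np.argsort(np.abs(g))[-s:])
        # debias by least squares restricted to the selected support
        x = np.zeros(n)
        x[new_support] = np.linalg.lstsq(A[:, new_support], b, rcond=None)[0]
        if np.array_equal(new_support, support):    # support has stabilized
            break
        support = new_support
    return x

def stormspar(A, y, s, gamma=0.5, max_iter=500, tol=1e-6, seed=None):
    """StormSpar sketch: stochastic alternating minimization for sparse phase retrieval.
    A : (m, n) Gaussian sampling matrix, y : (m,) magnitude measurements, s : sparsity."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = rng.standard_normal(n)                      # random initial guess x^0
    for _ in range(max_iter):
        # bootstrap step: draw a random subset of the rows of A (and of y)
        idx = rng.choice(m, size=max(1, int(gamma * m)), replace=False)
        A_l, y_l = A[idx], y[idx]
        # estimate the missing signs from the current iterate
        p = np.sign(A_l @ x)
        # compressed sensing subproblem (2.2): min ||A_l z - p * y_l||^2, ||z||_0 <= s
        x_new = htp(A_l, p * y_l, s)
        # stop when the iterate stalls (up to the global sign ambiguity)
        if min(np.linalg.norm(x_new - x), np.linalg.norm(x_new + x)) <= tol:
            x = x_new
            break
        x = x_new
    return x
```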
The true signal $x$ is chosen to be $s$-sparse with random support, and the design matrix $A\in\mathbb{R}^{m\times n}$ is a random Gaussian matrix. The additive Gaussian noise has the form $\epsilon = \sigma * \mathrm{randn}(m, 1)$, so the noise level is determined by $\sigma$. The parameter $\gamma$ is set to $\mathrm{min}(\frac{s}{m}\,\mathrm{log}\, \frac{n}{0.001}, 0.6)$, and $\delta = 0.01$.
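As a concrete illustration of this setup, the sketch below generates one random problem instance along the lines described in the text; the helper name, the standard Gaussian values on the support, and the amplitude measurement model from (1.1) reflect our reading of the setup rather than the authors' exact script.

```python
import numpy as np

def make_instance(n=1000, s=10, m=300, sigma=0.0, seed=None):
    """Generate an s-sparse signal, a Gaussian design matrix, and noisy magnitude data."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)   # random support
    x[support] = rng.standard_normal(s)              # nonzero values (assumed standard Gaussian)
    A = rng.standard_normal((m, n))                  # random Gaussian design matrix
    eps = sigma * rng.standard_normal(m)             # additive Gaussian noise, level sigma
    y = np.abs(A @ x) + eps                          # magnitude measurements as in (1.1)
    gamma = min(s / m * np.log(n / 0.001), 0.6)      # parameter gamma from the text
    delta = 0.01                                     # parameter delta from the text
    return A, y, x, gamma, delta
```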
The estimation error $r$ between the estimator $\hat{x}$ and the true signal $x$ is defined, up to the global sign ambiguity, as
$$ r = \frac{\min\{\|\hat{x} - x\|_2,\ \|\hat{x} + x\|_2\}}{\|x\|_2}. $$
We say the recovery is successful when the relative estimation error $r$ satisfies $r\le 10^{-2}$ or the support is exactly recovered. Each test is repeated independently $100$ times to compute a success rate. "Aver Iter" in Tables 1 and 2 denotes the average number of iterations over the $100$ tests. All the computations were performed on an eight-core laptop with a Core i7 6700HQ@3.50 GHz and 8 GB RAM using $\texttt{MATLAB}$ 2018a.
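The error metric and the success test can be coded directly; the snippet below is a small hypothetical helper matching the criterion just stated, with the global sign ambiguity handled explicitly.

```python
import numpy as np

def relative_error(x_hat, x):
    """Relative estimation error r, computed up to the global sign ambiguity."""
    return min(np.linalg.norm(x_hat - x), np.linalg.norm(x_hat + x)) / np.linalg.norm(x)

def is_success(x_hat, x, tol=1e-2):
    """Success: relative error at most tol, or the support exactly recovered."""
    same_support = np.array_equal(np.flatnonzero(x_hat), np.flatnonzero(x))
    return relative_error(x_hat, x) <= tol or same_support
```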
Example 1 First we examine the effect of the sample size $m$ on the probability of successful recovery for Algorithm 1. The dimension of the signal $x$ is $n = 1000$.
a) Setting the sparsity to $s = 10, 25, 50$, Figure 1 shows how the success rate changes with the sample size $m$. In this experiment, we fix the number $K = \lfloor s(\mathrm{log}\, n+\mathrm{log}\, \frac{1}{0.01})\rfloor$, which equals $115, 287, 575$ for sparsity $10, 25, 50$, respectively (the short snippet after item b) reproduces these values). Then we compute the probability of success as $m/K$ varies: for each $s$ and each $m/K = 1, 1.25, \cdots, 3$, we run our algorithm $100$ times. The results show that when the sample size is of order $O(s\, \mathrm{log}\, n)$ in this setting, we can recover the signal with high probability.
b) We compare StormSpar with some existing algorithms, namely CoPRAM [31], Thresholded Wirtinger Flow (ThWF) [24] and SPArse truncated Amplitude flow (SPARTA) [26]. The sparsity is set to $30$ and the model is noise free. Figure 2 shows the success rates as a function of the sample size; the results are obtained by averaging over $100$ trials. We find that StormSpar requires more iterations and more CPU time than these algorithms, which require an initialization, but StormSpar achieves better accuracy with lower sample complexity.
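For reference, the threshold $K$ and the grid of sample sizes used in part a) can be reproduced with a few lines (natural logarithms are assumed, which matches the values $115$, $287$, and $575$ quoted above):

```python
import numpy as np

n = 1000
for s in (10, 25, 50):
    K = int(np.floor(s * (np.log(n) + np.log(1 / 0.01))))      # K = 115, 287, 575
    m_grid = [int(r * K) for r in np.arange(1.0, 3.01, 0.25)]  # m/K = 1, 1.25, ..., 3
    print(f"s = {s}: K = {K}, m ranges over {m_grid}")
```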
Example 2 Figure 3 shows that StormSpar is robust to noise. We set $n = 1000$, $s = 20$, and $m = \lfloor 2.5s(\mathrm{log}\, n+\mathrm{log}\, \frac{1}{0.01})\rfloor\ (= 575)$. The added noise is i.i.d. Gaussian, and the noise level is reported via the signal-to-noise ratio (SNR); Figure 3 plots the corresponding relative reconstruction error. The results are averaged over $100$ trials.
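The exact SNR convention is not spelled out here; the sketch below scales the Gaussian noise to hit a target SNR under the common convention $\mathrm{SNR} = 20\,\mathrm{log}_{10}(\|y_{\mathrm{clean}}\|_2 / \|\epsilon\|_2)$, which is our assumption.

```python
import numpy as np

def add_noise_at_snr(y_clean, snr_db, seed=None):
    """Add i.i.d. Gaussian noise scaled so that 20*log10(||y_clean|| / ||noise||) = snr_db."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(y_clean.shape)
    scale = np.linalg.norm(y_clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return y_clean + scale * noise
```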
Example 3 We compare StormSpar with the two-stage method PhaseLift+BP proposed in [27], which has been shown to be more efficient than the standard SDP of [32]. The dimension of the data is set to $n = 1000$. The comparison is twofold. First, for different sparsity levels, we compare the minimum number of measurements required to achieve a success rate above $95\%$; the result can be found in Figure 4. Second, the average computational time is given in Figure 5, where $m = \lfloor 2.5s(\mathrm{log}\, n+\mathrm{log}\, \frac{1}{0.01}) \rfloor$.
Example 4 With $m = O(s\, \mathrm{log}\, n)$, we test different sparsity levels and different dimensions. In Table 1, we fix the dimension $n = 2000$ and choose the sample size $m = \lfloor 2.5s(\mathrm{log}\, n+\mathrm{log}\, \frac{1}{0.01}) \rfloor$. As the sparsity level changes from $5$ to $100$, we find that the algorithm successfully recovers the sparse signal in most cases, and the iteration count is very stable.
In Table 2, the sparsity level is fixed at $s = 10$, and the sample size is $m = \lfloor 2.5s(\mathrm{log}\, n+\mathrm{log}\, \frac{1}{0.01}) \rfloor$ for dimension $n$ ranging from $100$ to $10000$. We find that the algorithm successfully recovers the sparse signal in most cases, and the number of iterations depends on the dimension $n$.
In this paper, we propose a novel algorithm (StormSpar) for sparse phase retrieval. StormSpar starts with a random initialization and employs an alternating minimization method with a changing objective function. The subproblem $\min\limits_{x, \|x\|_0\leq s} \frac{1}{2}\|A^{\ell} x - \tilde{y}^{\ell}\|^2$ is a standard compressed sensing problem, which can be solved by the HTP method. Numerical examples show that the proposed algorithm requires only $O(s\, \mathrm{log}\, n)$ samples to recover an $s$-sparse signal from a random initial guess.