For a rectangular matrix $ \textbf{B}\in \mathbb{R}^{p\times q} $, $ \sigma_{j}(\textbf{B}) $ denotes its $ j $th largest singular value, which equals the square root of the $ j $th largest eigenvalue of $ \textbf{B}\textbf{B}^{T} $. The rank of $ \textbf{B} $ will usually be denoted by $ r $ and equals the number of nonzero singular values. For matrices $ \textbf{B} $ and $ \textbf{X} $ of the same dimensions, we define the inner product on $ \mathbb{R}^{p\times q} $ as $ \langle\textbf{B}, \textbf{X}\rangle = {\rm tr}\left(\textbf{X}^{T}\textbf{B}\right)=\langle {\rm{vec}}(\textbf{B}), {\rm{vec}}(\textbf{X})\rangle $, where $ {\rm{vec}}(\cdot) $ is the vectorization operator that stacks the columns of a matrix into a vector. The norm associated with this inner product is called the Frobenius (or Hilbert–Schmidt) norm $ \|\cdot\|_{F} $. The Frobenius norm is also equal to the Euclidean, or $ l_{2} $, norm of the vector of singular values, i.e.
The nuclear norm of a matrix is equal to the sum of its singular values, i.e.
and is alternatively known by several other names, including the Schatten 1-norm, the Ky Fan $ r $-norm, and the trace class norm. Since the singular values are all nonnegative, the nuclear norm is also equal to the $ l_{1} $ norm of the vector of singular values. These two norms are related by the following inequalities, which hold for any matrix $ \textbf{B} $ of rank at most $ r $ (see [1]): $ \|{\textbf B}\|_{F}\leq\|{\textbf B}\|_{*}\leq \sqrt{r}\|{\textbf B}\|_{F}. $
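These identities are easy to check numerically. The following sketch (our own illustration in NumPy; the test matrices and tolerances are arbitrary choices) verifies the inner product identity, the two norm characterizations, and the inequality above on a random rank-$ r $ example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random p x q matrix of rank r, built from two thin factors.
p, q, r = 5, 4, 2
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
X = rng.standard_normal((p, q))

# Inner product <B, X> = tr(X^T B) = <vec(B), vec(X)> (column stacking).
ip_trace = np.trace(X.T @ B)
ip_vec = B.flatten(order="F") @ X.flatten(order="F")
assert np.isclose(ip_trace, ip_vec)

s = np.linalg.svd(B, compute_uv=False)   # singular values of B
fro = np.linalg.norm(B, "fro")           # Frobenius norm
nuc = s.sum()                            # nuclear norm = sum of singular values

assert np.isclose(fro, np.linalg.norm(s))   # ||B||_F = l2 norm of singular values
assert fro <= nuc + 1e-12                   # ||B||_F <= ||B||_*
assert nuc <= np.sqrt(r) * fro + 1e-12      # ||B||_* <= sqrt(r) ||B||_F
```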
Consider the regularized matrix regression model (see [2])
where $ \epsilon_{1}, \cdots, \epsilon_{n} $ are i.i.d. random variables with mean $ 0 $ and variance $ \sigma^{2} $, $ \gamma\in \mathbb{R}^m $, $ \textbf{T}_{i}\in \mathbb{R}^m $, $ \textbf{B}\in\mathbb{R}^{p\times q} $, $ \textbf{X}_{i}\in\mathbb{R}^{p\times q} $. This model is no longer limited to rank-1 or low-rank matrices, and is a generalization of fixed-rank matrix regression. Without loss of generality, we drop the vector covariate $ \textbf{T} $ and its associated parameter $ \gamma $ in the subsequent discussion, that is
Then we estimate $ \textbf{B} $ by minimizing the penalized least squares criterion, where the penalty is the nuclear norm of $ \textbf{B} $ (see [2]), i.e.
To facilitate the subsequent development, we first introduce some notation. For a given $ \lambda_{n} $, we denote the estimator minimizing (1.3) by $ \hat{\textbf{B}}_{n} $. In particular, $ \lambda_{n}=0 $ corresponds to the ordinary least squares (LS) estimator, which we denote by $ \hat{\textbf {B}}_{n}^{(0)} $. We will assume the following regularity conditions for the design (see [3]),
where $ C $ is a nonnegative definite matrix and
where $ \textbf{X}_{i}\in\mathbb{R}^{p\times q} $ is the $ i $th sample observation, so that $ {\rm vec}(\textbf{X}_{i}) =(x_{11}^i, \cdots, x_{p1}^{i}, x_{12}^{i}, \cdots, x_{p2}^i $ $ , \cdots, x_{1q}^i, \cdots, x_{pq}^{i})^T $. In this paper, we assume that $ C_{n} $ is nonsingular for all $ n $.
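Although the paper does not prescribe an algorithm, a minimal computational sketch may help fix ideas. The code below (our own illustration, assuming criterion (1.3) is the squared-error loss $ \sum_{i}(y_{i}-\langle\textbf{B},\textbf{X}_{i}\rangle)^{2} $ plus $ \lambda_{n}\|\textbf{B}\|_{*} $; the function names, fixed step size, and iteration count are illustrative choices) computes $ \hat{\textbf{B}}_{n} $ by proximal gradient descent, where the proximal operator of the nuclear norm is singular value soft-thresholding.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorization, matching the definition in Section 1."""
    return M.flatten(order="F")

def svt(A, tau):
    """Singular value soft-thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def nuclear_reg_fit(Xs, y, lam, n_iter=500):
    """Minimize sum_i (y_i - <B, X_i>)^2 + lam * ||B||_* by proximal gradient descent."""
    n, p, q = Xs.shape
    G = np.stack([vec(Xi) for Xi in Xs])     # row i of G is vec(X_i)^T
    C_n = G.T @ G / n                        # the design matrix C_n from (1.4)
    L = 2 * n * np.linalg.eigvalsh(C_n)[-1]  # Lipschitz constant of the LS gradient
    B = np.zeros((p, q))
    for _ in range(n_iter):
        resid = y - G @ vec(B)               # residuals y_i - <B, X_i>
        grad = -2 * (G.T @ resid).reshape((p, q), order="F")
        B = svt(B - grad / L, lam / L)       # gradient step, then nuclear-norm prox
    return B
```

Running `nuclear_reg_fit` with $ \lambda_{n} $ growing more slowly than $ n $ (for instance $ \lambda_{n}=\sqrt{n} $) gives a concrete way to observe the consistency and rate results derived in the next two sections.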
The rest of the paper is organized as follows. In Section 2, we derive the limiting distribution of the nuclear norm regularized matrix regression estimator. In Section 3, we establish the strong consistency of this estimator. In Section 4, we conclude with a discussion of future directions.
In this section, we study the limiting distribution of the matrix regression estimator based on the method of Knight and Fu. Studying the asymptotic behavior of the objective function (1.3) determines the limiting behavior of the estimators $ \hat{\textbf{B}}_{n} $. To this end, we define the function
to study the consistency of $ \hat{\textbf{B}}_{n} $; the function (2.1) attains its minimum at $ \Phi=\hat{\textbf{B}}_{n} $. The following result shows that $ \hat{\textbf{B}}_{n} $ is consistent under the condition $ \lambda_{n}=o(n) $.
Theorem 2.1 If $ C $ in (1.4) is nonsingular and $ \lambda_{n}/n\to \lambda_{0}\geq 0 $, then $ \hat{\textbf{B}}_{n}\to _{p}\arg \min(Z) $, where $ Z(\Phi)={\rm vec}(\Phi-\textbf{B})^TC\,{\rm vec}(\Phi -\textbf{B})+\lambda_{0}\|\Phi \|_{*} . $
Thus if $ \lambda_{n}=o(n) $, then $ \arg\min (Z)=\textbf{B} $, and so $ \hat{\textbf{B}}_{n} $ is consistent.
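To spell out the last step: $ \lambda_{n}=o(n) $ gives $ \lambda_{0}=0 $, so $ Z $ reduces to the quadratic form $ Z(\Phi)={\rm vec}(\Phi-\textbf{B})^TC\,{\rm vec}(\Phi-\textbf{B})\geq \gamma_{0}\|{\rm vec}(\Phi-\textbf{B})\|_{2}^{2} $, where $ \gamma_{0}>0 $ is the smallest eigenvalue of the nonsingular matrix $ C $. The lower bound is zero if and only if $ \Phi=\textbf{B} $, so the minimizer is unique and equals $ \textbf{B} $.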
Proof Define $ Z_{n} $ as in (2.1), and let $ Y_{i}= \langle \textbf{B}, \textbf{X}_{i}\rangle +\epsilon_{i} $, so we have
Now we need to show that
for any compact set $ K $ and that
Under (2.2) and (2.3), we have $ \arg\min Z_{n}\to _{p}\arg\min Z $. Note that $ Z_{n} $ is convex; thus (2.2) and (2.3) follow from the pointwise convergence in probability of $ Z_{n}(\Phi) $ to $ Z({\Phi})+\sigma^{2} $ by applying standard results (see [4]).
Theorem 2.2 If $ C $ in (1.4) is nonsingular and $ \lambda_{n}/\sqrt{n}\to \lambda_{0}\geq 0 $, then
where
and $ W $ has a $ N({\textbf {0}}, \sigma^{2}C ) $ distribution.
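Before turning to the proof, the following simulation sketch (our own illustration; the dimensions, sample size, and number of replications are arbitrary choices) checks the theorem in the $ \lambda_{n}=0 $ case, where the limit specializes to $ N(\textbf{0}, \sigma^{2}C^{-1}) $, as noted at the end of the proof below. It compares the empirical covariance of $ \sqrt{n}(\hat{\textbf{B}}_{n}^{(0)}-\textbf{B}) $ (vectorized LS) with $ \sigma^{2}C^{-1} $.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n, sigma, reps = 2, 2, 400, 1.0, 2000
B = rng.standard_normal((p, q))

def vec(M):
    return M.flatten(order="F")

# Fixed design across replications, so C = C_n exactly.
Xs = rng.standard_normal((n, p, q))
G = np.stack([vec(Xi) for Xi in Xs])
C = G.T @ G / n

Z = np.empty((reps, p * q))
for rep in range(reps):
    y = G @ vec(B) + sigma * rng.standard_normal(n)
    b_hat = np.linalg.lstsq(G, y, rcond=None)[0]   # LS estimate of vec(B)
    Z[rep] = np.sqrt(n) * (b_hat - vec(B))

# Empirical covariance of sqrt(n)(B_hat - B) vs. the theoretical sigma^2 C^{-1}.
print(np.round(np.cov(Z.T), 2))
print(np.round(sigma**2 * np.linalg.inv(C), 2))
```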
Proof Define $ V_{n}(\textbf{U}) $ by
where $ \textbf{U}\in \mathbb{R}^{p\times q} $. Note that $ V_{n} $ is minimized at $ \sqrt{n}(\hat{\textbf{B}}_{n}-\textbf{B}) $, because adding a constant does not change the location of the minimum of the objective function. We rewrite $ V_{n}(\textbf{U}) $ as
First we have
Next, applying Theorem 2 of [5] with $ t=1/\sqrt{n} $, the second term becomes
where $ \alpha_{i} $ and $ \beta_{i} $ are the singular vectors of $ \textbf{B} $ corresponding to the $ i $th largest singular value, and $ \partial\|\textbf {B}\|_{*} $ denotes the subdifferential of $ \|\textbf {B}\|_{*} $. Therefore $ V_{n}(\textbf {U})\to_{d}V(\textbf {U}) $ (as defined above), with the finite dimensional convergence holding trivially. Since $ V_{n} $ is convex and $ V $ has a unique minimum, it follows (see [6]) that $ \arg \min V_{n}(\textbf {U})= \sqrt{n}(\hat{\textbf {B}}_{n}-\textbf {B})\to _{d}\arg \min(V). $ In particular, when $ \lambda_{n}=0 $, $ V(\textbf{U})=-2{\rm vec}({\textbf{U}})^TW+{\rm vec}({\textbf{U}})^TC\,{\rm vec}({\textbf{U}}) $, and we have
Consider the matrix regression model (1.2) with i.i.d. error variables $ \epsilon_{i} $, where $ \mathbb{E}|\epsilon_{i}|<\infty $ and $ \mathbb{E}\epsilon_{i}=0 $. We now consider the problem of strong consistency of the Lasso-type (nuclear norm penalized) estimator, assuming only finiteness of the first moment and some mild regularity conditions on the design matrices $ {\textbf{X}}_{i} $.
Theorem 3.1 Let $ \epsilon_{i} $ be i.i.d. random variables with $ \mathbb{E}|\epsilon_{i}|<\infty $ and $ \mathbb{E}\epsilon_{i}=0 $. If $ C $ in (1.4) is nonsingular and $ \frac{\lambda_{n}}{n}\to 0 $, then $ \hat{\textbf{B}}_{n}\to \textbf{B} $ w.p.1.
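The force of Theorem 3.1 is that only the first moment of the errors needs to be finite. The sketch below (our own illustration; all choices are arbitrary) takes the LS special case $ \lambda_{n}=0 $ (so $ \lambda_{n}/n\to 0 $) with $ t $-distributed errors with $ 1.5 $ degrees of freedom, which have a finite mean but infinite variance, and shows the estimation error shrinking as $ n $ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 2
B = rng.standard_normal((p, q))

for n in [100, 1000, 10000, 100000]:
    Xs = rng.standard_normal((n, p, q))
    # Row i of G is vec(X_i) (column stacking), built without a Python loop.
    G = Xs.transpose(0, 2, 1).reshape(n, p * q)
    eps = rng.standard_t(df=1.5, size=n)        # finite mean, infinite variance
    y = G @ B.flatten(order="F") + eps
    b_hat = np.linalg.lstsq(G, y, rcond=None)[0]
    print(n, np.linalg.norm(b_hat - B.flatten(order="F")))  # error shrinks with n
```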
Proof Note that
Then $ \hat{\textbf{B}}_{n}-\textbf{B}=\arg\min\left\lbrace \sum_{i=1}^{n}\left( \epsilon_{i}-\langle \textbf{U}, \textbf{X}_{i}\rangle \right)^2+\lambda_{n}\|\textbf{B}+\textbf{U}\|_{*} \right\rbrace $. Recall that $ C_{n}=\frac{1}{n} \sum_{i=1}^{n}{\rm vec}(\textbf{X}_{i}) {\rm vec}(\textbf{X}_{i})^T\longrightarrow C $; let $ \gamma_{0, n} $ be the smallest eigenvalue of $ C_{n} $, let $ \gamma_{0} $ be the smallest eigenvalue of $ C $, and let $ W_{n}=\frac{1}{n} \sum_{i=1}^{n}{\rm vec}(\textbf{X}_{i})\epsilon_{i} $. By Lemma 3.1 of [7],
Since $ \sum_{i=1}^{n}\epsilon_{i}^2 $ does not involve $ \textbf{U} $, discarding this term from the criterion function above and dividing the resulting expression by $ n $, we have
Note that for any $ \textbf{U}\in {\bf\mathbb{R}^{p\times q}} $,
Next fix $ \eta\in(0, 1) $. Since $ \frac{\lambda_{n}}{n}\to 0 $, there exists an $ n_{0}\in(0, \infty) $ such that $ \frac{\lambda_{n}}{n}\leq \eta $ and $ \gamma_{0, n}>\gamma_{0}/2 $ for all $ n\geq n_{0} $. On the set $ \left\lbrace\|W_{n}\|_{*}\leq \eta \right\rbrace $, by (3.3), for all $ {\textbf{U}}\in {\mathbb{R}^{p\times q}} $ with $ \|{\textbf{U}}\|_{*}>6\min \{p, q\}\eta/\gamma_{0, n} $,
Since $ V_{n}({\textbf{0}})=0 $, it follows that for $ n\geq n_{0} $, the minimum of $ V_{n} $ cannot be attained in the set $ \left\lbrace {\textbf{U}}: \|{\textbf{U}}\|_{*}>6\min \{p, q\}\eta/\gamma_{0, n} \right\rbrace $ on the event $ \left\lbrace\|W_{n}\|_{*}\leq \eta \right\rbrace $. Hence, for $ n\geq n_{0} $, $ \left\lbrace\|W_{n}\|_{*}\leq \eta \right\rbrace $ implies $ \hat{{\textbf{B}}}_{n}-{\textbf{B}}=\arg\min V_{n}({\textbf{U}})\in \left\lbrace {\textbf{U}}: \|{\textbf{U}}\|_{*}\leq 6\min \{p, q\}\eta/\gamma_{0, n} \right\rbrace . $
In particular,
which follows from (3.1). Since $ \eta\in (0, 1) $ is arbitrary, this completes the proof.
Theorem 3.2 Let $ \epsilon_{i} $ be i.i.d. random variables with $ \mathbb{E}|\epsilon_{i}|<\infty $ and $ \mathbb{E}\epsilon_{i}=0 $. Assume that (1.4) holds as $ n\to\infty $.
(a) if $ \frac{\lambda_{n}}{n}\to a\in (0, \infty) $ then
where $ V_{\infty}({\textbf{U}}, a)={\rm vec}({\textbf{U}})^TC\, {\rm vec}({\textbf{U}}) +a\left[ \|\textbf{B}+\textbf{U}\|_{*}- \|\textbf{B}\|_{*}\right] $.
(b) if $ \frac{\lambda_{n}}{n}\to\infty $ then $ \hat{{\textbf{B}}}_{n}\to {\textbf{0}} $, w.p.1.
Proof First consider part (a). Let $ V_{n}(\cdot) $ be as in (3.2). Note that $ \left| \|\textbf{B}+{\textbf{U}}\|_{*}- \|{\textbf{B}}\|_{*} \right|\leq \|{\textbf{U}}\|_{*} $ by the triangle inequality. Since $ \frac{\lambda_{n}}{n}\to a\in (0, \infty) $, for any compact set $ K\subset{\mathbb{R}^{p\times q}} $,
Let $ n_{0}\geq1 $ be such that $ \lambda_{n}/{n}<2a $ and $ \gamma_{0, n}>\gamma_{0}/2 $ for all $ n\geq n_{0} $. From (3.3), for all $ n\geq n_{0} $, on the set $ \left\lbrace \|W_{n}\|_{*}\leq a\right\rbrace $, we have
for all $ \|{\textbf{U}}\|_{*}>(1+8a)/\gamma_{0}\equiv c_{0} $. Since $ V_{n}({\textbf{0}})=0 $, this implies $ \|\hat{{\textbf{B}}}_{n}-{\textbf{B}}\|_{*}\leq c_0 $ whenever $ n\geq n_{0} $ and $ \left\lbrace \|W_{n}\|_{*}\leq a\right\rbrace $ holds. Thus, the minimizer of $ V_{n}({\textbf{U}}) $ lies in a compact set for all $ n\geq n_{0} $, provided $ \left\lbrace \|W_{n}\|_{*}\leq a\right\rbrace $ holds. Since $ V_{\infty}(\cdot, a) $ is a convex function, by (3.4) and (3.1), part (a) follows.
Next consider part (b). Let $ a_n^2=\lambda_{n}/{n} $, so that $ a_n\to \infty $. Also, let
With (3.1),
Also by (3.1),
Finally, with $ {\rm{D}}_{1, n}=\left\lbrace{\textbf{U}}:\|{\textbf{B}}+{\textbf{U}}\|_{*}\leq a_n^{-1}\right\rbrace $
where $ {\textbf{U}}_{0}=-{\textbf{B}}\in {\rm{D}}_{1, n}. $
Note that for any sequence $ \{{\textbf{U}}_{n}\}_{n\geq1} $ with $ {\textbf{U}}_{n}\in {\rm{D}}_{1, n} $, $ \|{\textbf{B}}+{\textbf{U}}_{n}\|_{*}\leq a_n^{-1} \to 0 $ as $ n\to \infty $. Hence, from (3.5)–(3.7) and (3.2), it follows that there exists a set $ A $ with $ P(A)=1 $ such that for all $ \omega\in A $, there exists an $ n_{\omega}\geq1 $ such that for all $ n\geq n_{\omega} $,
This completes the proof of part (b).
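Part (b) can be made concrete in the simplest instance of criterion (1.3). When the data reduce to the matrix denoising problem $ \min_{\textbf{B}}\|\textbf{A}-\textbf{B}\|_{F}^{2}+\lambda\|\textbf{B}\|_{*} $, the minimizer is singular value soft-thresholding of $ \textbf{A} $ at level $ \lambda/2 $, which is exactly $ \textbf{0} $ once $ \lambda/2\geq\sigma_{1}(\textbf{A}) $. The sketch below (our own illustration, not part of the proof) shows the estimate collapsing to zero as the penalty grows.

```python
import numpy as np

def svt(A, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
for lam in [0.1, 1.0, 5.0, 2 * np.linalg.svd(A, compute_uv=False)[0]]:
    B_hat = svt(A, lam / 2)                   # minimizer of ||A - B||_F^2 + lam ||B||_*
    print(lam, np.linalg.norm(B_hat, "fro"))  # exactly 0 once lam/2 >= sigma_1(A)
```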
An important contribution of this study is that, building on the regularized matrix regression of Zhou and Li (see [2]), we provide the corresponding statistical justification; in particular, we derive the asymptotic normality of the proposed estimators. However, some issues remain. For example, when $ C_{n} $ [defined in (1.4)] is singular or nearly singular for some $ n $, the parametrization in (1.2) is not unique, and a new approach for singular designs is needed to address this problem.