数学杂志 (Journal of Mathematics), 2018, Vol. 38, Issue 6: 1023–1030
A MODIFIED STRATEGY IN ALTERNATING NON-NEGATIVE LEAST SQUARES FOR NON-NEGATIVE MATRIX FACTORIZATION
LI Xiang-li, ZHANG Wen, YU Jiang-lan    
School of Mathematics and Computing Science; Guangxi Key Laboratory of Cryptography and Information Security; Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin 541004, China
Abstract: In this paper, we study the global convergence of alternating non-negative least squares (ANLS) for NMF. By applying a modified strategy that guarantees the existence of a limit point, we show that the limit point is a stationary point of NMF. In addition, we give generalized modified strategies. Numerical experimental results show that the above strategies are effective.
Keywords: non-negative matrix factorization (NMF)     alternating non-negative least squares (ANLS)     modified strategy    
1 Introduction

Non-negative matrix factorization (NMF) [1] differs from other factorization methods in that non-negative constraints are imposed, which makes the factorization results more meaningful in the real world.

NMF attempts to decompose a given nonnegative matrix $A$ into the product of a non-negative basis matrix $W$ and a non-negative coefficient matrix $H$, which can be interpreted as follows

$ \begin{equation} \label{NMF} A\approx W\times H, \end{equation} $ (1.1)

where $A \in R^{m\times n}$, $W \in R^{m\times r}$ and $H \in R^{r\times n}$, with $r$ typically much smaller than $\min(m, n)$. The factorization results show that NMF can generate a reduced representation of the original data. In this sense, NMF is useful for extracting parts-based representations.

In recent years, NMF, as a parts-based representation of data, has received considerable attention in machine learning, signal processing, computer vision and so on [2-5].

In order to decrease the approximation error between $A$ and $WH$, NMF can be expressed as the constrained optimization problem

$ \begin{equation} \begin{split} \min\limits_{W, H}F(W, H)=\frac{1}{2}\|A-WH\|^2_F~~~~ {\rm s.t.}~~~~ W\geq0, ~~ H\geq 0, \end{split} \end{equation} $ (1.2)

where $\|\cdot\|_F$ is the Frobenius norm, and $W\geq0~ (H\geq0)$ means that all elements of the matrix $W$ ($H$) are nonnegative. The Karush-Kuhn-Tucker (KKT) [6] conditions for problem (1.2) are expressed as follows

$ \begin{equation} \begin{split} &W\geq0, ~~H\geq0, ~~ \bigtriangledown_{W}F(W, H)=(WH-A)H^{T}\geq0, \\ &\bigtriangledown_{H}F(W, H)=W^{T}(WH-A)\geq0, \\ &\langle W, \bigtriangledown_{W}F(W, H)\rangle=0, ~~ \langle H, \bigtriangledown_{H}F(W, H)\rangle=0. \end{split} \end{equation} $ (1.3)
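To make (1.3) concrete, the following is a minimal Python/NumPy sketch of how one might measure how far a pair $(W, H)$ is from satisfying these conditions; the helper name `kkt_residual` is our own and is not part of the original algorithms.

```python
import numpy as np

def kkt_residual(A, W, H):
    """Largest violation of the KKT conditions (1.3) for problem (1.2)."""
    R = W @ H - A                # residual of the factorization
    grad_W = R @ H.T             # grad_W F(W, H) = (W H - A) H^T
    grad_H = W.T @ R             # grad_H F(W, H) = W^T (W H - A)
    return max(
        max(0.0, -W.min()), max(0.0, -H.min()),            # W >= 0, H >= 0
        max(0.0, -grad_W.min()), max(0.0, -grad_H.min()),   # gradients >= 0
        abs(np.sum(W * grad_W)), abs(np.sum(H * grad_H)),   # complementarity
    )
```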

Alternating non-negative least squares (ANLS) [2, 7, 8] works well in practice: it converges quickly and its elements are not permanently locked at zero. It iterates the following two subproblems until a stopping criterion is satisfied

$ \begin{eqnarray} &&H^{k+1}= \arg \min\limits_{H\geq0}F(W^k, H), \end{eqnarray} $ (1.4)
$ \begin{eqnarray} &&W^{k+1}= \arg \min\limits_{W\geq0}F(W, H^{k+1}). \end{eqnarray} $ (1.5)

Clearly, $F(W^k, H)$ and $F(W, H^{k+1})$ are convex in $H$ and in $W$, respectively.
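For concreteness, a minimal Python/NumPy sketch of one ANLS outer iteration is given below; the helper names `nnls_columns` and `anls_step` are ours, and each convex subproblem is solved column by column with SciPy's NNLS routine rather than with the specialized solvers used later in the experiments.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_columns(W, A):
    """Solve min_{H >= 0} ||A - W H||_F^2 column by column."""
    H = np.zeros((W.shape[1], A.shape[1]))
    for j in range(A.shape[1]):
        H[:, j], _ = nnls(W, A[:, j])
    return H

def anls_step(A, W, H):
    """One outer iteration of ANLS, following (1.4)-(1.5)."""
    H_new = nnls_columns(W, A)            # (1.4): update H with W fixed
    W_new = nnls_columns(H_new.T, A.T).T  # (1.5): update W with H fixed
    return W_new, H_new
```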

ANLS is widely used as an efficient computational method for NMF, but it has a serious drawback: its global convergence is not guaranteed in theory. Here, global convergence means that, for any initial solution, the sequence generated by ANLS contains at least one limit point and that this limit point is a stationary point of NMF. The main difficulty in proving global convergence lies in establishing the existence of a limit point.

In [9], the authors proposed a modified strategy and applied it to ANLS; this strategy makes it possible to prove global convergence. Motivated by the work of [9], we present a modified strategy that guarantees the existence of a limit point and show that the limit point is a stationary point of NMF. In addition, we give generalized modified strategies, which can be applied in practice.

The rest of this paper is organized as follows. Our modified strategy is given in Section 2, and the resulting algorithm is stated in Section 3. Convergence analysis is established in Section 4. In Section 5, we give two generalized modified strategies. Numerical experiments are reported in Section 6, and we conclude the paper in Section 7.

We briefly summarize our notation. Let

$ A = {\left( {{a_{ij}}} \right)_{m \times n}}, {\left\| A \right\|_F} = \sqrt {\sum\limits_i {\sum\limits_j {a_{ij}^2} } } {\rm{, Tr}}\left( A \right) = \sum\limits_i {{a_{ii}}} . $

For $A, B\in R^{m\times n}$, define $\langle A, B\rangle={\rm Tr}(AB^T)$. $A_{\cdot i}$ denotes the $i$-th column of $A$, and $A_{i\cdot}$ denotes the $i$-th row of $A$. Let $\|x\|$ denote any norm of a vector $x$, and $\|x\|_{2}$ denote the 2-norm of $x$.

2 Modified Strategy

In this section, based on [9], we present our modified strategy. Let $(W^{k}, H^{k})$ be generated by alternating non-negative least squares (ANLS).

First, the modified strategy of [9] is given as follows

$ \bar{W}^{k}=W^{k}\Lambda_{1}, \bar{H}^{k}=\Lambda_{2}H^{k}, $

where $\Lambda_{1}={\rm diag}(\lambda_{j}^{1})$, $\Lambda_{2}={\rm diag}(\lambda_{j}^{2})$, $j=1, 2, \cdots, r, $

$ \begin{equation} \lambda_{j}^{1}=\begin{cases} 0, &\text{ if $\|W_{\cdot j}^{k}\|\cdot\|H_{j \cdot}^{k}\|=0$}, \\ \sqrt{\alpha\frac{\|H_{j \cdot}^{k}\|}{\|W_{\cdot j}^{k}\|}}, &\text{otherwise}, \end{cases} \end{equation} $ (2.1)
$ \begin{equation} \lambda_{j}^{2}=\begin{cases} 0, &\text{ if $\|W_{\cdot j}^{k}\|\cdot\|H_{j \cdot}^{k}\|=0$}, \\ \sqrt{\frac{\|W_{\cdot j}^{k}\|}{\alpha\|H_{j \cdot}^{k}\|}}, &\text{otherwise}, \end{cases} \end{equation} $ (2.2)

where $\alpha$ is a positive constant. The above strategy ensures that ANLS is convergent.

Motivated by the work of [9], we give another modified strategy. In NMF, we have the fact that

$ \begin{eqnarray*} WH=\sum^{r}_{j=1}W_{\cdot j}H_{j\cdot} =\sum\limits_{\|W_{\cdot j}^{k}\|\cdot\|H_{j \cdot}^{k}\|\neq0}\lambda_{j}W_{\cdot j}\cdot\frac{1}{\lambda_{j}}H_{j \cdot}, ~~\forall\lambda_{j}>0. \end{eqnarray*} $

For $\|W_{\cdot j}\|\cdot\|H_{j \cdot}\|\neq0$, let $\bar{W}_{\cdot j}=\lambda_{j}W_{\cdot j}, ~~\bar{H}_{j\cdot}=\frac{1}{\lambda_{j}}H_{j \cdot}.$ Furthermore, we require $\|\bar{W}_{\cdot j}\|=\|\bar{H}_{j\cdot}\|+\alpha$ ($\alpha>0$), that is, $\lambda_{j}\|W_{\cdot j}\|=\frac{1}{\lambda_{j}}\|H_{j \cdot}\|+\alpha$. Taking the positive root of the resulting quadratic equation $\|W_{\cdot j}\|\lambda_{j}^{2}-\alpha\lambda_{j}-\|H_{j \cdot}\|=0$, we have

$ \lambda_{j}=\frac{\alpha+\sqrt{\alpha^{2}+4\|W_{\cdot j}\|\cdot\|H_{j \cdot}\|}}{2\|W_{\cdot j}\|}. $

Hence, our modified strategy (Strategy 1) is described as follows: $\bar{W}^{k}=W^{k}\Lambda_{1}, \bar{H}^{k}=\Lambda_{2}H^{k}, $ where $\Lambda_{1}={\rm diag}(\lambda_{j}^{1})$, $\Lambda_{2}={\rm diag}(\lambda_{j}^{2})$, $j=1, 2, \cdots, r, $

$\begin{eqnarray} &&\lambda_{j}^{1}=\begin{cases} 0, &\text{ if $\|W_{\cdot j}^{k}\|\cdot\|H_{j \cdot}^{k}\|=0$}, \\ \frac{\alpha+\sqrt{\alpha^{2}+4\|W_{\cdot j}^{k}\|\cdot\|H_{j \cdot}^{k}\|}}{2\|W_{\cdot j}^{k}\|}, &\text{otherwise}, \end{cases}\\ \end{eqnarray} $ (2.3)
$\begin{eqnarray} &&\lambda_{j}^{2}=\begin{cases} 0, &\text{ if $\|W_{\cdot j}^{k}\|\cdot\|H_{j \cdot}^{k}\|=0$}, \\ \frac{2\|W_{\cdot j}^{k}\|}{\alpha+\sqrt{\alpha^{2}+4\|W_{\cdot j}^{k}\|\cdot\|H_{j \cdot}^{k}\|}}, &\text{otherwise}. \end{cases} \end{eqnarray} $ (2.4)
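A minimal Python/NumPy sketch of Strategy 1 follows; the function name `rescale_strategy1` is ours, and the 2-norm is used for concreteness although the analysis allows any vector norm.

```python
import numpy as np

def rescale_strategy1(W, H, alpha=1.0):
    """Rescale (W, H) by (2.3)-(2.4) so that ||W_{.j}|| = ||H_{j.}|| + alpha
    for every non-vanishing component, without changing the product W H."""
    W, H = W.copy(), H.copy()
    for j in range(W.shape[1]):
        w = np.linalg.norm(W[:, j])
        h = np.linalg.norm(H[j, :])
        if w * h == 0.0:
            lam1 = lam2 = 0.0     # (2.3)-(2.4): zero out a vanishing component
        else:
            lam1 = (alpha + np.sqrt(alpha ** 2 + 4.0 * w * h)) / (2.0 * w)
            lam2 = 1.0 / lam1
        W[:, j] *= lam1           # W-bar = W * Lambda_1 (column scaling)
        H[j, :] *= lam2           # H-bar = Lambda_2 * H (row scaling)
    return W, H
```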
3 Algorithm

Based on the above analysis, in this section we state ANLS with Strategy 1 for NMF as follows.

Algorithm 1 (s.1) Choose the starting point $(\bar{W}^{0}, \bar{H}^{0})\geq0$ and $\alpha\geq0$. Set $k=0$.

(s.2) Stop if $(\bar{W}^{k}, \bar{H}^{k})$ satisfies (1.3); otherwise compute

$ \begin{eqnarray} &&W^{k+1}= \arg \min\limits_{W\geq0}F(W, \bar{H}^{k}), \end{eqnarray} $ (3.1)
$ \begin{eqnarray} &&H^{k+1}= \arg \min\limits_{H\geq0}F(W^{k+1}, H). \end{eqnarray} $ (3.2)

(s.3) Modify $(W^{k+1}, H^{k+1})$ to $(\bar{W}^{k+1}, \bar{H}^{k+1})$ using Strategy 1.

(s.4) Let $k=k+1$. Go to s.2.
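Putting the pieces together, a sketch of Algorithm 1 under our assumptions reads as follows; it reuses the hypothetical helpers `nnls_columns`, `rescale_strategy1` and `kkt_residual` from the earlier sketches and replaces the exact test of (1.3) in s.2 with a simple residual tolerance.

```python
def algorithm1(A, W0, H0, alpha=1.0, tol=1e-6, max_outer=10000):
    """ANLS with Strategy 1 (Algorithm 1), sketched with the helpers above."""
    W_bar, H_bar = rescale_strategy1(W0, H0, alpha)    # normalize the start point
    for _ in range(max_outer):
        if kkt_residual(A, W_bar, H_bar) <= tol:       # s.2: approximate test of (1.3)
            break
        W = nnls_columns(H_bar.T, A.T).T               # (3.1): W^{k+1}
        H = nnls_columns(W, A)                         # (3.2): H^{k+1}
        W_bar, H_bar = rescale_strategy1(W, H, alpha)  # s.3: apply Strategy 1
    return W_bar, H_bar
```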

4 Convergence Analysis

Similar to [9], each of the subproblems (1.4) and (1.5) has a global minimizer on the corresponding nonnegative orthant. In this section, we study the global convergence of the algorithm.

Lemma 4.1 Let $\{\bar{W}^{k}, \bar{H}^{k}\}$ be generated by Algorithm 1. Then $\{\bar{W}^{k}, \bar{H}^{k}\}$ is bounded.

Proof Since $H^{k}$ is a stationary point of $F(W^{k}, H)$, we have $\langle(W^{k})^{T}(W^{k}H^{k}-A), H^{k}\rangle=0.$ Hence

$ \begin{eqnarray*} \|A-W^{k}H^{k}\|^{2}_{F}&=&\|A\|^{2}_{F}-2\langle A, W^{k}H^{k}\rangle+\|W^{k}H^{k}\|^{2}_{F}\\ &=&\|A\|^{2}_{F}-2\langle A-W^{k}H^{k}, W^{k}H^{k}\rangle-\|W^{k}H^{k}\|^{2}_{F}\\ &=&\|A\|^{2}_{F}-\|W^{k}H^{k}\|^{2}_{F} \geq0. \end{eqnarray*} $

Therefore, noting that Strategy 1 does not change the product, i.e., $\bar{W}^{k}\bar{H}^{k}=W^{k}H^{k}$, we obtain $\|\bar{W}^{k}\bar{H}^{k}\|_{F}=\|W^{k}H^{k}\|_{F}\leq\|A\|_{F}.$

By the construction of Strategy 1, $\|\bar{W}_{\cdot j}^{k}\|=\|\bar{H}_{j\cdot}^{k}\|+\alpha$ for every $j$ with $\|W_{\cdot j}^{k}\|\cdot\|H_{j\cdot}^{k}\|\neq0$ (the remaining columns of $\bar{W}^{k}$ and rows of $\bar{H}^{k}$ are zero). Meanwhile, by the equivalence of norms on finite-dimensional spaces, there exist $c_{1}, c_{2}, c_{3}, c_{4}>0$ such that

$ c_{2}\|\bar{W}_{\cdot j}^{k}\|_{2}\leq\|\bar{W}_{\cdot j}^{k}\|\leq c_{1}\|\bar{W}_{\cdot j}^{k}\|_{2}, ~~ c_{3}\|\bar{H}_{j\cdot}^{k}\|_{2}\leq\|\bar{H}_{j\cdot}^{k}\|\leq c_{4}\|\bar{H}_{j\cdot}^{k}\|_{2}. $

We can get

$ \|\bar{W}_{\cdot j}^{k}\|_{2}\geq \frac{\|\bar{W}_{\cdot j}^{k}\|}{c_{1}}=\frac{\|\bar{H}_{j\cdot}^{k}\|+\alpha}{c_{1}}\geq\frac{c_{3}\|\bar{H}_{j\cdot}^{k}\|_{2}+\alpha}{c_{1}} $

and

$ \|\bar{H}_{j\cdot}^{k}\|_{2}\geq \frac{\|\bar{H}_{j\cdot}^{k}\|}{c_{4}}=\frac{\|\bar{W}_{\cdot j}^{k}\|-\alpha}{c_{4}}\geq\frac{c_{2}\|\bar{W}_{\cdot j}^{k}\|_{2}-\alpha}{c_{4}}. $

Furthermore,

$ \begin{eqnarray*} \|\bar{W}^{k}\bar{H}^{k}\|^{2}_{F} &=&\langle\sum^{r}_{j=1}\bar{W}^{k}_{\cdot j}\bar{H}^{k}_{j\cdot}, \sum^{r}_{j=1}\bar{W}^{k}_{\cdot j}\bar{H}^{k}_{j\cdot}\rangle \geq\sum^{r}_{j=1}\|\bar{W}^{k}_{\cdot j}\bar{H}^{k}_{j\cdot}\|^{2}_{F}=\sum^{r}_{j=1}\|\bar{W}^{k}_{\cdot j}\|_{2}^{2}\|\bar{H}^{k}_{j\cdot}\|^{2}_{2}\\ &\geq&\sum^{r}_{j=1}\Big(\frac{c_{3}\|\bar{H}_{j\cdot}^{k}\|_{2}+\alpha}{c_{1}}\Big)^{2}\|\bar{H}^{k}_{j\cdot}\|^{2}_{2}=\sum^{r}_{j=1}\frac{(c_{3}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2}+\alpha\|\bar{H}_{j\cdot}^{k}\|_{2})^{2}}{c_{1}^{2}}, \end{eqnarray*} $

where

$ \begin{eqnarray*} &&\sum\limits^{r}_{j=1}(c_{3}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2}+\alpha\|\bar{H}_{j\cdot}^{k}\|_{2})^{2} \geq\sum\limits^{r}_{j=1}(c_{3}^{2}\|\bar{H}_{j\cdot}^{k}\|_{2}^{4}+\alpha^{2}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2})\\ &=&c_{3}^{2}\sum\limits^{r}_{j=1}\|\bar{H}_{j\cdot}^{k}\|_{2}^{4}+\alpha^{2}\sum\limits^{r}_{j=1}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2} \geq c_{3}^{2}\cdot\frac{(\sum\limits^{r}_{j=1}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2})^{2}}{r}+\alpha^{2}\sum\limits^{r}_{j=1}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2}. \end{eqnarray*} $

Let $t=\sum\limits^{r}_{j=1}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2}$. Combining the above inequalities with $\|\bar{W}^{k}\bar{H}^{k}\|_{F}\leq\|A\|_{F}$, we have $c^{2}_{3}\cdot\frac{t^{2}}{r}+\alpha^{2}\cdot t\leq c_{1}^{2}\|A\|^{2}_{F}.$ Since the left-hand side is an increasing quadratic function of $t\geq0$, there must be a positive constant $M$ such that $t\leq M.$ Hence,

$ \|\bar{H}^{k}\|_{F}=\sqrt{\sum\limits^{r}_{j=1}\|\bar{H}_{j\cdot}^{k}\|_{2}^{2}}=\sqrt{t}\leq \sqrt{M}. $

Similarly, we can conclude that $\|\bar{W}^{k}\|_{F}$ is bounded. Therefore, $\{\bar{W}^{k}, \bar{H}^{k}\}$ is bounded.

Lemma 4.2 Let $\{\bar{W}^{k}, \bar{H}^{k}\}$ be generated by Algorithm 1, and $(W^{*}, H^{*})$ be a limit point of $\{\bar{W}^{k}, \bar{H}^{k}\}$. Then $(W^{*}, H^{*})$ is a stationary point of problem (1.2).

Since the above lemma corresponds to [9, Theorem 2] and the proof is the same as that of [9, Theorem 2], we omit it here.

5 Generalized Modified Strategies

In both Strategy 1 and the strategy of [9], $\alpha$ is a positive constant. We may replace $\alpha$ with $g(\alpha)$, where $g(\alpha)$ is a function of $\alpha$ satisfying $g(\alpha)>0$. In this way, we obtain two generalized modified strategies.

The convergence of ANLS is also established whenever $g(\alpha)>0$. In practice, we can take $g(\alpha)=\frac{1}{\alpha^{2}}$ or $g(\alpha)=e^{-\alpha}$, and so on; different application problems call for different choices of $g(\alpha)$.
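Under the same assumptions as the sketch in Section 2, the generalized strategy amounts to a one-line change: the constant `alpha` passed to the hypothetical helper `rescale_strategy1` is replaced by `g(alpha)`.

```python
import numpy as np

def rescale_generalized(W, H, alpha, g=lambda a: np.exp(-a)):
    """Generalized modified strategy: use g(alpha) > 0 in place of alpha."""
    return rescale_strategy1(W, H, alpha=g(alpha))
```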

6 Numerical Experiments

In this section, we report some numerical experiments for the proposed modified strategies. We apply these strategies to the following algorithms: SBBNMF [9], BBNMF [10] and PGNMF [11]. All code is written in MATLAB and run in MATLAB 7.10 on a PC (2.13 GHz CPU, 2 GB memory) under Windows 7.

In the experiments, we compare the following statistics: the number of inner iterations (Iter), the number of outer iterations (Oter), the objective function value (Fun), the residual function value (Res), and the CPU time in seconds (Time). The formula for Res is the same as in [9], and the initial parameters are

$ \epsilon=10^{-6}, ~{\rm Oter}\leq10000, ~{\rm Iter}\leq10000, ~{\rm Time}\leq300. $

In order to prevent the initial point from affecting the numerical results, in every experiment we use 30 randomly generated initial points. We list the following three experimental tables for different choices of $\alpha$ and $g(\alpha)$.

The three tables indicate that the proposed modified strategies are useful for these algorithms (SBBNMF, BBNMF, and PGNMF) within the ANLS framework. In some experiments, Iter (Oter) is not an integer because we report the average over the 30 initial points; in other words, Iter (Oter) represents the average number of iterations.

Table 1
$A={\rm abs}({\rm randn}(m, r))*{\rm abs}({\rm randn}(r, n)), ~\alpha=1$

Table 2
$A={\rm abs}({\rm randn}(m, r))*{\rm abs}({\rm randn}(r, n)), ~\alpha=0$

Table 3
$A={\rm abs}({\rm randn}(m, r))*{\rm abs}({\rm randn}(r, n)), ~g(\alpha)=e^{-\alpha}$
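For reproducibility, the test matrices in Tables 1-3 are generated by the MATLAB expression ${\rm abs}({\rm randn}(m, r))*{\rm abs}({\rm randn}(r, n))$; a Python/NumPy equivalent (the function name and the seeded generator are our own) is sketched below.

```python
import numpy as np

def make_test_matrix(m, n, r, seed=None):
    """NumPy equivalent of A = abs(randn(m, r)) * abs(randn(r, n))."""
    rng = np.random.default_rng(seed)
    return np.abs(rng.standard_normal((m, r))) @ np.abs(rng.standard_normal((r, n)))
```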
7 Conclusions

Nonnegative matrix factorization (NMF) is not only a well-known matrix decomposition approach but also a useful and efficient feature extraction technique. In this paper, we proposed a modified strategy that ensures the convergence of ANLS. In addition, we gave generalized modified strategies.

References
[1] Lee D D, Seung H S. Learning the parts of objects by non-negative matrix factorization[J]. Nature, 1999, 401(6755): 788–791. DOI:10.1038/44565
[2] Cichocki A, Zdunek R, Amari S I. New algorithms for non-negative matrix factorization in applications to blind source separation[J]. IEEE Intern. Confer. Acoustics, Speech Signal Proc., 2006: 870–879.
[3] Gao B, Woo W L, Dlay S S. Variational regularized 2-D nonnegative matrix factorization[J]. IEEE Trans. Neural Networks Learn. Sys., 2012, 23(5): 703–716. DOI:10.1109/TNNLS.2012.2187925
[4] Shahnaz F, Berry M W, Pauca V P, Plemmons R J. Document clustering using nonnegative matrix factorization[J]. Inform. Proc. Mgt., 2006, 42(2): 373–386. DOI:10.1016/j.ipm.2004.11.005
[5] Wang Y, Jia Y, Hu C, Turk M. Non-negative matrix factorization framework for face recognition[J]. Intern. J. Patt. Rec. Arti. Intel., 2008, 19(4): 495–511.
[6] Boyd, Vandenberghe, Faybusovich. Convex optimization[J]. IEEE Trans. Auto. Control, 2006, 51(11): 1859–1859. DOI:10.1109/TAC.2006.884922
[7] Paatero P, Tapper U. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values[J]. Environmetrics, 1994, 5(2): 111–126. DOI:10.1002/(ISSN)1099-095X
[8] Zdunek R, Cichocki A. Non-negative matrix factorization with quasi-Newton optimization[J]. Intern. Confer. Arti. Intel. Soft Comp., 2006: 870–879.
[9] Li X L, Liu H W. Modified subspace Barzilai-Borwein gradient method for non-negative matrix factorization[J]. Comput. Optim. Appl., 2013, 55: 173–196.
[10] Xiao Yunhai, Hu Qingjie. Subspace Barzilai-Borwein gradient method for large-scale bound constrained optimization[J]. Appl. Math. Optim., 2008, 58: 275–290. DOI:10.1007/s00245-008-9038-9
[11] Lin C J. Projected gradient methods for nonnegative matrix factorization[J]. Neural Comput., 2007, 19(10): 2756–2779.