数学杂志  2020, Vol. 40 Issue (3): 323-331   PDF    
扩展功能
加入收藏夹
复制引文信息
加入引用管理器
Email Alert
RSS
本文作者相关文章
李春梅
王翠方
段雪峰
求解一类矩阵迹极小化问题的非线性共轭梯度法
李春梅1,2, 王翠方2, 段雪峰2    
1. 贵州师范大学数学科学学院, 贵州 贵阳 550001;
2. 桂林电子科技大学数学与计算科学学院, 广西 桂林 541004
摘要:本文研究了图分割问题中的矩阵迹极小化问题.利用半正定矩阵的Gramian表示,将该问题转化为无约束优化问题,设计了Armijo线搜索下的非线性共轭梯度方法进行求解.数值例子表明新方法是可行的.
关键词矩阵迹极小化    Gramian表示    非线性共轭梯度法    
THE NONLINEAR CONJUGATE GRADIENT METHOD FOR SOLVING A CLASS OF THE MATRIX TRACE MINIMIZATION PROBLEM
LI Chun-mei1,2, WANG Cui-fang2, DUAN Xue-feng2    
1. School of Mathematical Science, Guizhou Normal University, Guiyang 550001, China;
2. School of Mathematics and Computational Science, Guilin University of Electronic Technology, Guilin 541004, China
Abstract: In this paper, we consider the trace minimization problem in graph partitioning. Using the Gramian representation of the positive semidefinite matrix, the problem can be formulated as the unconstrained optimization problem, then the nonlinear conjugate method with the Armijo line search is used to solve it. Numerical experiments illustrate the feasibility of the new method.
Keywords: matrix trace minimization     Gramian representation     nonlinear conjugate gradient method    
1 引言

本文用$ {\bf{R}}^{n\times n} $$ {\bf{S R}}_{+}^{n \times n} $分别表示$ n \times n $阶实矩阵和$ n \times n $阶实对称半正定矩阵构成的集合.用$ \operatorname{diag}(\boldsymbol{Y}) $表示矩阵$ Y $的对角线元素组成的向量.用$ \operatorname{tr}(\boldsymbol{Y}), \quad \boldsymbol{Y}^{\mathrm{T}} $$ \operatorname{rank}(\boldsymbol{Y}) $分别表示矩阵的迹, 转置和秩. $ e $表示全1向量.本文主要研究图分割中的矩阵迹极小化问题

$ \begin{align} {\min \operatorname{tr}(\boldsymbol{Q Y})}, \quad {\text {s.t.} \operatorname{diag}(\boldsymbol{Y}) = \boldsymbol{e}} , \quad {\boldsymbol{Y} \in {\bf{S R}}_{+}^{n \times n}}. \end{align} $ (1.1)

该问题来自于加权无向图的最小割问题, 具体如下:给定一个加权无向图$ G = (V, E) $, 最小割问题在于寻找一种节点的划分, 使得切边的权重之和最小[1], 即

$ \begin{align} {\min y^{\mathrm{T}} \boldsymbol{Q} y, } \quad \quad {\text {s.t.} \quad y_{i}^{2} = 1, i = 1, \cdots, n, } \end{align} $ (1.2)

其中$ \boldsymbol{Q} = \frac{1}{4}(\operatorname{diag}(\boldsymbol{A} e)-\boldsymbol{A}) $是该图的负Laplace矩阵, $ \boldsymbol{A} $为加权无向图的邻接矩阵.又对于任意的对称矩阵$ \boldsymbol{M} $都可以表示为图的Laplace矩阵加上对角阵$ \boldsymbol{D} $的形式, 即$ \boldsymbol{M} = \boldsymbol{Q}+\boldsymbol{D} $, 且对于任意的$ y \in\{-1, 1\}^{n} $都有$ y^{\mathrm{T}} \boldsymbol{D} y = \operatorname{tr}(\boldsymbol{D}) $, 所以问题(1.2)等价于如下问题.

$ \begin{align} {\min y^{\mathrm{T}} \boldsymbol{M} y, } \quad \quad {\text {s.t. } y \in\{-1, 1\}^{n}.} \end{align} $ (1.3)

$ \boldsymbol{Y} = y y^{\mathrm{T}} $, 则问题(1.3)可等价写成(1.1).另外, 矩阵迹极小化问题还来源于图像还原[2]、数据降维[3]、机器学习[4]等领域.

近年来, 许多人研究了矩阵迹极小化问题, 许多数值方法被提出.比如, 1992年, Henk[5]等提出了一类矩阵迹极小化的算法, 它包含了Procrustes rotation加权算法, 有效的提高了图像分割中的矩阵迹极小化问题的计算效率. 1996年, Bental[6]等人实现了由双边约束下的不确定二次矩阵迹极小化问题到简单线约束的凸矩阵迹极小化问题的转化. 2000年, Benson[7]等提出了一个双尺度内点法, 解决了布尔约束条件下组合矩阵迹极小化问题和二次矩阵迹极小化问题的半正定松弛问题. 2009年, Burer[8]等提出了二级松弛, 解决了计算量与矩阵规模不平衡的问题, 从另一个层面上提高了解决迹函数极小化的计算效率.随后, 2011年Grippo[1]等人提出一个连续可微的精确价值函数, 并利用该价值函数设计了具有全局收敛性的低秩半正定松弛的精确罚算法, 从而解决了图像分割中的二次等式约束的矩阵迹极小化问题. 2016年, Liu[9]等人设计了一个新的Riemannian信赖域方法来解决Stiefel流形上的矩阵迹极大化问题, 即相对于典型的Riemannian信赖域方法提出了交替方向法以及高斯-赛德尔方法的初值策略, 这一方法在收敛速度和节省储存量方面有明显优势.然而, 对于求解问题(1.1)的数值方法较少.

本文利用Gramian表示和三角变换将矩阵迹极小化问题(1.1)转化为无约束优化问题.然后设计了Armijo线搜索下的非线性共轭梯度法求解无约束优化问题.最后用数值例子验证了新方法的可行性.

2 主要结果

本节先利用Gramian表示刻画问题(1.1)的可行集, 将该问题转化为无约束优化问题, 再设计非线性共轭梯度法求解.先给出两个引理.

引理2.1 (Gramian表示) [10]$ Y $$ n \times n $阶对称半正定矩阵, 则$ \operatorname{rank}(\boldsymbol{Y}) \leq r $当且仅当存在一个矩阵$ \boldsymbol{V} \in {\bf{R}}^{n \times k} $满足$ \boldsymbol{Y} = \boldsymbol{V} \boldsymbol{V}^{\mathrm{T}} $.

由引理2.1知可利用$ \boldsymbol{Y} = \boldsymbol{V} \boldsymbol{V}^{\mathrm{T}}, \boldsymbol{V} \in {\bf{R}}^{n\times r} $刻画问题(1.1)的可行集, 即

$ \Omega = \left\{\boldsymbol{Y} | \boldsymbol{Y} \in {\bf{S} \bf{R}}_{+}^{n \times n}, \operatorname{rank}(\boldsymbol{Y}) \leqslant r\right\}. $

因此问题$ (1.1) $可等价转化为下述优化问题:

$ \begin{align} {\min \operatorname{tr}\left(\boldsymbol{Q} \boldsymbol{V} \boldsymbol{V}^{\mathrm{T}}\right)}, \quad \quad {\text {s.t. } \operatorname{diag}\left(\boldsymbol{V} \boldsymbol{V}^{\mathrm{T}}\right) = e, } \quad \quad {\boldsymbol{V} \in {\bf{R}}^{n \times r}}. \end{align} $ (2.1)

为了刻画问题(2.1)的可行集, 引入如下引理.

引理2.2 [11]令矩阵$ \boldsymbol{V} = \left[V_{1}, V_{2}, \cdots V_{r}\right] = \left[ \begin{array}{cccc}{v_{11}} & {v_{12}} & {\cdots} & {v_{1 r}} \\ {v_{21}} & {v_{22}} & {\cdots} & {v_{2 r}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {v_{n 1}} & {v_{n 2}} & {\cdots} & {v_{n \pi}}\end{array}\right] \in {\bf{R}}^{n \times r}, $

$ V_{1} = \left[ \begin{array}{c}{\cos \alpha_{11}} \\ {\cos \alpha_{21}} \\ {\vdots} \\ {\cos \alpha_{n 1}}\end{array}\right], \quad V_{2} = \left[ \begin{array}{c}{\cos \alpha_{11} \sin \alpha_{11}} \\ {\cos \alpha_{21} \sin \alpha_{21}} \\ {\vdots} \\ {\cos \alpha_{n 1} \sin \alpha_{n 1}}\end{array}\right], \; \ldots, \; V_{r-1} = \left[ \begin{array}{c}{\cos \alpha_{1, r-1} \prod\limits_{l = 1}^{r-2} \sin \alpha_{1 l}} \\ {\cos \alpha_{2, r-1} \prod\limits_{l = 1}^{r-2} \sin \alpha_{2 l}} \\ {\vdots} \\ {\cos \alpha_{n, r-1} \prod\limits_{l = 1}^{r-2} \sin \alpha_{n l}}\end{array}\right], $
$ V_{r} = \left[ \begin{array}{c}{\prod\limits_{l = 1}^{r-1} \sin \alpha_{1 l}} \\ {\prod\limits_{l = 1}^{r-1} \sin \alpha_{2 l}} \\ {\vdots}\\ {\prod\limits_{l = 1}^{r-1} \sin \alpha_{n l}}\end{array}\right], $

其中$ \alpha_{i j} \in {\bf{R}}, i = 1, 2, \cdots r-1, $$ \boldsymbol{Y} = \boldsymbol{V} \boldsymbol{V}^{\mathrm{T}} \in {\bf{S} \bf{R}}_{+}^{\mathrm{n} \times n}, $$ \operatorname{rank}(\boldsymbol{Y}) \leqslant r, \operatorname{diag}(\boldsymbol{Y}) = e. $

注2.1 引用一个简单的$ 3\times2 $阶矩阵解释引理2.2.令

$ \boldsymbol{V} = \left[V_{1}, V_{2}\right] = \left[ \begin{array}{ll}{\cos \alpha_{11}} & {\sin \alpha_{11}} \\ {\cos \alpha_{21}} & {\sin \alpha_{21}} \\ {\cos \alpha_{31}} & {\sin \alpha_{31}}\end{array}\right]. $

易知

$ Y = \boldsymbol{V} \boldsymbol{V}^{T} = \left[ \begin{array}{cc}{\cos \alpha_{11}} & {\sin \alpha_{11}} \\ {\cos \alpha_{21}} & {\sin \alpha_{21}} \\ {\cos \alpha_{31}} & {\sin \alpha_{31}}\end{array}\right| \left[ \begin{array}{cc}{\cos \alpha_{11}} & {\sin \alpha_{11}} \\ {\cos \alpha_{21}} & {\sin \alpha_{21}} \\ {\cos \alpha_{31}} & {\sin \alpha_{31}}\end{array}\right]^{\mathrm{T}} $
$ { = \left[ \begin{array}{ccc}{1} & {\cos \alpha_{11} \cos \alpha_{21}+\sin \alpha_{11} \sin \alpha_{21}} & {\cos \alpha_{11} \cos \alpha_{31}+\sin \alpha_{11} \sin \alpha_{31}} \\ {\cos \alpha_{11} \cos \alpha_{21}+\sin \alpha_{11} \sin \alpha_{21}} & {1} & {\cos \alpha_{21} \cos \alpha_{31}+\sin \alpha_{21} \sin \alpha_{31}} \\ {\cos \alpha_{11} \cos \alpha_{31}+\sin \alpha_{11} \sin \alpha_{31}} & {\cos \alpha_{21} \cos \alpha_{31}+\sin \alpha_{21} \sin \alpha_{31}} & {1}\end{array}\right].} $

显然, 矩阵$ Y $不仅对称半正定, 且满足$ \operatorname{rank}(\boldsymbol{Y}) \leqslant 2, $以及$ \operatorname{diag}(Y) = e $.

由引理2.2知矩阵$ Y $的每个位置上的元素可以表述为

$ {y_{ij}} = \left\{ {\begin{array}{*{20}{c}} {\sum\limits_{p = 1}^{r - 1} {\cos {\alpha _{ip}}\cos {\alpha _{jp}}\mathop \prod \limits_{l = 1}^{p - 1} \sin {\alpha _{il}}\sin {\alpha _{jl}} + \mathop \prod \limits_{l = 1}^{r - 1} \sin {\alpha _{il}}\sin {\alpha _{jl}}} , i \ne j}, \\ {0, \qquad \qquad \qquad \qquad \qquad \qquad \qquad\qquad\qquad\qquad\qquad i = j.} \\ \end{array}} \right. $

由引理2.2可将问题(2.1)等价转化为如下无约束优化问题

$ \begin{align} \min _{\alpha \in R^{n \times (r-1)}} F(\boldsymbol{\alpha}), \end{align} $ (2.2)

其中

$ \begin{aligned} F(\boldsymbol{\alpha}) & = \operatorname{tr}(\boldsymbol{Q} \boldsymbol{Y}) = \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} q_{j i} y_{i j} \\ & = \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} q_{i j}(\sum\limits_{p = 1}^{r-1} \cos \alpha_{i p} \cos \alpha_{j p} \prod\limits_{l = 1}^{p-1} \sin \alpha_{i l} \sin \alpha_{j l}+\prod\limits_{l = 1}^{r-1} \sin \alpha_{i l} \sin \alpha_{j l}). \end{aligned} $

下面将利用非线性共轭梯度法求解问题(2.2).先给出目标函数的梯度如下.

定理2.1 $ F(\boldsymbol{\alpha}) $的梯度为

$ \nabla F(\boldsymbol{\alpha}) = (\frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{11}}, \frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{21}}, \cdots, \frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{n 1}}, \cdots, \frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{1, r-1}}, \frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{2, r-1}}, \cdots \frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{\bf{n}, \bf{r}-1}})^{\mathrm{T}}, $

其中

$ \begin{aligned} &\frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{\mu \nu}}\\ = &\sum\limits_{i = 1, i\neq \mu}^{n}[\left(q_{\mu i}+q_{j \mu}\right)(-\sin \alpha_{\mu \nu} \cos \alpha_{i \nu} \prod\limits_{l = 1}^{v-1} \sin \alpha_{\mu l} \sin \alpha_{i l}+\cos \alpha_{\mu \nu} \sin \alpha_{i \nu} \prod\limits_{l = 1, l \neq \nu}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{j l} \\ &\; +\sum\limits_{p = \nu+1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{i p} \cos \alpha_{\mu \nu} \cos \alpha_{i \nu} \prod\limits_{l = 1, l \neq \mu}^{p-1} \sin \alpha_{\mu l} \sin _{i l} ) ] \\ &\; +q_{\mu \mu}(-2 \cos \alpha_{\mu \nu} \sin \alpha_{\mu \nu} \prod\limits_{l = 1}^{\nu-1} \sin ^{2} \alpha_{\mu l}+2 \sin \alpha_{\mu \nu} \cos \alpha_{\mu l} \prod\limits_{l = 1, l \neq \nu}^{r-1} \sin ^{2} \alpha_{\mu l}\\ &\; +\sum\limits_{p = \nu+1}^{r-1} 2 \cos \alpha_{\mu p} \sin \alpha_{\mu \nu} \cos \alpha_{\mu \nu} \prod\limits_{l = 1, l \neq \mu}^{p-1} \sin ^{2} \alpha_{\mu l} ). \end{aligned} $

$ \begin{aligned} \frac{\partial F(\boldsymbol{\alpha})}{\partial \alpha_{\mu \nu}} = &\frac{\partial}{\partial \alpha_{\mu \nu}}[\sum\limits_{j = 1, j \neq \mu}^{n} q_{j \mu}(\sum\limits_{p = 1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{j p} \prod\limits_{l = 1}^{p-1} \sin \alpha_{\mu l} \sin \alpha_{j l}+\prod\limits_{l = 1}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{j l})] \\ &\; +\frac{\partial}{\partial \alpha_{\mu \nu}}[\sum\limits_{i = 1, i \neq \mu}^{n} q_{\mu i}(\sum\limits_{p = 1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{i p} \prod\limits_{l = 1}^{p-1} \sin \alpha_{\mu l} \sin \alpha_{i l}+\prod\limits_{l = 1}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{i l})] \\ &\; +\frac{\partial}{\partial \alpha_{\mu \nu}}[q_{\mu \mu}(\sum\limits_{P = 1}^{r-1} \cos ^{2} \alpha_{\mu p} \prod\limits_{l = 1}^{r-1} \sin ^{2} \alpha_{\mu l}+\prod\limits_{l = 1}^{r-1} \sin ^{2} \alpha_{\mu l})]\\ = &\frac{\partial}{\partial \alpha_{\mu \nu}}[\sum\limits_{j = 1}^{\mu-1} \mathrm{q}_{j \mu}(\sum\limits_{p = 1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{j p} \prod\limits_{l = 1}^{p-1} \sin \alpha_{\mu l} \sin \alpha_{j l}+\prod\limits_{l = 1}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{j l}) \\ &\; +\sum\limits_{j = \mu+1}^{n} q_{j \mu}(\prod\limits_{p = 1}^{r-1} \cos \alpha_{\mu_{p}} \cos \alpha_{j p} \prod\limits_{l = 1}^{p-1} \sin \alpha_{\mu l} \sin \alpha_{j l}+\prod\limits_{l = 1}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{j l}) ]\\ &\; +\frac{\partial}{\partial \alpha_{\mu \nu}}[\sum\limits_{i = 1}^{\mu-1} q_{\mu i}(\prod\limits_{p = 1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{i p} \prod\limits_{l = 1}^{p-1} \sin \alpha_{\mu l} \sin \alpha_{i l}+\prod\limits_{l = 1}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{i l}) \\ &\; +\sum\limits_{j = \mu+1}^{n} q_{\mu i}(\prod\limits_{p = 1}^{r-1} \cos \alpha_{\mu_{p}} \cos \alpha_{i p} \prod\limits_{l = 1}^{p-1} \sin \alpha_{\mu l} \sin \alpha_{i l}+\prod\limits_{l = 1}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{i l}) ]\\ &\; +\frac{\partial}{\partial \alpha_{\mu \nu}}[q_{\mu \mu}(\prod\limits_{p = 1}^{r-1} \cos ^{2} \alpha_{\mu p} \prod\limits_{l = 1}^{p-1} \sin ^{2} \alpha_{\mu l}+\prod\limits_{l = 1}^{r-1} \sin ^{2} \alpha_{\mu l})]\\ = &\{\sum\limits_{j = 1}^{\mu-1}[q_{j \mu}(-\sin \alpha_{\mu \nu} \cos \alpha_{j \nu} \prod\limits_{l = 1}^{\nu-1} \sin \alpha_{\mu l} \sin \alpha_{j l}+\cos \alpha_{\mu \nu} \sin \alpha_{j v} \prod\limits_{l = 1, l \neq \nu}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{j l}\\ \end{aligned} $
$ \begin{aligned} &\; +\sum\limits_{p = \nu+1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{j p} \cos \alpha_{\mu \nu} \cos \alpha_{j \nu} \prod\limits_{l = 1 , l \neq \mu}^{p-1} \sin \alpha_{\mu l} \sin _{j l} ) ] \\ &\; +\sum\limits_{j = \mu+1}^{n} q_{j \mu}[(-\sin \alpha_{\mu \nu} \cos \alpha_{j \nu} \prod\limits_{l = 1}^{\nu-1} \sin \alpha_{\mu l} \sin \alpha_{j l}+\cos \alpha_{\mu \nu} \sin \alpha_{j v} \prod\limits_{l = 1 , l \neq \mu}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{j l}\\ &\; +\sum\limits_{p = v+1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{j p} \cos \alpha_{\mu \nu} \cos \alpha_{j v} \prod\limits_{l = 1, l \neq \mu}^{p-1} \sin \alpha_{\mu l} \sin _{j l} ) ] \} \\ &\; +\{\sum\limits_{i = 1}^{\mu-1}[q_{i \mu}(-\sin \alpha_{\mu \nu} \cos \alpha_{i v} \prod\limits_{l = 1}^{\nu-1} \sin \alpha_{\mu l} \sin \alpha_{i l}+\cos \alpha_{\mu \nu} \sin \alpha_{i v} \prod\limits_{l = 1, l \neq \nu}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{i l}\\ &\; +\sum\limits_{p = \nu+1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{i p} \cos \alpha_{\mu \nu} \cos \alpha_{i v} \prod\limits_{l = 1, l \neq \mu}^{p-1} \sin \alpha_{\mu l} \sin _{i l} ) ]\\ &\; +\sum\limits_{i = \mu+1}^{n}[q_{i \mu}(-\sin \alpha_{\mu \nu} \cos \alpha_{i \nu} \prod\limits_{l = 1}^{\nu-1} \sin \alpha_{\mu l} \sin \alpha_{i l}+\cos \alpha_{\mu \nu} \sin \alpha_{i v} \prod\limits_{l = 1, l \neq \nu}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{i l}\\ &\; +\sum\limits_{p = \nu+1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{i p} \cos \alpha_{\mu \nu} \cos \alpha_{i v} \prod\limits_{l = 1, l \neq \mu}^{p-1} \sin \alpha_{\mu l} \sin _{i l} ) ] \} \\ &\; +[q_{\mu \mu}(-2 \cos \alpha_{\mu l} \sin \alpha_{\mu \nu} \prod\limits_{l = 1}^{\nu-1} \sin ^{2} \alpha_{\mu l}+2 \sin \alpha_{\mu \nu} \cos \alpha_{\mu \nu} \prod\limits_{l = 1 , l \neq \nu}^{r-1} \sin ^{2} \alpha_{\mu l}\\ &+\sum\limits_{p = \nu+1}^{r-1} 2 \cos ^{2} \alpha_{\mu p} \sin \alpha_{\mu \nu} \cos \alpha_{\mu \nu} \prod\limits_{l = 1, l \neq \mu}^{p-1} \sin ^{2} \alpha_{\mu l} ) ] \\ = &\sum\limits_{i = 1, i\neq\mu}^{n}[\left(q_{\mu i}+q_{j \mu}\right)(-\sin \alpha_{\mu \nu} \cos \alpha_{i u} \prod\limits_{l = 1}^{\nu-1} \sin \alpha_{\mu l} \sin \alpha_{i l}+\cos \alpha_{\mu \nu} \sin \alpha_{i v} \prod\limits_{l = 1, l \neq \nu}^{r-1} \sin \alpha_{\mu l} \sin \alpha_{j l}\\ &\; +\sum\limits_{p = \nu+1}^{r-1} \cos \alpha_{\mu p} \cos \alpha_{i p} \cos \alpha_{\mu \nu} \cos \alpha_{i \nu} \prod\limits_{l = 1 , l \neq \mu}^{p-1} \sin \alpha_{\mu l} \sin _{i l} ) ] \\ &\; +q_{\mu \mu}(-2 \cos \alpha_{\mu \nu} \sin \alpha_{\mu \nu} \prod\limits_{l = 1}^{\nu-1} \sin ^{2} \alpha_{\mu l}+2 \sin \alpha_{\mu \nu} \cos \alpha_{\mu l} \prod\limits_{l = 1, 1 \neq \nu}^{r-1} \sin ^{2} \alpha_{\mu l}\\ &\; +\sum\limits_{p = \nu+1}^{r-1} 2 \cos \alpha_{\mu p} \sin \alpha_{\mu \nu} \cos \alpha_{\mu \nu} \prod\limits_{l = 1, l = \mu}^{p-1} \sin ^{2} \alpha_{\mu l} ), \end{aligned} $

从而利用Armijo线搜索下的非线性共轭梯度方法求解问题(2.2)的算法如下.

算法2.1 (Armijo线搜索下的非线性共轭梯度方法求解问题$ (2.2)) $

步0 给定参数$ \rho \in(0, 1), \delta \in(0, 0.5), \sigma \in(\delta, 0.5) $, 迭代精度$ 0 \leqslant \varepsilon<1 $和初始点$ \alpha_{0} $.计算$ g_{0} = \nabla F\left(\alpha_{0}\right). $$ k : = 0 $.

步1 若$ \left\|g_{k}\right\| \leqslant \varepsilon $, 停机, 输出$ \boldsymbol{\alpha}^{*} \approx \boldsymbol{\alpha}_{k} $.

步2 计算搜索方向

$ d_{k} = \left\{\begin{aligned}-g_{k}, k = 0, \\-g_{k}+\frac{g_{k}^{T} g_{k}}{g_{k-1}^{T} g_{k-1}} d_{k-1}, k>0. \end{aligned}\right. $

步3 利用Armijo法则确定搜索步长$ b_{k} $即找一个最小的非负整数$ m_{k} $, 使得$ F\left(\boldsymbol{\alpha}_{k+1}\right) \leqslant F\left(\boldsymbol{\alpha}_{k}\right)+\sigma \rho^{m_{k}} g_{k}^{T} d_{k} $.令$ b_{k} = \rho^{m_{k}}, \boldsymbol{\alpha}_{k+1} = \boldsymbol{\alpha}_{k}+b_{k} d_{k} $.

步4 令$ k : = k+1 $, 转步1.

由文[12]的定理2.6得到算法2.1的全局收敛性定理.

定理2.2 设问题$ (2.2) $目标函数$ F(\boldsymbol{\alpha}) $连续可微, 如果问题$ (2.2) $的梯度是Lipschitz连续的, 即存在常数$ L>0 $, 使得$ \left\|\nabla F\left(\boldsymbol{\alpha}_{1}\right)-\nabla F\left(\boldsymbol{\alpha}_{2}\right)\right\|_{\mathrm{F}} \leqslant L\left\|\boldsymbol{\alpha}_{1}-\boldsymbol{\alpha}_{2}\right\|_{\mathrm{F}}, \forall \boldsymbol{\alpha}_{1}, \boldsymbol{\alpha}_{2} \in {\bf{R}}^{\operatorname{m}\times(\mathrm{r}-1)}, $那么由算法2.1产生的序列$ \left\{\boldsymbol{\alpha}_{k}\right\} $满足$ \lim\inf\limits _{k \rightarrow \infty}\left\|\nabla F\left(\boldsymbol{\alpha}_{k}\right)\right\|_{\mathrm{F}} = 0. $

3 数值例子

本节利用数值例子验证算法$ 2.1 $求解问题$ (2.2) $是可行的.所有程序都用Matlab R2014a编写.令梯度范数为$ \left\|\nabla F_{k}\right\|_{\mathrm{F}} = \left\|\nabla F\left(\boldsymbol{\alpha}_{k}\right)\right\|_{\mathrm{F}}, $停机标准为$ \left\|\nabla F_{k}\right\|_{\mathrm{F}}<1 \times 10^{-3}, $其中$ \alpha_{k} $表示算法$ 2.1 $的第$ k $步迭代值.

例3.1 考虑问题$ (2.2) $, 随机选取加权无向图(见图 1), 其邻接矩阵为

图 1 加权无向图(各点权重相同)
$ A = \left[ \begin{array}{llllll}{0} & {1} & {0} & {0} & {1} & {0} \\ {1} & {0} & {1} & {0} & {1} & {0} \\ {0} & {1} & {0} & {1} & {0} & {0} \\ {0} & {0} & {1} & {0} & {1} & {1} \\ {1} & {1} & {0} & {1} & {0} & {0} \\ {0} & {0} & {0} & {1} & {0} & {0}\end{array}\right], $

从而图 1的负Laplace矩阵为

$ Q = \frac{1}{4}(\operatorname{diag}(A e)-A) = \left[ \begin{array}{cccccc}{0.50} & {-0.25} & {0} & {0} & {-0.25} & {0} \\ {-0.25} & {0.75} & {-0.25} & {0} & {-0.25} & {0} \\ {0} & {-0.25} & {0.50} & {-0.25} & {0} & {0} \\ {0} & {0} & {-0.25} & {0.75} & {-0.25} & {-0.25} \\ {-0.25} & {-0.25} & {0} & {-0.25} & {0.75} & {0} \\ {0} & {0} & {0} & {0.25} & {0} & {0.25}\end{array}\right], $

选取初值

$ \boldsymbol{\alpha}_{0} = \left[ \begin{array}{ccc}{0.059\;4} & {0.092\;4} & {0.108\;8} \\ {0.315\;8} & {0.007\;8} & {0.631\;8} \\ {0.772\;7} & {0.423\;1} & {0.126\;5} \\ {0.696\;4} & {0.655\;6} & {0.134\;3} \\ {0.125\;3} & {0.722\;9} & {0.098\;6} \\ {0.130\;2} & {0.531\;2} & {0.142\;0}\end{array}\right]. $

利用算法$ 2.1 $求解问题$ (2.2) $.其目标函数值和梯度范数$ \left\|\nabla F_{k}\right\|_{\mathrm{F}} $的曲线如图 2.

图 2 目标函数值和梯度范数$ \left\|\nabla F_{k}\right\|_{\mathrm{F}} $的曲线

例3.2 考虑问题$ (2.2) $, 随机选取一幅加权无向图并按例子$ 3.1 $所述方法求得其负Laplace矩阵为Q, 再利用算法$ 2.1 $求解, 其中

$ Q = {\left[ {\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.380}}\, {\rm{0}}} & {{\rm{0}}{\rm{.011}}\, {\rm{0}}} & {{\rm{0}}{\rm{.007}}\, {\rm{1}}} & {{\rm{0}}{\rm{.001}}\, {\rm{0}}} & {{\rm{0}}{\rm{.002}}\, {\rm{0}}} & {{\rm{ 0}}{\rm{.006}}\, {\rm{0}}} & {{\rm{0}}{\rm{.001}}\, {\rm{1}}} & {{\rm{ 0}}{\rm{.008}}\, {\rm{7}}} & {{\rm{0}}{\rm{.006}}\, {\rm{7}}} & {{\rm{0}}{\rm{.098}}\, {\rm{0}}} \\ {{\rm{0}}{\rm{.011}}\, {\rm{0}}} & {{\rm{0}}{\rm{.083}}\, {\rm{0}}} & {{\rm{0}}{\rm{.008}}\, {\rm{5}}} & {{\rm{0}}{\rm{.010}}\, {\rm{0 }}} & {{\rm{0}}{\rm{.001}}\, {\rm{2}}} & {{\rm{0}}{\rm{.010}}\, {\rm{0}}} & {{\rm{0}}{\rm{.100}}\, {\rm{0 }}} & {{\rm{0}}{\rm{.059}}\, {\rm{0}}} & {{\rm{0}}{\rm{.001}}\, {\rm{8}}} & {{\rm{0}}{\rm{.006}}\, {\rm{4}}} \\ {{\rm{0}}{\rm{.046}}\, {\rm{0}}} & {{\rm{0}}{\rm{.003}}\, {\rm{3}}} & {{\rm{0}}{\rm{.032}}\, {\rm{0}}} & {{\rm{0}}{\rm{.010}}\, {\rm{0}}} & {{\rm{0}}{\rm{.006}}\, {\rm{8}}} & {{\rm{0}}{\rm{.008}}\, {\rm{0}}} & {{\rm{0}}{\rm{.011}}\, {\rm{5}}} & {{\rm{0}}{\rm{.029}}\, {\rm{0}}} & {{\rm{0}}{\rm{.004}}\, {\rm{4}}} & {{\rm{0}}{\rm{.024}}\, {\rm{0}}} \\ {0.090\;0} & {0.008\;3} & {0.087\;0} & {0.048\;0} & {0.006\;0} & {0.066\;0} & {0.007\;7} & {0.086\;0} & {0.006\;5} & {0.074\;0} \\ {0.052\;0} & {0.009\;0} & {0.009\;4} & {0.034\;0} & {0.056\;0} & {0.004\;6} & {0.007\;1} & {0.060\;0} & {0.091\;0} & {0.009\;9} \\ {0.049\;0} & {0.025\;0} & {0.012\;0} & {0.005\;6} & {0.023\;0} & {0.096\;0} & {0.009\;5} & {0.039\;0} & {0.007\;7} & {0.002\;4} \\ {0.036\;0} & {0.009\;4} & {0.033\;0} & {0.010\;0} & {0.006\;8} & {0.020\;0} & {0.069\;0} & {0.078\;0} & {0.004\;7} & {0.006\;0} \\ {0.093\;0} & {0.002\;4} & {0.009\;3} & {0.005\;9} & {0.007\;6} & {0.002\;2} & {0.130\;0} & {0.054\;0} & {0.006\;2} & {0.008\;4} \\ {0.045\;0} & {0.030\;0} & {0.059\;0} & {0.007\;8} & {0.000\;9} & {0.003\;4} & {0.000\;6} & {0.027\;0} & {0.081\;0} & {0.005\;1} \\ {0.070\;0} & {0.006\;1} & {0.003\;2} & {0.001\;1} & {0.042\;0} & {0.004\;0} & {0.074\;0} & {0.021\;0} & {0.004\;2} & { - 0.077\;0} \\ \end{array}} \right].} $

选取初值

$ {\alpha _0} = \left[ {\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.146}}\;{\rm{7}}} & {0.752\;3} & {0.860\;0} & {0.504\;7} & {0.881\;8} \\ {0.351\;3} & {0.095\;9} & {0.749\;7} & {0.512\;0} & {0.915\;7} \\ {0.362\;3} & {0.341\;8} & {0.315\;0} & {0.859\;4} & {0.732\;0} \\ {0.205\;2} & {0.321\;3} & {0.509\;6} & {0.046\;7} & {0.859\;8} \\ {0.580\;2} & {0.348\;9} & {0.783\;1} & {0.701\;5} & {0.803\;4} \\ {0.538\;5} & {0.491\;7} & {0.372\;1} & {0.803\;1} & {0.272\;7} \\ {0.792\;0} & {0.587\;7} & {0.789\;7} & {0.107\;6} & {0.631\;5} \\ {0.290\;0} & {0.203\;8} & {0.208\;4} & {0.861\;6} & {0.034\;7} \\ {0.956\;9} & {0.272\;5} & {0.120\;0} & {0.410\;1} & {0.139\;9} \\ {0.080\;0} & {0.239\;7} & {0.362\;5} & {0.567\;2} & {0.033\;5} \\ \end{array}} \right], $

利用算法$ 2.1 $求解问题$ (2.2) $, 可得其目标函数值和梯度范数的曲线如下图 3.

图 3 目标函数值和梯度范数$ \left\|\nabla F_{k}\right\|_{\mathrm{F}} $的曲线

例3.3 考虑问题$ (2.2) $, 其中$ \boldsymbol{Q} \in {\bf{R}}^{n \times n}, \boldsymbol{\alpha}_{0} \in {\bf{R}}^{n \times(r-1)} $随机选取, 且$ r \leqslant n $.利用算法$ 2.1 $求解问题$ (2.2) $, 实验结果如表 1, 其中IT表示迭代步数, CPU表示CPU运行时间, VAL表示目标函数值, GN表示梯度范数.

表 1 $n$和$r$取不同值时算法$3.2$的结果

数值例子3.1、3.2和3.3说明利用算法2.1求解问题(2.2)是可行的.

4 结论

本文考虑图像处理中的最小割问题, 利用Gramian表示和三角函数变换将图像处理中的最小割问题转化为无约束优化问题, 再利用非线性共轭梯度法求解无约束优化问题, 最后用数值实验验证了迭代方法是可行的.

参考文献
[1] Grippo L, Palagi L, Piccialli V. An unconstrained minimization method for solving low-rank SDP relaxations of the maxcut problem[J]. Mathematical Programming (Series B), 2011, 126(1): 119–146. DOI:10.1007/s10107-009-0275-8
[2] 罗希平, 田捷, 诸葛婴, 等. 图像分割方法综述[J]. 模式识别与人工智能, 1999, 12(3): 300–312.
[3] Kokiopoulou E, Chen J, Saad Y. Trace optimization and eigenproblems in dimension reduction methods[J]. Numerical Linear Algebra with Applications, 2011, 18(3): 565–602. DOI:10.1002/nla.743
[4] Weinberger K. Q., Saul L K. Distance metric learning for large margin nearest neighbor classification[J]. Journal of Machine Learning Research, 2009, 10(1): 207–244.
[5] Kiers H A L, Berge J M F T. Minimization of a class of matrix trace functions by means of refined majorization[J]. Psychometrika, 1992, 57(3): 371–382. DOI:10.1007/BF02295425
[6] Bental A, Teboulle M. Hidden convexity in some nonconvex quadratically constrained quadratic programming[J]. Mathematical Programming, 1996, 72(1): 51–63. DOI:10.1007/BF02592331
[7] Benson S J, Ye Y, Zhang X. Solving large-scale sparse semidefinite programs for combinatorial optimization[J]. SIAM Journal on Optimization, 2000, 10(2): 443–461. DOI:10.1137/S1052623497328008
[8] Burer S, Monteiro R D C, et al. Rank-two relaxation Heuristics for MAX-CUT and other binary quadratic programs[J]. SIAM Journal on Optimization, 2006, 12(2): 503–521.
[9] Liu X G, Wang X F, Wang W G. Maximization of matrix trace function of product stiefel manifolds[J]. SIAM Journal on Matrix Analysis and Applications, 2015, 36(4): 1489–1506. DOI:10.1137/15M100883X
[10] Xu S, Gao L, Zhang P. Numerical linear algebra[M]. .
[11] Duan X F, Bai J C, Zhang M J. On the generalized low rank approximation of the correlation matrices arising in the asset portfolio[J]. Linear Algebra and Its Applications, 2014, 461: 1–17. DOI:10.1016/j.laa.2014.07.026
[12] Adrian S, Lewis, Malick J. Alternating projections on manifolds[J]. Mathematics of Operations Research, 2008, 33(1): 216–234. DOI:10.1287/moor.1070.0291