数学杂志 (Journal of Mathematics) 2017, Vol. 37, Issue (2): 231-238
A NEW CONJUGATE GRADIENT METHOD WITH STRONGLY GLOBAL CONVERGENCE AND SUFFICIENT DESCENT CONDITION
DONG Xiao-liang1,2, HE Yu-bo3, KONG Xiang-yu1, LI Wei-jun1     
1. School of Mathematics and Information, Beifang University of Nationalities, Yinchuan, 750021, China;
2. Department of Mathematics and Applied Mathematics, Huaihua University, Huaihua, 418008, China;
3. Network Information Technology Center, Beifang University of Nationalities, Yinchuan, 750021, China
Abstract: In this paper, we study the WYL conjugate gradient method for unconstrained optimization problems. By making use of a modified iterative scheme, the sufficient descent condition is satisfied at each iteration, independently of the line search used. Also, by removing the original restriction on the parameter of the Wolfe conditions, we establish the strongly global convergence property for general functions. Numerical results illustrate that our method is efficient for the test problems.
Key words: conjugate gradient method     sufficient descent condition     strongly global convergence     Wolfe line search    
1 Introduction

Consider the following unconstrained optimization problem

$\begin{equation}\label{mubiao} \min\limits_{x\in \mathbb{R}^{n}} f(x), \end{equation}$ (1.1)

where $f:\mathbb{R}^{n}\rightarrow \mathbb{R}$ is a smooth and nonlinear function, and its gradient $g_{k}=\nabla{f(x_{k})}$ is available. The conjugate gradient (CG) methods represent an important class of unconstrained optimization algorithms with strong local and global convergence properties and modest memory requirements.

The iterative formula of the CG method is given by

$\begin{equation}\label{1.1} x_{k+1}=x_{k}+s_{k}, \quad s_{k}=\alpha_{k}d_{k}. \end{equation}$ (1.2)

In (1.2), $d_{k}$ is a search direction updated by

$\begin{equation}\label{1.2} d_{k}=\left\{\begin{array}{lc} -g_{k},&k=1, \\ -g_{k}+\beta_{k}d_{k-1},&k\geq2, \\ \end{array}\right. \end{equation}$ (1.3)

and the steplength $\alpha_{k}>0$ is commonly chosen to satisfy certain line search conditions. Among them, the so-called Wolfe conditions have attracted special attention in the convergence analyses and the implementations of CG methods, requiring that

$\begin{eqnarray}\label{Wolfe1} & & f(x_{k}+\alpha_{k}d_{k})-f(x_{k})\leq\rho\alpha_{k}g_{k}^{T}d_{k}, \end{eqnarray}$ (1.4)
$\begin{eqnarray}\label{Wolfe2} & & g(x_{k}+\alpha_{k}d_{k})^{T}d_{k}\geq\sigma g_{k}^{T}d_{k}, \end{eqnarray}$ (1.5)

where the parameters satisfy $0 < \rho < \sigma < 1$ .
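As a concrete illustration (ours, not part of the original paper), the two inequalities can be checked for a trial steplength with a few lines of Python; the function names and default parameter values are merely illustrative.

import numpy as np

def satisfies_wolfe(f, grad, x_k, d_k, alpha, rho=1e-3, sigma=0.5):
    """Check the Wolfe conditions (1.4)-(1.5) for a trial steplength alpha."""
    g_k = grad(x_k)
    gtd = g_k @ d_k                                      # directional derivative g_k^T d_k
    x_new = x_k + alpha * d_k
    armijo = f(x_new) - f(x_k) <= rho * alpha * gtd      # condition (1.4)
    curvature = grad(x_new) @ d_k >= sigma * gtd         # condition (1.5)
    return armijo and curvature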

The well-known choices of $\beta_{k}$ are the Fletcher-Reeves (FR), Hestenes-Stiefel (HS), Polak-Ribière-Polyak (PRP) and Dai-Yuan (DY) formulas, which are specified by

$\begin{eqnarray}\label{4lei} & & \beta_{k}^{FR}=\dfrac{\|g_{k}\|^2}{\|g_{k-1}\|^2}, \beta_{k}^{HS}=\dfrac{g_{k}^{T}(g_{k}-g_{k-1})}{d_{k-1}^{T}y_{k-1}}, \nonumber\\ & & \beta_{k}^{PRP}=\dfrac{g_{k}^{T}(g_{k}-g_{k-1})}{\|g_{k-1}\|^2}, \beta_{k}^{DY}=\dfrac{\|g_{k}\|^2}{d_{k-1}^{T}y_{k-1}}, \end{eqnarray}$ (1.6)

where $y_{k-1}=g_{k}-g_{k-1}$ . Please refer to [1] for more details of their convergence properties.
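For illustration only, the four formulas in (1.6) can be evaluated for given vectors as in the following sketch; the helper name classical_betas is ours.

import numpy as np

def classical_betas(g_k, g_prev, d_prev):
    """Evaluate the four classical CG parameters in (1.6)."""
    y_prev = g_k - g_prev                      # y_{k-1} = g_k - g_{k-1}
    return {
        "FR":  (g_k @ g_k) / (g_prev @ g_prev),
        "HS":  (g_k @ y_prev) / (d_prev @ y_prev),
        "PRP": (g_k @ y_prev) / (g_prev @ g_prev),
        "DY":  (g_k @ g_k) / (d_prev @ y_prev),
    }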

Little is known concerning the global convergence of the HS method for nonconvex minimization problems. However, the HS method resembles the PRP method in practical computation and is often recommended due to its superior numerical performance.

Recently, various modifications of the HS method have received growing interest, in which the sufficient descent condition plays an important role in the convergence analysis of CG methods.

Hager and Zhang [2] proposed the CG_DESCENT method and proved its convergence with the Wolfe search. Based on the MBFGS method [3], Zhang [4] and Dai [5] introduced modified HZ methods with the Armijo search for nonconvex objective functions. Zhang, Zhou and Li [6] proposed a three-term modified PRP method, in which the property $-d_{k}^{T}g_{k}=\|g_{k}\|^2$ always holds without any line search. To seek the CG direction that is closest to the direction of the scaled memoryless BFGS method, Dai and Kou [7] proposed a family of CGOPT methods. Yu et al. [8, 9] proposed several modified spectral CG methods. For further study of some recent advances, we refer to [10-15].

In this paper, we continue to study HS-type methods. One feature of our method is that it guarantees the sufficient descent condition and strongly global convergence.

The rest of this paper is organized as follows. In the next section, the new method is presented formally and its sufficient descent property is proved. In Section 3, we establish the strongly global convergence of the proposed method with the Wolfe line search. In Section 4, we report numerical comparisons with existing methods on some test problems.

2 A New CG Method and Its Properties

Recently, Yao, Wei and Huang [16] proposed the MHS, WYL and MLS methods, in which the parameters $\beta_{k}$ are given by

$\begin{equation} \label{WYL}\beta_{k}^{MHS}=\dfrac{g_{k}^{T} \overline{y_{k-1}}}{d_{k-1}^{T}y_{k-1}}, \beta_{k}^{WYL}=\dfrac{g_{k}^{T} \overline{y_{k-1}}}{||g_{k-1}||^2}, \beta_{k}^{MLS}=-\dfrac{g_{k}^{T} \overline{y_{k-1}}}{d_{k-1}^{T}g_{k-1}}, \end{equation}$ (2.1)

where $\overline{y_{k-1}}=g_{k}-\dfrac{\|g_{k}\|}{\|g_{k-1}\|}g_{k-1}$ .

The global convergence of these methods with the strong Wolfe line search was proved under the condition that the parameter $ \sigma$ in (1.5) is constrained to $ \sigma \in \left[0, \dfrac{1}{3}\right]$ , $ \sigma \in \left[0, \dfrac{1}{4}\right]$ and $ \sigma \in \left[0, \dfrac{1}{4}\right]$ , respectively. Furthermore, with the same restriction on the parameter $\sigma $ , these methods possess the sufficient descent condition.

The above discussion naturally prompts us to raise the following question:

Is it possible to modify the direction of the MHS method in a suitable way, thereby retaining its sufficient descent property and ensuring global convergence for general functions?

Taking the theoretical advantages of the MHS method into consideration, we give another method that guarantees the sufficient descent condition and enjoys strongly global convergence. With our choices, the additional restriction on the parameter $\sigma$ in (1.5) can be removed. The direction $d_{k}$ of our method is given by

$\begin{equation}\label{DDHS} d_{k}^{DHS}=\left\{\begin{array}{lc} -g_{k},& d_{k-1}^{T}y_{k-1}\leq\varepsilon_{1}\|y_{k-1}\|\|d_{k-1}\|, \\ -g_{k}+\beta_{k}^{DHS}d_{k-1}+ \varphi_{k}g_{k-1}, & d_{k-1}^{T}y_{k-1}>\varepsilon_{1}\|y_{k-1}\|\|d_{k-1}\|, \\ \end{array}\right. \end{equation}$ (2.2)

where $\varepsilon_1>0$ is a constant.

In (2.2), the parameters $\beta_{k}^{DHS }$ and $\varphi_{k}$ are chosen as

$\begin{eqnarray}\label{BETADHS} & & \beta_{k}^{DHS}=\dfrac{g_{k}^{T}\left(g_{k}- \dfrac{\|g_{k}\|}{\|g_{k-1}\|}g_{k-1}\right)}{\max\{d_{k-1}^{T} y_{k-1}, \lambda|d_{k-1}^{T} g_{k}|\}}, \end{eqnarray}$ (2.3)
$\begin{eqnarray}\label{FAIDHS} & & \varphi_{k}=\dfrac{g_{k}^{T}d_{k-1}}{\max\{d_{k-1}^{T} y_{k-1}, \lambda|d_{k-1}^{T} g_{k}|\}}\cdot \dfrac{\|g_{k}\|}{\|g_{k-1}\|}, \end{eqnarray}$ (2.4)

where $\lambda>1$ is another given constant.
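To make the construction concrete, the following Python sketch assembles the direction (2.2) from (2.3) and (2.4). It is our illustration rather than the authors' implementation; the defaults lam=10 and eps1=1e-12 mirror the constants used in Section 4.

import numpy as np

def dhs_direction(g_k, g_prev, d_prev, lam=10.0, eps1=1e-12):
    """Compute the search direction d_k of (2.2) with (2.3)-(2.4)."""
    y_prev = g_k - g_prev
    # Restart branch of (2.2): fall back to steepest descent when the
    # safeguard d_{k-1}^T y_{k-1} > eps1 * ||y_{k-1}|| * ||d_{k-1}|| fails.
    if d_prev @ y_prev <= eps1 * np.linalg.norm(y_prev) * np.linalg.norm(d_prev):
        return -g_k
    denom = max(d_prev @ y_prev, lam * abs(d_prev @ g_k))
    ratio = np.linalg.norm(g_k) / np.linalg.norm(g_prev)
    beta = g_k @ (g_k - ratio * g_prev) / denom      # beta_k^{DHS}, eq. (2.3)
    phi = (g_k @ d_prev) / denom * ratio             # varphi_k, eq. (2.4)
    return -g_k + beta * d_prev + phi * g_prev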

For convenience, we refer to the method defined by (2.2) as the DHS method in the remainder of this paper, and formally state its steps as follows.

Algorithm 2.1 (DHS method)

Step 1  Choose constants $\lambda>1$ , $\varepsilon_{1}>0$ and $\varepsilon>0$ . Select an initial point $x_{1} \in R^{n}$ and set $k=1$ .

Step 2  Test the stopping criterion. If $\|g_{k}\| < \varepsilon$ , then stop; otherwise, calculate $ d_{k}$ by (2.2).

Step 3  Determine the steplength by the Wolfe conditions.

Step 4  Calculate the new iterate by $x_{k+1}=x_{k}+\alpha_{k}d_{k}$ , set $k=k+1$ and go to Step 2.
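A minimal end-to-end sketch of Algorithm 2.1 is given below, reusing the dhs_direction helper above. The Wolfe steplength is delegated to scipy.optimize.line_search (a strong Wolfe implementation); this choice and the fallback steplength are ours, not prescribed by the paper.

import numpy as np
from scipy.optimize import line_search

def dhs_method(f, grad, x1, eps=1e-6, max_iter=2000, rho=1e-3, sigma=0.5):
    """Minimal sketch of Algorithm 2.1 (DHS method)."""
    x_k = np.asarray(x1, dtype=float)
    g_k = grad(x_k)
    d_k = -g_k                                      # d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g_k) < eps:               # Step 2: stopping test
            break
        alpha = line_search(f, grad, x_k, d_k, gfk=g_k, c1=rho, c2=sigma)[0]
        if alpha is None:                           # safeguard if the search fails
            alpha = 1e-4
        x_k = x_k + alpha * d_k                     # Step 4: new iterate
        g_next = grad(x_k)
        d_k = dhs_direction(g_next, g_k, d_k)       # next direction via (2.2)
        g_k = g_next
    return x_k

For a quick smoke test one might call dhs_method(lambda x: x @ x, lambda x: 2 * x, np.ones(10)), which should drive the gradient norm below the tolerance within a few iterations.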

The descent property is an indispensable ingredient in the convergence analysis of CG methods. More precisely, if there exists a constant $ c_{1}>0$ such that

$\begin{equation}\label{our chongfen} d_{k}^{T}g_{k}\leq - c_{1}\|g_{k}\|^2, \forall k \in N \end{equation}$ (2.5)

holds for all $ k \in N $ , then the so-called sufficient descent condition holds.

Property (2.5) is guaranteed in our method, as proved in the following lemma.

Lemma 2.1  Let $\{x_{k}\}$ and $\{d_{k}\}$ be generated by the DHS method. Then the direction $d_{k}$ satisfies the sufficient descent condition (2.5), independently of the line search.

Proof  For $ k=1$ , it follows that $ d_{1}^{T}g_{1}=-\|g_{1}\|^2$ . Now we mainly consider the case where

$$ d_{k-1}^{T}y_{k-1}>\varepsilon_{1}\|y_{k-1}\|\|d_{k-1}\|$$

is satisfied. It follows from (2.2) that

$\begin{eqnarray}\label{benwenchongfen} g_{k}^{T}d_{k} & = & -||g_{k}||^{2}+ \dfrac{g_{k}^{T} d_{k-1}}{ \max\{d_{k-1}^{T} y_{k-1}, \lambda|d_{k-1}^{T} g_{k}|\} } ||g_{k}||^{2} \nonumber\\ & \leq & -||g_{k}||^{2}+ \dfrac{|g_{k}^{T} d_{k-1}|}{ \max\{d_{k-1}^{T} y_{k-1}, \lambda|d_{k-1}^{T} g_{k}|\} } ||g_{k}||^{2} \nonumber\\ & \leq & -\left ( 1-\dfrac{ 1} { \lambda} \right )||g_{k}||^{2}. \end{eqnarray}$ (2.6)

Then the conclusion holds by letting $c_{1}=1-\lambda^{-1}$ .
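As a quick, non-rigorous numerical check of Lemma 2.1 (not part of the original paper), one can verify the bound $g_{k}^{T}d_{k}\leq -(1-\lambda^{-1})\|g_{k}\|^2$ on random data using the dhs_direction sketch from Section 2; the small tolerance only absorbs floating-point rounding.

import numpy as np

rng = np.random.default_rng(0)
lam = 10.0
for _ in range(1000):
    g_k, g_prev, d_prev = rng.standard_normal((3, 50))
    d_k = dhs_direction(g_k, g_prev, d_prev, lam=lam)
    # Sufficient descent bound (2.5) with c_1 = 1 - 1/lambda, cf. (2.6).
    assert g_k @ d_k <= -(1.0 - 1.0 / lam) * (g_k @ g_k) + 1e-10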

3 Convergence Analysis

In this section, we prove the global convergence of the proposed CG method. We need the following assumptions, which are standard in the literature.

Assumption 3.1

Boundedness Assumption: the level set $\Omega =\{ x\in R^{n}|f(x)\leq f(x_{1})\}$ , where $x_{1}$ is the initial point, is bounded;

Lipschitz Assumption: in some neighborhood ${{\Omega }_{0}}$ of $\Omega $ , the objective function $f$ is continuously differentiable and its gradient $g$ is Lipschitz continuous, namely, there exists a constant $L > 0$ such that

$\begin{equation}\label{Lipschitz} ||g(x)-g(y)||\leq L||x-y||, \forall x, y\in \Omega_{0}. \end{equation}$ (3.1)

Theorem 3.1 Suppose that Assumption 3.1 holds. Let $\{x_{k}\}$ and $\{d_{k}\}$ be generated by Algorithm 2.1. Then there exists a constant $\overline{c}$ such that

$\begin{equation}\label{dk_youjie} ||d_{k}|| \leq\overline{ c} ||g_{k}||, \forall k\in N. \end{equation}$ (3.2)

Proof  Since $ d_{1} =-g_{1} $ , the result obviously holds for $k=1$ .

If $k>1$ , it suffices to consider the case where $ d_{k-1}^{T}y_{k-1}>\varepsilon_{1}\|y_{k-1}\|\|d_{k-1}\| $ holds.

It follows from the definition of $\beta_{k}^{DHS}$ in (2.3), the Cauchy-Schwarz inequality and the fact that $\left\|g_{k-1}- \dfrac{\|g_{k}\|}{\|g_{k-1}\|}g_{k-1}\right\|=\big|\|g_{k-1}\|-\|g_{k}\|\big|\leq\|y_{k-1}\|$ that

$\begin{eqnarray}\label{betak} |\beta_{k}^{DHS}| & \leq & \dfrac{\|g_{k}\|\left(\|g_{k}-g_{k-1}\|+\left\|g_{k-1}- \dfrac{\|g_{k}\|}{\|g_{k-1}\|}g_{k-1}\right\|\right)}{d_{k-1}^{T}(g_{k}-g_{k-1})} \nonumber\\ & \leq & \dfrac{2\|g_{k}\|\|y_{k-1}\|}{|d_{k-1}^{T} y_{k-1}|} \leq \dfrac{2}{\varepsilon_{1}}\dfrac{\|g_{k}\|}{\|d_{k-1}\|}. \end{eqnarray}$

Also, we can estimate an upper bound for $|\varphi_{k}|$ , namely

$\begin{equation}\label{faik} |\varphi_{k}|=\dfrac{|g_{k}^{T}d_{k-1}|}{\max\{d_{k-1}^{T} y_{k-1}, \lambda|d_{k-1}^{T} g_{k}|\}}\cdot \dfrac{\|g_{k}\|}{\|g_{k-1}\|}\leq \dfrac{1}{\lambda}\dfrac{\|g_{k}\|}{\|g_{k-1}\|}. \end{equation}$ (3.3)

Combining the above bound on $|\beta_{k}^{DHS}|$ with (3.3) and the definition (2.2) yields

$\begin{eqnarray}\label{gujidk} \|d_{k}\| & \leq & \|g_{k}\|+|\beta_{k}^{DHS}|\|d_{k-1}\|+ |\varphi_{k}| \|g_{k-1}\| \nonumber\\ & \leq & \left(1+\dfrac{2}{\varepsilon_{1}}+\dfrac{1}{\lambda}\right) \|g_{k}\|. \end{eqnarray}$ (3.4)

Letting $\overline{c}= 1+\dfrac{2}{\varepsilon_{1}}+\dfrac{1}{\lambda}$ , we complete the proof.

The conclusion of the following theorem, called the Zoutendijk condition, is often used to prove the global convergence of nonlinear CG methods.

Theorem 3.2 (see [17])  Suppose that Assumption 3.1 holds. Consider the general CG method, where $d_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe conditions. Then we have

$\begin{equation}\label{zou} \sum\limits_{k\geq1} \dfrac {(g_{k}^{T}d_{k})^2} {||d_{k}||^2}<\infty. \end{equation}$ (3.5)

For the general function, we can develop a strongly global convergence result as follows.

Theorem 3.3   Suppose that Assumption 3.1 holds. Let $\{x_{k}\}$ be generated by Algorithm 2.1. Then

$\begin{equation}\label{jixian} \lim\limits_{k\rightarrow\infty}||g_{k}||=0. \end{equation}$ (3.6)

Proof  The bound for $||d_{k}||$ in (3.2), coupled with (2.5) and the Zoutendijk condition (3.5), indicates that

$\begin{equation}\label{jishu} \sum\limits_{k\geq1} \left(\overline{ c}\right )^{-2} ||g_{k}||^2 \leq \sum\limits_{k\geq1} \dfrac {||g_{k}||^4} {||d_{k}||^2}<\left(1-\dfrac{1}{ \lambda}\right)^{-2}\sum\limits_{k\geq1} \dfrac {(g_{k}^{T}d_{k})^2} {||d_{k}||^2}<+\infty. \end{equation}$ (3.7)

Inequality (3.7) shows that the series $\sum\limits_{k\geq1}||g_{k}||^2$ converges, which leads to (3.6).

4 Numerical Experiments

In this section, we provide the implementation details of the new method to verify its numerical performance. Our test problems come from Moré et al. [18].

We stop the iteration if the inequality $||g_{k}||\leq 10^{-6}$ is satisfied. We use "F" to denote a failure, that is, that the number of iterations exceeds 2000. All algorithms use exactly the same implementation of the strong Wolfe conditions with $\rho=10^{-3}, \sigma=0.5.$ The parameters in (2.2)-(2.4) are specified by $\varepsilon_{1}=10^{-12}$ and $\lambda=10$ .

In order to rank the CG methods, we use the method of Dai and Ni [19] to compare the performance of our method with that of the methods in [16].

First, we compute the total number of function and gradient evaluations by the formula $N_{{\rm total}}=NF+m*NG$ , where $m=5$ .

Second, we evaluate the efficiency of the DHS method with respect to the methods in [16] in the following way. For each problem $i$ , compute the ratio

$$ r_{i}=\dfrac{N_{{\rm total}, i}(\text{a method in [16]})}{N_{{\rm total}, i}({\rm DHS})}$$

and the geometric mean of these ratios over all the test problems $ r_{{\rm total}}=\left(\prod\limits_{i\in S}r_{i}\right)^{\frac{1}{|S|}}$ , where $S$ denotes the set of the problems and $|S|$ the number of elements in $S$ .
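The bookkeeping behind $N_{{\rm total}}$ and $r_{{\rm total}}$ amounts to a few lines of Python; the sketch below is our paraphrase of the rule of Dai and Ni [19], with illustrative function names.

import numpy as np

def total_evaluations(nf, ng, m=5):
    """N_total = NF + m * NG with m = 5, following Dai and Ni [19]."""
    return nf + m * ng

def r_total(n_total_other, n_total_dhs):
    """Geometric mean over the test set S of r_i = N_total,i(other) / N_total,i(DHS)."""
    r = np.asarray(n_total_other, dtype=float) / np.asarray(n_total_dhs, dtype=float)
    return float(np.exp(np.log(r).mean()))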

Table 4.1  Numerical results

Finally, we present $r_{{\rm total}}$ in Table 4.2:

Table 4.2  Comparison of efficiency with other algorithms
References
[1] Dai Y H, Yuan Y X. Nonlinear conjugate gradient methods[M]. Shanghai: Shanghai Science and Technology Press, 2000.
[2] Hager W W, Zhang H C. A new conjugate gradient method with guaranteed descent and an efficient line search[J]. SIAM J. Optim., 2005, 16(1): 170–192. DOI:10.1137/030601880
[3] Li D H, Fukushima M. A modified BFGS method and its global convergence in nonconvex minimization[J]. J. Comput. Appl. Math., 2001, 129(1): 15–35.
[4] Zhang L, Zhou W J. On the global convergence of Hager-Zhang conjugate gradient with Armijo line search[J]. Math. Numer. Sin., 2008, 28(5): 840–845.
[5] Dai Z F, Wen F H. Global convergence of a modified Hestenes-Stiefel nonlinear conjugate gradient method with Armijo line search[J]. Numer. Algor., 2012, 59(1): 79–93. DOI:10.1007/s11075-011-9477-2
[6] Zhang L, Zhou W J, Li D H. A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence[J]. IMA J. Numer. Anal., 2006, 26(4): 629–640. DOI:10.1093/imanum/drl016
[7] Dai Y H, Kou C X. A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search[J]. SIAM J. Optim., 2013, 23(1): 296–320. DOI:10.1137/100813026
[8] Yu G H, Guan L T, Chen W F. Spectral conjugate gradient methods with sufficient descent property for large-scale unconstrained optimization[J]. Optim. Meth. Softw., 2008, 23(2): 275–293. DOI:10.1080/10556780701661344
[9] Yu G H. New descent nonlinear conjugate gradient methods for large-scale unconstrained optimization[R]. Tech. Rep., Department of Scientific Computation and Computer Applications, Sun Yat-Sen University, Guangzhou, 2005.
[10] Dong X, Liu H, He Y, Yang X. A modified Hestenes-Stiefel conjugate gradient method with sufficient descent condition and conjugacy condition[J]. J. Comp. Appl. Math., 2015, 281: 239–249. DOI:10.1016/j.cam.2014.11.058
[11] Dong X, Liu H, Xu Y, Yang X. Some nonlinear conjugate gradient methods with sufficient descent condition and global convergence[J]. Optim. Lett., 2015, 9(7): 1421–1432. DOI:10.1007/s11590-014-0836-5
[12] Dong X. Comment on "A new three-term conjugate gradient method for unconstrained problem"[J]. Numer. Algorithms, 2016, 72(1): 173–179. DOI:10.1007/s11075-015-0039-x
[13] Dong X, Liu H, He Y. A self-adjusting conjugate gradient method with sufficient descent condition and conjugacy condition[J]. J. Optim. Theory Appl., 2015, 165(1): 225–241. DOI:10.1007/s10957-014-0601-z
[14] Dong X, Li C, He Y, Yu G. A relaxed two-stage multi-splitting algorithm for bi-obstacle problems[J]. J. Math., 2011, 31(2): 323–330.
[15] He Y, Dong X. On the convergence of Levenberg-Marquardt method for nonlinear inequalities[J]. J. Math., 2012, 32(1): 25–34.
[16] Yao S W, Wei Z X, Huang H. A note about WYL's conjugate gradient method and its applications[J]. Appl. Math. Comput., 2007, 191(2): 381–388.
[17] Wolfe P. Convergence conditions for ascent methods[J]. SIAM Rev., 1969, 11(2): 226–235. DOI:10.1137/1011036
[18] Moré J J, Garbow B S, Hillstrom K E. Testing unconstrained optimization software[J]. ACM Trans. Math. Softw., 1981, 7(1): 17–41. DOI:10.1145/355934.355936
[19] Dai Y H, Ni Q. Testing different conjugate gradient methods for large-scale unconstrained optimization[J]. J. Comput. Math., 2003, 21(3): 311–320.