Consider the following unconstrained optimization problem
where $f:\mathbb{R}^{n}\rightarrow \mathbb{R}$ is a smooth nonlinear function whose gradient $g_{k}=\nabla f(x_{k})$ is available. Conjugate gradient (CG) methods form an important class of unconstrained optimization algorithms with strong local and global convergence properties and modest memory requirements.
The iterative formula of the CG method is given by
In (1.2), $d_{k}$ is a search direction updated by
and the steplength $\alpha_{k}>0$ is commonly chosen to satisfy certain line search conditions. Among them, the so-called Wolfe conditions have attracted special attention in both the convergence analysis and the implementation of CG methods; they require that
where the line search parameters satisfy $0 < \rho < \sigma < 1$.
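For concreteness, a minimal numerical check of the standard Wolfe conditions is sketched below, assuming (1.4)--(1.5) take the usual sufficient decrease and curvature form; the handles \texttt{f} and \texttt{grad} are hypothetical callables for the objective and its gradient.
\begin{verbatim}
def satisfies_wolfe(f, grad, x, d, alpha, rho=1e-3, sigma=0.5):
    """Check the standard Wolfe conditions for a trial steplength alpha.

    f, grad    : hypothetical callables for f(x) and its gradient g(x).
    x, d       : current iterate x_k and search direction d_k (numpy arrays).
    rho, sigma : line search parameters with 0 < rho < sigma < 1.
    """
    g_d = grad(x) @ d                   # directional derivative g_k^T d_k
    decrease  = f(x + alpha * d) <= f(x) + rho * alpha * g_d
    curvature = grad(x + alpha * d) @ d >= sigma * g_d
    return decrease and curvature
\end{verbatim}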
Well-known choices of $\beta_{k}$ include the Fletcher--Reeves (FR), Hestenes--Stiefel (HS), Polak--Ribière--Polyak (PRP) and Dai--Yuan (DY) formulas, which are specified by
where $y_{k-1}=g_{k}-g_{k-1}$ . Please refer to [1] for more details of their convergence properties.
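As a concrete illustration, the four classical values of $\beta_{k}$ can be computed from $g_{k}$, $g_{k-1}$ and $d_{k-1}$ as in the sketch below; it reproduces only the standard textbook formulas quoted above.
\begin{verbatim}
def classical_betas(g, g_prev, d_prev):
    """Return the FR, HS, PRP and DY values of beta_k.

    g, g_prev : gradients g_k and g_{k-1} (numpy arrays).
    d_prev    : previous search direction d_{k-1}.
    """
    y = g - g_prev                      # y_{k-1} = g_k - g_{k-1}
    return {
        "FR":  (g @ g) / (g_prev @ g_prev),
        "HS":  (g @ y) / (d_prev @ y),
        "PRP": (g @ y) / (g_prev @ g_prev),
        "DY":  (g @ g) / (d_prev @ y),
    }
\end{verbatim}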
Little is known about the global convergence of the HS method for nonconvex minimization problems. Nevertheless, the HS method resembles the PRP method in practical computation and is often recommended because of its superior numerical performance.
Recently, various modifications of the HS method have received growing interest, and the sufficient descent condition plays an important role in their convergence analysis.
Hager and Zhang [2] proposed the CG\_DESCENT method and proved its convergence under the Wolfe line search. Based on the MBFGS method [3], Zhang [4] and Dai [5] introduced modified HZ methods with the Armijo line search for nonconvex objective functions. Zhang, Zhou and Li [6] proposed a three-term modified PRP method, in which the property $-d_{k}^{T}g_{k}=\|g_{k}\|^2$ always holds without any line search. To seek the CG direction that is closest to the direction of the scaled memoryless BFGS method, Dai and Kou [7] proposed a family of CGOPT methods. Yu et al. [8, 9] proposed several modified spectral CG methods. For further study of some recent advances, we refer to [10-15].
In this paper, we continue the study of HS-type methods. One feature of our method is that it guarantees the sufficient descent condition and strong global convergence.
The rest of this paper is organized as follows. In the next section, the new method is presented formally and its sufficient descent property is established. In Section 3, we prove the strong global convergence of the proposed method under the Wolfe line search. In Section 4, we report numerical comparisons with existing methods on a set of test problems.
Recently, Wei, Yao and Huang [16] proposed the MHS, WYL and MLS methods, in which the parameters $\beta_{k}$ are given by
where $\overline{y_{k-1}}=g_{k}-\dfrac{\|g_{k}\|}{\|g_{k-1}\|}g_{k-1}$ .
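In code, the modified gradient difference $\overline{y_{k-1}}$ used by these formulas can be formed directly from the definition above, for instance as follows.
\begin{verbatim}
import numpy as np

def y_bar(g, g_prev):
    """Modified difference y_bar_{k-1} = g_k - (||g_k|| / ||g_{k-1}||) * g_{k-1}."""
    return g - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * g_prev
\end{verbatim}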
The global convergence of these methods under the strong Wolfe line search was established for the cases where the parameter $\sigma$ in (1.5) is restricted to $\sigma \in \left[0, \dfrac{1}{3}\right]$, $\sigma \in \left[0, \dfrac{1}{4}\right]$ and $\sigma \in \left[0, \dfrac{1}{4}\right]$, respectively. Moreover, under the same restriction on $\sigma$, these methods possess the sufficient descent condition.
The above discussion naturally prompts us to raise the following question:
Is it possible to modify the direction of the MHS method in a suitable way, thereby retaining and even enlarging its sufficient descent property and ensuring global convergence for general functions?
Taking the theoretical advantages of the MHS method into consideration, we propose another method that guarantees the sufficient descent condition and enjoys strong global convergence. With our choices, the additional restriction on the parameter $\sigma$ in (1.5) can be removed. The direction $d_{k}$ in our method is given by
where $\varepsilon_1>0$ is a constant.
In (2.2), the parameters $\beta_{k}^{DHS}$ and $\varphi_{k}$ are chosen as
where $\lambda>1$ and $ \varepsilon_{1}>0 $ are two given constants.
For convenience, we refer to the method defined by (2.2) as the DHS method in the rest of this paper, and we formally state its steps as follows.
Algorithm 2.1 (DHS method)
Step 1 Choose constants $\lambda>1$, $\varepsilon_{1}>0$ and $\varepsilon>0$. Select an initial point $x_{1} \in \mathbb{R}^{n}$ and set $k=1$.
Step 2 Test the stopping criterion. If $\|g_{k}\| < \varepsilon$, then stop; otherwise, compute $d_{k}$ by (2.2).
Step 3 Determine the steplength $\alpha_{k}$ by the Wolfe conditions.
Step 4 Compute the new iterate $x_{k+1}=x_{k}+\alpha_{k}d_{k}$, set $k=k+1$ and go to Step 2.
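A high-level sketch of Algorithm 2.1 is given below. The routines \texttt{dhs\_direction} (returning $d_{k}$ via (2.2)--(2.4)) and \texttt{wolfe\_linesearch} are placeholders to be supplied by the reader; the initial direction $d_{1}=-g_{1}$ follows the usage in the proof of Lemma 2.1.
\begin{verbatim}
import numpy as np

def dhs(f, grad, x1, eps=1e-6, max_iter=2000,
        dhs_direction=None, wolfe_linesearch=None):
    """Skeleton of Algorithm 2.1 (DHS method).

    dhs_direction(g, g_prev, d_prev) : placeholder returning d_k via (2.2)-(2.4).
    wolfe_linesearch(f, grad, x, d)  : placeholder returning a Wolfe steplength.
    """
    x, k = x1, 1
    g = grad(x)
    d = -g                                            # d_1 = -g_1 (cf. Lemma 2.1)
    while np.linalg.norm(g) >= eps and k <= max_iter: # Step 2: stopping test
        alpha = wolfe_linesearch(f, grad, x, d)       # Step 3: Wolfe steplength
        x = x + alpha * d                             # Step 4: new iterate
        g_prev, g = g, grad(x)
        d = dhs_direction(g, g_prev, d)               # Step 2: new direction by (2.2)
        k += 1
    return x
\end{verbatim}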
The descent property is an indispensable ingredient in the convergence analysis of CG methods. More precisely, if there exists a constant $c_{1}>0$ such that
holds for all $ k \in N $ , then the so-called sufficient descent condition holds.
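In code, this amounts to a simple inequality test along the iterates; the check below assumes (2.5) takes the usual form $g_{k}^{T}d_{k}\leq -c_{1}\|g_{k}\|^{2}$.
\begin{verbatim}
import numpy as np

def has_sufficient_descent(g, d, c1):
    """Check the assumed form of (2.5): g_k^T d_k <= -c1 * ||g_k||^2."""
    return g @ d <= -c1 * np.linalg.norm(g) ** 2
\end{verbatim}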
Property (2.5) is guaranteed in our method, as proved in the following lemma.
Lemma 2.1 Let $\{x_{k}\}$ and $\{d_{k}\}$ be generated by the DHS method. Then the direction $d_{k}$ satisfies the sufficient descent condition (2.5), independently of the line search.
Proof For $k=1$, it follows that $d_{1}^{T}g_{1}=-\|g_{1}\|^2$. It remains to consider the case where
is satisfied. It follows from (2.2) that
Then the conclusion holds by letting $c_{1}=1-\lambda^{-1}$ .
In this section, we prove the global convergence of the proposed CG method. We need the following assumptions, which are standard in the literature.
Assumption 3.1
Boundedness Assumption: the level set $\Omega =\{ x\in \mathbb{R}^{n} \mid f(x)\leq f(x_{1})\}$, where $x_{1}$ is the initial point, is bounded;
Lipschitz Assumption: in some neighborhood ${\Omega }_{0}$ of $\Omega$, the objective function $f$ is continuously differentiable and its gradient $g$ is Lipschitz continuous, namely, there exists a constant $L > 0$ such that
Theorem 3.1 Suppose that Assumption 3.1 holds. Let $\{x_{k}\}$ and $\{d_{k}\}$ be generated by Algorithm 2.1. Then there exists a constant $\overline{c}$ such that
Proof Since $ d_{1} =-g_{1} $ , the result obviously holds for $k=1$ .
If $k>1$, it suffices to consider the case where $d_{k-1}^{T}y_{k-1}>\varepsilon_{1}\|y_{k-1}\|\,\|d_{k-1}\|$ holds.
It follows from the definition of $\beta_{k}^{DHS}$ in (2.3) and the Cauchy--Schwarz inequality that
We can also estimate an upper bound for $|\varphi_{k}|$, namely
Combining this with (3.3) yields
Letting $\overline{c}= 1+\dfrac{2}{\varepsilon_{1}}+\dfrac{1}{\lambda}$ completes the proof.
The conclusion of the following theorem, known as the Zoutendijk condition, is often used to prove the global convergence of nonlinear CG methods.
Theorem 3.2 (see [17]) Suppose that Assumption 3.1 holds. Consider the general CG method, where $d_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe conditions. Then we have
For general functions, we can establish the following strong global convergence result.
Theorem 3.3 Suppose that Assumption 3.1 holds. Let $\{x_{k}\}$ be generated by Algorithm 2.1. Then
Proof The bound for $\|d_{k}\|$ in (3.2), coupled with (2.5), indicates that
Equation (3.7) leads to (3.6).
In this section, we provide implementation details of the new methods and verify their numerical performance. Our test problems come from Moré [18].
We stop the iteration when the inequality $\|g_{k}\|\leq 10^{-6}$ is satisfied. We use "F" to indicate that the number of iterations exceeds 2000. All algorithms use exactly the same implementation of the strong Wolfe conditions with $\rho=10^{-3}$ and $\sigma=0.5$. The parameters in (2.2) are specified by $\varepsilon_{1}=10^{-12}$ and $\lambda=10$.
In order to rank the CG methods, we use the approach of Dai and Ni [19] to compare the performance of our methods with that of the methods in [16].
First, we compute the total number of function and gradient evaluations by the formula $N_{{\rm total}}=NF+m*NG$, where $NF$ and $NG$ denote the numbers of function and gradient evaluations, respectively, and $m=5$.
Second, for each of our two methods, we evaluate its efficiency with respect to the WYL method in the following way. For each problem $i$, we compute the ratio
and the geometric mean of these ratios over all the test problems, $r_{{\rm total}}=\left(\prod\limits_{i\in S}r_{i}\right)^{\frac{1}{|S|}}$, where $S$ denotes the set of test problems and $|S|$ the number of its elements.
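These quantities are straightforward to compute; in the sketch below, the ratio $r_{i}$ is assumed, following the description above, to be the quotient of the $N_{{\rm total}}$ values of the tested method and the WYL method on problem $i$.
\begin{verbatim}
import numpy as np

def n_total(nf, ng, m=5):
    """Total evaluation cost N_total = NF + m * NG for one problem."""
    return nf + m * ng

def r_total(method_costs, wyl_costs):
    """Geometric mean of the ratios r_i = N_total(method) / N_total(WYL)."""
    ratios = np.asarray(method_costs, float) / np.asarray(wyl_costs, float)
    return np.exp(np.mean(np.log(ratios)))   # stable form of (prod r_i)^(1/|S|)
\end{verbatim}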
Finally, we report $r_{{\rm total}}$ in Table 4.2.