Numerical Computation of Second Derivatives with Applications to Optimization Problems

semanticscholar (2013)

Abstract
Newton’s method is applied to the minimization of a computationally expensive objective function. Various methods for computing the exact Hessian are examined, notably adjoint-based methods and the hyper-dual method. The hyper-dual number method still requires O(N^2) function evaluations to compute the exact Hessian during each optimization iteration. The adjoint-based methods all require O(N) evaluations. In particular, the fastest method is the direct-adjoint method, as it requires N evaluations, as opposed to the adjoint-adjoint and adjoint-direct methods, which both require 2N evaluations. Applications to a boundary-value problem are presented.

Index Terms: Second derivative, Hessian, adjoint, direct, hyper-dual, optimization

I. INTRODUCTION

Newton’s method is a powerful optimization algorithm which is well known to converge quadratically upon the root of the gradient of a given objective function. The drawback of this method, however, is the calculation of the second-derivative matrix of the objective function, or Hessian. When the objective function is computationally expensive to evaluate, the adjoint method is suitable to compute its gradient. Such expensive function evaluations may, for example, depend on some nonlinear partial differential equation such as those encountered in aerodynamics [7], [12]. With this gradient information, steepest-descent or conjugate-gradient algorithms can then be applied to the optimization problem; however, convergence will often be slow [9].

Quasi-Newton methods successively approximate the Hessian matrix with either Broyden-Fletcher-Goldfarb-Shanno (BFGS) or Davidon-Fletcher-Powell (DFP) updates [13], which can dramatically improve convergence. However, this enhanced convergence depends on the line-search algorithm used to find the optimal step in the search direction; this translates to an increased number of function evaluations to satisfy the first Wolfe condition and gradient evaluations to satisfy the second Wolfe condition.

The goal of this paper is to examine various methods used to numerically compute the Hessian matrix. Approximate methods include finite-difference or complex-step techniques [10], [15]. Johnson also presents a method of computing second derivatives with the Fast Fourier Transform [8]. Exact methods include the use of hyper-dual numbers [2], [3], [4], [5], [6], which requires O(N^2) function evaluations. Papadimitriou and Giannakoglou examine adjoint and direct methods for exactly computing the Hessian matrix [14]. This paper largely follows the methods presented by the latter authors, with the exception that the direct-direct method is not examined since it requires O(N^2) equivalent function evaluations. Instead, the adjoint-adjoint, adjoint-direct and direct-adjoint methods are presented, as they all provide the exact Hessian in O(N) equivalent function evaluations. The hyper-dual method is also presented since it is a fair comparison to the adjoint-based methods for computing exact Hessians.

The function, here, is obtained from discretizing a linear differential equation. The theory presented in Section 2, however, is more generally derived for nonlinear systems of equations. A few terms, notably the second-order partial derivatives of the residual equation, vanish in the case of a linear operator.

18.335 Final Project submitted to Professor Steven G. Johnson, Massachusetts Institute of Technology, 12 December 2012.
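As a concrete illustration of the hyper-dual idea mentioned in the introduction, the following is a minimal Python sketch (not the paper’s code): a hyper-dual number carries two first-derivative components and one mixed component, and because eps1^2 = eps2^2 = 0, a single evaluation of the objective with suitably perturbed arguments returns an exact second derivative, free of step-size truncation and subtractive cancellation. The HyperDual class, the toy objective f(x) = x0^2 * sin(x1), and the helper hessian_entry are assumptions made for this example only, not objects defined in the paper.

import math


class HyperDual:
    """x = f0 + f1*eps1 + f2*eps2 + f12*eps1*eps2, with eps1**2 = eps2**2 = 0."""

    def __init__(self, f0, f1=0.0, f2=0.0, f12=0.0):
        self.f0, self.f1, self.f2, self.f12 = f0, f1, f2, f12

    def __add__(self, other):
        other = other if isinstance(other, HyperDual) else HyperDual(other)
        return HyperDual(self.f0 + other.f0, self.f1 + other.f1,
                         self.f2 + other.f2, self.f12 + other.f12)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, HyperDual) else HyperDual(other)
        return HyperDual(self.f0 * other.f0,
                         self.f0 * other.f1 + self.f1 * other.f0,
                         self.f0 * other.f2 + self.f2 * other.f0,
                         self.f0 * other.f12 + self.f1 * other.f2
                         + self.f2 * other.f1 + self.f12 * other.f0)

    __rmul__ = __mul__

    def sin(self):
        # chain rule: the eps1*eps2 component picks up the second-derivative term
        s, c = math.sin(self.f0), math.cos(self.f0)
        return HyperDual(s, c * self.f1, c * self.f2,
                         c * self.f12 - s * self.f1 * self.f2)


def f(x):
    # assumed toy objective f(x) = x0**2 * sin(x1), standing in for the
    # expensive PDE-based objective discussed in the paper
    return x[0] * x[0] * x[1].sin()


def hessian_entry(func, x, i, j):
    # one hyper-dual evaluation yields the exact mixed derivative d2f/dxi dxj
    xs = [HyperDual(v) for v in x]
    xs[i] = HyperDual(xs[i].f0, f1=1.0, f2=xs[i].f2)
    xs[j] = HyperDual(xs[j].f0, f1=xs[j].f1, f2=1.0)
    return func(xs).f12


x = [1.3, 0.7]
print(hessian_entry(f, x, 0, 1))      # exact mixed second derivative ...
print(2.0 * x[0] * math.cos(x[1]))    # ... matches 2*x0*cos(x1) analytically

Filling the full N-by-N Hessian this way takes one such evaluation per (i, j) pair, which is the O(N^2) cost that motivates the adjoint-based alternatives examined in the paper.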
II. COMPUTING THE HESSIAN MATRIX

A. Motivation

Suppose we want to solve the unconstrained optimization problem
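Since the problem is stated here only in general terms, a minimal sketch of the Newton iteration referred to in the introduction may help fix ideas. It uses an assumed toy objective (a convex quadratic plus a quartic term), not the paper’s boundary-value problem; each step solves H(x) p = -g(x), where H is exactly the Hessian whose computation this section is concerned with.

import numpy as np


def objective(x):
    # assumed toy objective: convex quadratic plus a quartic term
    return 0.5 * x @ x + 0.25 * np.sum(x ** 4)


def gradient(x):
    return x + x ** 3


def hessian(x):
    # available in closed form for this toy problem; in the paper this matrix
    # is what the adjoint, direct, and hyper-dual machinery computes
    return np.eye(x.size) + np.diag(3.0 * x ** 2)


def newton_minimize(x0, tol=1e-12, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hessian(x), -g)   # Newton step: solve H p = -g
        x = x + p
    return x


x_star = newton_minimize([2.0, -1.5, 0.5])
print(x_star, objective(x_star))   # converges quadratically to the minimizer at 0

Near a minimizer where the Hessian is positive definite, each iteration roughly squares the error, which is the quadratic convergence that makes the cost of forming H worth examining.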