Uncontraied optimization#

minimize \(f(x)\)

  • \(f\) convex, twice continuously differentiable (hence \(dom f\) open)

  • we assume optimal value \(p^*=\inf_x f(x)\) is attained (and finite) solving the uncontrained optimization = find the solution to: \(\nabla f(x^*)=0\) $(n equations)

usually need iterative algorithms:

  • produce a sequence of \(x^{(k)}\in dom(f), k=1,2,\dots\) with \(f(x^{(k)}) \to \min f(x)\) (i.e., \(\nabla f(x^{(k)}) \))

Descent methods

  • start with \(x^{(0)}\)

  • update \(x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}\) s.t \(f(x^{(k+1)}< f(x^{(k)})\)

  • other notations: \(x^+=x+ t \Delta x\), \(x:=x+t\Delta x\)

  • \(\Delta x\) is the step, or search direction; \(t\) is the step size, or step length

  • from convexity, \(f(x^+) < f(x)\) implies \(\nabla f(x)^T \Delta x < 0\) (i.e., \(\Delta x\) is a descent direction)

General descent method.

  • given a starting point \(x\in dom(f)\).

  • repeat

  1. determie a descent direction \(\Delta x\).

  2. line search

  3. Update: \(x:=x+t \Delta \)

Determine a descent direction

  • one approach is gradient descent \(\Delta x = -\nabla f(x)\)

  • choose a step size

  1. line search methods: exact line search \(t=argmin_{t>0} f(x+t \Delta x)\)

  2. backtracking

find the steepest descent direction - steepest descent method normalized steepest descent direction \(\Delta x_{nsd} = argmin \{ \nabla f(x)^T v \mid ||v|| = 1 \}\) \(\Delta x = -\nabla f(x)\) is the steepest descent direction with respect to Euclidean norm

so performance of steepest descent similar to graident descent

Newton’s method#

Newton’s direction \(\Delta x_{nt} = - \nabla^2 f(x)^{-1} \nabla f(x)\)

One interpretation

  • \(x+\Delta x_{nt}\) minimizes second order approximation \(f(x+v)=f(x) + \nabla f(x)^T v + \frac{1}{2} v^T \nabla^2 f(x) v\)

intuition: \(f\) twice differentiable, so this quadratic model is very accurate when \(x\) is near \(x^*\).

Newton’s descrement (stopping criteria): \(\lambda(x) = (\nabla f(x)^T \nabla^2 f(x)^{(-1)} \nabla f(x))^{1/2}\) a measure of the proximity of \(x\) to \(x^*\)

  • gives an estimate of \(f(x) - p^*\), using quadratic approximation \(\hat{f}\): \(f(x) - \inf_v \hat{f} (x+v) = \frac{1}{2} \lambda(x)^2\)

  • directional derivative in the Newton direction: \(\nabla f(x)^T \Delta x_{nt} = -\lambda(x)^2\)

(damped or guarded) Newton’s method:

given a starting point , tolerance \(\epsilon\) repeat

  1. computer the Newton step and decrement.

  2. stopping criterion.

  3. linea search

  4. update

  • damped Newton phase: \(t<1\)

  • quadratically convergent phase: \(t=1\)

  • pros: convergence is rapid in general and quadratic near \(x^*\)

  • cons: compute and store \(\nabla^2 f(x)\) and \(\nabla^2f(x)^{-1}\) (cost)

Missing one lecture [todo]#

infeasible start Newton method#

A generalization that deals with infeasible initial points and iterates

  • let x be a point that we do not assume to be feasible

  • find \(x+\Delta x_{nt}\) that solves the second-order approximation \(\min \hat{f}(x+v)=f(x)+\nabla f(x)^T v + (1/2)v^T\nabla^2 f(x) v\) s.t. \(A(x+v) =b\)

primal Newton step, dual Newton step

primal-dual Newton step (an alternative way to derive)

  • write optimality condition as \(r(y) = 0\), where \(y=(x,v)\), \(r(y) = (\nabla f(x) + A^T v, Ax - b )\) can be understood as \((r_{dual} (x,v) , r_{pri} (x,v)\)

  • linearizing \(r(y) = 0\) gives \(r(y+ \Delta y) \approx r(y) + D r(y) \Delta y = 0\) first-order approx of first-order approx = second-order

  • given starting point \(x\in dom f, v\), tolerance \(\epsilon > 0\), \(\alpha\in (0, 1/2)\), \(\beta \in (0,1)\).

  • repeat

  1. Compute primal aand dual Newton steps \(\Delta x_{nt}\), \(\Delta v_{nt}\)

  2. Backtracking line search on \(||r||_2\). \(t:=1\). while \(||r(x+t\Delta x_{nt}, v+t \Delta v_{nt})||_2 > (1-\alpha t) || r(x,v)||_2\), \(t:=\beta t\)

  3. Update. \(x:=x+t\Delta x_{nt}\), \(v:=v+ t\Delta v_{nt}\).

  • until \(Ax = b\) and \(||r(x,v) || \le \epsilon\).

\(r(y) + D r(y) \Delta y = 0\)

  • not a decent method:

  • the norm of \(r\) decreases in the Newton’s direction:

  • if \(t=1\), the next iterate will be feasible, and all the following iterates will be feasible

Algorithms for Equality Constrained Optimization#