Uncontraied optimization

Uncontraied optimization#

minimize $f(x)$

$f$ convex, twice continuously differentiable (hence $dom f$ open)
we assume optimal value $p^*=\inf_x f(x)$ is attained (and finite) solving the uncontrained optimization = find the solution to: $\nabla f(x^*)=0$ $(n equations)

usually need iterative algorithms:

produce a sequence of $x^{(k)}\in dom(f), k=1,2,\dots$ with $f(x^{(k)}) \to \min f(x)$ (i.e., $\nabla f(x^{(k)}) $)

Descent methods

start with $x^{(0)}$
update $x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}$ s.t $f(x^{(k+1)}< f(x^{(k)})$
other notations: $x^+=x+ t \Delta x$, $x:=x+t\Delta x$
$\Delta x$ is the step, or search direction; $t$ is the step size, or step length
from convexity, $f(x^+) < f(x)$ implies $\nabla f(x)^T \Delta x < 0$ (i.e., $\Delta x$ is a descent direction)

General descent method.

given a starting point $x\in dom(f)$.
repeat

determie a descent direction $\Delta x$.
line search
Update: $x:=x+t \Delta $

Determine a descent direction

one approach is gradient descent $\Delta x = -\nabla f(x)$
choose a step size

line search methods: exact line search $t=argmin_{t>0} f(x+t \Delta x)$
backtracking

find the steepest descent direction - steepest descent method normalized steepest descent direction $\Delta x_{nsd} = argmin \{ \nabla f(x)^T v \mid ||v|| = 1 \}$ $\Delta x = -\nabla f(x)$ is the steepest descent direction with respect to Euclidean norm

so performance of steepest descent similar to graident descent

Newton’s method#

Newton’s direction $\Delta x_{nt} = - \nabla^2 f(x)^{-1} \nabla f(x)$

One interpretation

$x+\Delta x_{nt}$ minimizes second order approximation $f(x+v)=f(x) + \nabla f(x)^T v + \frac{1}{2} v^T \nabla^2 f(x) v$

intuition: $f$ twice differentiable, so this quadratic model is very accurate when $x$ is near $x^*$.

Newton’s descrement (stopping criteria): $\lambda(x) = (\nabla f(x)^T \nabla^2 f(x)^{(-1)} \nabla f(x))^{1/2}$ a measure of the proximity of $x$ to $x^*$

gives an estimate of $f(x) - p^*$, using quadratic approximation $\hat{f}$: $f(x) - \inf_v \hat{f} (x+v) = \frac{1}{2} \lambda(x)^2$
directional derivative in the Newton direction: $\nabla f(x)^T \Delta x_{nt} = -\lambda(x)^2$

(damped or guarded) Newton’s method:

given a starting point , tolerance $\epsilon$ repeat

computer the Newton step and decrement.
stopping criterion.
linea search
update

damped Newton phase: $t<1$
quadratically convergent phase: $t=1$
pros: convergence is rapid in general and quadratic near $x^*$
cons: compute and store $\nabla^2 f(x)$ and $\nabla^2f(x)^{-1}$ (cost)

Missing one lecture [todo]#

infeasible start Newton method#

A generalization that deals with infeasible initial points and iterates

let x be a point that we do not assume to be feasible
find $x+\Delta x_{nt}$ that solves the second-order approximation $\min \hat{f}(x+v)=f(x)+\nabla f(x)^T v + (1/2)v^T\nabla^2 f(x) v$ s.t. $A(x+v) =b$

primal Newton step, dual Newton step

primal-dual Newton step (an alternative way to derive)

write optimality condition as $r(y) = 0$, where $y=(x,v)$, $r(y) = (\nabla f(x) + A^T v, Ax - b )$ can be understood as $(r_{dual} (x,v) , r_{pri} (x,v)$
linearizing $r(y) = 0$ gives $r(y+ \Delta y) \approx r(y) + D r(y) \Delta y = 0$ first-order approx of first-order approx = second-order
given starting point $x\in dom f, v$, tolerance $\epsilon > 0$, $\alpha\in (0, 1/2)$, $\beta \in (0,1)$.
repeat

Compute primal aand dual Newton steps $\Delta x_{nt}$, $\Delta v_{nt}$
Backtracking line search on $||r||_2$. $t:=1$. while $||r(x+t\Delta x_{nt}, v+t \Delta v_{nt})||_2 > (1-\alpha t) || r(x,v)||_2$, $t:=\beta t$
Update. $x:=x+t\Delta x_{nt}$, $v:=v+ t\Delta v_{nt}$.

until $Ax = b$ and $||r(x,v) || \le \epsilon$.

$r(y) + D r(y) \Delta y = 0$

not a decent method:
the norm of $r$ decreases in the Newton’s direction:
if $t=1$, the next iterate will be feasible, and all the following iterates will be feasible

Uncontraied optimization

Contents

Uncontraied optimization#

Newton’s method#

Missing one lecture [todo]#

infeasible start Newton method#

Algorithms for Equality Constrained Optimization#