Uncontraied optimization#
minimize \(f(x)\)
\(f\) convex, twice continuously differentiable (hence \(dom f\) open)
we assume optimal value \(p^*=\inf_x f(x)\) is attained (and finite) solving the uncontrained optimization = find the solution to: \(\nabla f(x^*)=0\) $(n equations)
usually need iterative algorithms:
produce a sequence of \(x^{(k)}\in dom(f), k=1,2,\dots\) with \(f(x^{(k)}) \to \min f(x)\) (i.e., \(\nabla f(x^{(k)}) \))
Descent methods
start with \(x^{(0)}\)
update \(x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}\) s.t \(f(x^{(k+1)}< f(x^{(k)})\)
other notations: \(x^+=x+ t \Delta x\), \(x:=x+t\Delta x\)
\(\Delta x\) is the step, or search direction; \(t\) is the step size, or step length
from convexity, \(f(x^+) < f(x)\) implies \(\nabla f(x)^T \Delta x < 0\) (i.e., \(\Delta x\) is a descent direction)
General descent method.
given a starting point \(x\in dom(f)\).
repeat
determie a descent direction \(\Delta x\).
line search
Update: \(x:=x+t \Delta \)
Determine a descent direction
one approach is gradient descent \(\Delta x = -\nabla f(x)\)
choose a step size
line search methods: exact line search \(t=argmin_{t>0} f(x+t \Delta x)\)
backtracking
find the steepest descent direction - steepest descent method normalized steepest descent direction \(\Delta x_{nsd} = argmin \{ \nabla f(x)^T v \mid ||v|| = 1 \}\) \(\Delta x = -\nabla f(x)\) is the steepest descent direction with respect to Euclidean norm
so performance of steepest descent similar to graident descent
Newton’s method#
Newton’s direction \(\Delta x_{nt} = - \nabla^2 f(x)^{-1} \nabla f(x)\)
One interpretation
\(x+\Delta x_{nt}\) minimizes second order approximation \(f(x+v)=f(x) + \nabla f(x)^T v + \frac{1}{2} v^T \nabla^2 f(x) v\)
intuition: \(f\) twice differentiable, so this quadratic model is very accurate when \(x\) is near \(x^*\).
Newton’s descrement (stopping criteria): \(\lambda(x) = (\nabla f(x)^T \nabla^2 f(x)^{(-1)} \nabla f(x))^{1/2}\) a measure of the proximity of \(x\) to \(x^*\)
gives an estimate of \(f(x) - p^*\), using quadratic approximation \(\hat{f}\): \(f(x) - \inf_v \hat{f} (x+v) = \frac{1}{2} \lambda(x)^2\)
directional derivative in the Newton direction: \(\nabla f(x)^T \Delta x_{nt} = -\lambda(x)^2\)
(damped or guarded) Newton’s method:
given a starting point , tolerance \(\epsilon\) repeat
computer the Newton step and decrement.
stopping criterion.
linea search
update
damped Newton phase: \(t<1\)
quadratically convergent phase: \(t=1\)
pros: convergence is rapid in general and quadratic near \(x^*\)
cons: compute and store \(\nabla^2 f(x)\) and \(\nabla^2f(x)^{-1}\) (cost)
Missing one lecture [todo]#
infeasible start Newton method#
A generalization that deals with infeasible initial points and iterates
let x be a point that we do not assume to be feasible
find \(x+\Delta x_{nt}\) that solves the second-order approximation \(\min \hat{f}(x+v)=f(x)+\nabla f(x)^T v + (1/2)v^T\nabla^2 f(x) v\) s.t. \(A(x+v) =b\)
primal Newton step, dual Newton step
primal-dual Newton step (an alternative way to derive)
write optimality condition as \(r(y) = 0\), where \(y=(x,v)\), \(r(y) = (\nabla f(x) + A^T v, Ax - b )\) can be understood as \((r_{dual} (x,v) , r_{pri} (x,v)\)
linearizing \(r(y) = 0\) gives \(r(y+ \Delta y) \approx r(y) + D r(y) \Delta y = 0\)
first-order approx of first-order approx = second-ordergiven starting point \(x\in dom f, v\), tolerance \(\epsilon > 0\), \(\alpha\in (0, 1/2)\), \(\beta \in (0,1)\).
repeat
Compute primal aand dual Newton steps \(\Delta x_{nt}\), \(\Delta v_{nt}\)
Backtracking line search on \(||r||_2\). \(t:=1\). while \(||r(x+t\Delta x_{nt}, v+t \Delta v_{nt})||_2 > (1-\alpha t) || r(x,v)||_2\), \(t:=\beta t\)
Update. \(x:=x+t\Delta x_{nt}\), \(v:=v+ t\Delta v_{nt}\).
until \(Ax = b\) and \(||r(x,v) || \le \epsilon\).
\(r(y) + D r(y) \Delta y = 0\)
not a decent method:
the norm of \(r\) decreases in the Newton’s direction:
if \(t=1\), the next iterate will be feasible, and all the following iterates will be feasible