Algorithms

Algorithms#

Interior-Point methods#

basic ideas#

move inequality constraint to objective function via indicator functions

$\min f_0(x) + \sum_{i=1}^m I_- (f_i(x))$
s.t. $Ax=b$

where $I_-(u)=0$ if $u\le 0$, $I_-(u)=\infty$ otherwise (indicator function of $R_-$).

Logarithmic barrier function#

approximation via logarithmic barrier: fix some $t>0$

$\min f_0(x) - (1/t) \sum_{i=1}^m \log (-f_i(x))$
s.t. $Ax=b$

$\phi(x)=-\sum_{i=1}^m \log(-f_i(x))$, $dom \phi = \{x\mid f_1(x) \le 0, \dots, f_m(x)<0 \}$

convex (follows from composition rules)
twice continuously differentiable, with gradient Hessian

becomes

$\min f_0(x) - (1/t)\sum \log(-f_i(x))$
s.t. $Ax=b$
difficult to minimize using Newton’s method (from a random starting point) when t is large

because Hessian varies rapidly near boundary of feasibility set

can be circumvented by solving a sequence of problems with increasing $t$

startnig each Newton minmization from the solution to the problem with previous $t$

Central path#

central paths: $\{ x^*(t) \mid t>0 \}$
$x^*(t)$: central points

example: central path for an LP

$\min c^T x$
s.t. $a_i^T x \le b_i$, $i=1,\dots,6$

hyperplane $c^T x = c^T x^*(t)$ is tangent to level curve of $\phi$ through $x^*(t)$

take the central path through interior of the feasible set

analytic center of a set of convex inequalities and linear equations $f_i(x) \le 0$, $i=1,\dots,m$, $Fx=g$ is defined as the optimal point of

$\min -\sum_i^m \log(-f_i(x))$
s.t. $Fx=g$ analytic center of linear inequalities $a_i^T x\le b_i$, $i=1,\dots,m$

$x_{ac}$ is minimizer of $\phi(x) = -\sum_i^m \log(b_i-a_i^T x)$

Barrier method (one interior-point method)#

given strictly feasible $x$, $t:=t^{(0)}>0$, $\mu>1$, tolerance $\epsilon > 0$. repeat

Centering step. Compute $x^*(t)$ by minimizing $t f_0 + \phi$, subject to $Ax=b$
Update $x:=x^*(t)$
Stopping criterion. quit if
increase $t$. $t:=\mu t$

choice of $\mu$ involves a trade-off: large $\mu$ means fewer outer interations, more inner (Newton) iterations; typical values: $\mu=10-20$ (for more practical choices of parameters, pp.570, textbook)

dual points from central path#

every $x^*(t)$ corresponds to a dual feasible point (of the original inequality constrained problem) $\lambda_i^*(t) = 1/(-tf_i(x^*(t)))$ and $\nu^*(t) = w/t$

verification: $x^*(t)$ solves

$\min t f_0(x) + \phi(x)$
s.t. $Ax=b$

implies

$Ax^* = b$, $f_i(x^*) < 0$, $i=1, \dots, m$
$\exists w$, $t\nabla f_0(x^*) + \sum \frac{1}{-f_i(x^*)} \nabla f_i(x^*) + A^T w=0$

implies

$x^*(t)$ minimizes the Lagrangian (of the original problem)
$L(x,\lambda^*(t), \nu^*(t))=f_0(x) + \sum \lambda_i^*(t) f_i(x) + \nu^*(t)^T (Ax-b)$
at: $\lambda_i^*(t) = 1/(-t f_i(x^*(t)))$ and $\nu^*(t) = w/t$

Duality gap $m/t$: $f_0(x^*(t)) \ge p^* \ge d^* \ge g(\lambda^*(t), \nu^*(t)) = L(x^*(t), \lambda^*(t),\nu^*(t))=f_0(x^*(t))-m/t$

Interpretation via KKT condition#

$x=x^*(t)$, $\lambda=\lambda^*(t)$, $\nu=\nu^*(t)$ satsify

primal constraints: $f_i(x)\le 0$, $i=1,\dots,m$, $Ax=b$
dual constraints: $\lambda\succeq 0$
approximate complementary slackness:

Convergnce#

The number of steps to converge within tolerance $\epsilon$:

plus the initial centering step (to compute $x^*(t^{(0)})$)

Example: geometric program

Basic phase I method#

$\min (over x,s) s$
s.t. $f_i(x)\le s, i=1,\dots,m$
$Ax=b$

if $x$, $s$ feasible, with $s<0$, then $x$ is strictly feasible for (1)
if optimal value $p^*$ of (2) is positive, then problem (1) is feasible
if $p^* =0$ and attained, then problem (1) is feasible (but not strictly); if $p^* =0$ and not attained, then problem (1) is infeasible

Sum of infeasibilities phase I method

$\min 1^T s$
s.t. $s\succeq 0$, $f_i(x)\le s_i$, $i=1,\dots,m$
$Ax=b$

The optimal value of (3) is zero and achieved iff problem (1) is feasible

Phase I via Newton method#

Primal-dual interior-point method#

Update both primal and dual variables in the Newton method to solve the modified KKT conditions

no distinction between inner and outer iterations
the primal iterates are not necessarily feasible
$\hat{\eta}(x,\lambda) = -f(x)^T \lambda$: surrogate duality gap
backtracking line search based on the norm of the residuals

backtracking line search#

based on the norm of the residuals, modified to ensure $f(x)\prec 0$, $\lambda \succeq 0$.

compute the largest positive step length that does not exceed one and gives $\lambda^+\succeq 0$:

Subgradient Method#

$g$ is a subgradient of $f$ (not necessarily convex) at $x$ if $f(y)\ge f(x) + g^T (y-x) \forall y$

if $f$ is convex and differentiable, $\nabla f(x)$

Example 38 (subgradient)

$f=\max\{f_1,f_2\}$ with $f_1$, $f_2$ convex and differentiable

$f_1(x_0)> f_2(x_0)$: unique subgradient $g=\nabla f_1(x_0)$
$f_2(x_0)> f_1(x_0)$: unique subgradient $g=\nabla f_2(x_0)$
$f_1(x_0) = f_2(x_0)$: subgradient form a line segment $[\nablaf_1(x_0),\nabla f_2(x_0)]$

Subdifferential#

set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$, denoted $\partial f(x)$
$\partial f(x)$ is a closed convex set (can be empty)

if $f$ is convex,

$\partial f(x)$ is nonempty, for $x\in dom f$
$\partial f(x) = \{ \nabla f(x) \}$, if $f$ is differentiable at $x$
if $\partial f(x) = \{ g \}$, …

recall for $f$ convex, differentiable $f(x^*)=\inf_x f(x) \Longleftrightarrow 0 = \nabla f(x^*)$

Optimality condition for unconstrained problem#

$\min f_0(x)$
s.t. $f_i(x) \le 0, i\in [m$

we assume

$f_i$ convex, defined on $R^n$ (hence subdiffereentiable)
strict feaisbility (Slater’s condition)

$x^*$ is primal optimal ($\lambda^*$ is dual optimal) iff

$f_i(x^*)\le 0$, $\lambda^*_1 \le 0$
..
..

the algorithm is similar to …

Step size#

step size rules (step sizes are fixed before algorithm execution)

constant stetp size
constant step length
square summable

Convergence#

assumption

$f^*= \inf_x f(x) > -\infty$ , with $f(x^*)=f^*$
$||g||_2 \le G \forall g\in \partial f$
$R\ge ||x^{(1)} - x^*||_2$

convergence results: define $\bar{f}=\lim_{k\to \infty} f_{best}^{(k)}$

constant step size: $\bar{h} - f^* \le G^2 \alpha/2$, i.e., converges to $G^2\alpha/2$-suboptimal (converges to $f^*$ if $\alpha$ small enough)
constant step length: $G\gamma /2$-suboptimal

Example 39

minimize $f(x) = \max_{i=1,\dots,m} (a_i^T x + b_i)$

Subgradient method for constrained problems#

Projection#

projection: $s=argmin_{s\in C} ||x-s||_s$

example: linear equality constrained problem

$\min f(x)$
s.t. $Ax=b$

projection of $z$ onto $\{ x\mid Ax=b \}$ is

\[P(z) = z- A^T(AA^T)^{-1} (Az-b) = (I - A^T (AA^T)^{-1} A) z + A^T(AA^T)^{-1}b)\]

Example 40 (Linear equality constrained problem)

minimize $||x||_1$
s.t. $Ax = b$

subgradient of objective is $g= sign(x)$

Projected subgradient method for dual problem#

(convex) primal: (Slater’s condition holds)

minimize $f_0(x)$
s.t. $f_i(x) \le 0 $

solve dual problem

maximize $g(\lambda)$
s.t. $\lambda \succeq 0$

via projected subgradient method: $\lambda^{(k+1)} =... $

Given a starting point $\lambda$…. Repeat

iterpretation:

$\lambda_i$ is price for resource $f_i(x)$
price update $\lambda_i^{(k+1)}=\qty(\lambda_i^{(k)}+\alpha_k f_i(x^{(k)}))_+$
increase price $\lambda_i$ if resource $i$ is over-utilized (i.e., $f_i(x) >0$)
decrease price $\lambda_i$ if … is under-utilizedo
not negative

Example 41

minimize strictly convex quadratic $(P\succ 0)$ over unit box:

minimize $(1/2)x^T P x -q^T x$
s.t. $x_i^2 \le 1$
$L(x,\lambda) = (1/2)x^T (P+diag(2\lambda))x - q^T x - 1^T \lambda$
$x^*(\lambda) = (P+diag(2\lambda))^{-1} q$
projected

Cutting plane oracle#

Oracle: a device that can answer question for us; we make no assumption on how an answer is found by the device

Cutting plane oracle (also called separation oracle), once queried at $x$, either

asserts $x\in X$ ($X\subseteq R^n$)
or return a separating hyperplane between $x$ and $X$: $a\neq 0$, $a^T z \le b$ for $z\in X$, $a^T x \ge b$ $(a,b)$ called a cutting-plane, or cut, since it eliminates the halfspace $\{ z\mid a^T z > b \}$ from our search for a point in $X$
neutral cut
deep cut

unconstrained optimization

$f_0:R^n\to R$, convex; $x^*$ is the optimal solution; $X$ is the set of optimal points
Given $x$, find $g\in \partial f_0 (x)$
$x^*$ belongs to the halfspace: $\{ z\mid g^T(z-x) \le 0 \},\forall x$
$g^T(z-x)\le 0$ defines a cutting plane at $x (a=g,b=g^Tx)$

inequality constrained optimization#

If $x$ is feasible, we have a (neural) objective cut $g_0^T(z-x)\le 0$, $g_0\in\partial f_0(x)$
If $x$ is not feasible, e.g., $f_j(x)>0$, we have a (deep) feasibility cut $f_j(x) + g_j^T (z-x) \le 0$, $g_j \in \partial f_j(x)$

basic (conceptual) cutting-plane/localzation algorithm

choosing the center point (specific localization methods)#

center of gravity
analytic centering
Chebyshev center
Center of the maximum volume epplisoid (MVE)

Bisection method on R#

minimize convex $f:R\to R$
$P_k$ is an interval
obvious choice for query point: $x^{(k+1)}:=midpoint(P_k)$

Bisection algorithm for one-dimensional search

given an initial interval $[l,u]$ known to contain $x^*$; a required tolerance $r>0$ repeat - $x:=(l+u)/2$. - query the oracle at $x$. - if the oracle determines that $x^*\le x, u:=x$ - if the oracle determines that $x^*\ge x, l:=x$ until $u-l\e 2r$

for differentiable $f$: evaluate $f'(x)$ if $f'(x)<0$, $l:=x$; else $u:=x$

Ellipsoid method (for unconstrained convex problems)#

Idea: Use ellipsoids to approximate polyhedron, find $x^*$ in an ellipsoid instead of a polyhedron

Algorithm sketch:

at iteration $k$ we know
set
hence we know

Basic ellipsoid algorithm (for unconstrained convex problems)

given ellipsoid $E(x,P)$ containing
$\sqrt{g^T P g}$

For inequaltiy constrained problems

If $x^{(k)}$ feasible, update ellipsoid with objective cut

a (deep) cut

If $x^{(k)}$ infeasible, update ellipsoid with feasible cut

minimum volume ellipsoid containing ellipsoid intersected with halfspace

where $\tilde{g} = \frac{g}{\sqrt{g^T P g}}$.

Stopping criteria

if $x^{(k)}$ is feasible and $\sqrt{g_0^{(k)T} P^{(k)} g_0^{(k)} }\le \epsilon$
if $f_j(x^{(k)}) - \sqrt{g_j^{(k)T} P^{(k)} g_j^{(k)} } > 0$

Decomposition methods#

Break a problem into smaller ones and solve each of the smaller ones separably, either in parallel or sequentially

Separable problems#

minimize $f_1(x_1)+ f_2(x_2)$
s.t. $x_1\in C_1$, $x_2\in C_2$

we can solve for $x_1 and $x_2$ separately (in parallel)

consider uncontrained problem,

minimize $f(x) = f_1(x_1,y) + f_2(x_2,y)$, $x=(x_1,x_2,y)$

$y$ is the complicating variable or coupling variable; when it is fixed the problem is separable in $x_1$ and $x_2$

Primal decomposition method#

fix $y$ and define

subproblem1: $\min_{x_1} f_1(x_1,y)$
subproblem2: $\min_{x_2} f_1(x_2,y)$

with optimal values $\phi_1(y)$ and $\phi_2(y)$

master problme $\min_{y} \phi_1(y) + \phi_2(y)$

can solve master problem using

bisection (if $y$ is scalar) 2 gradient or Newton method (if $\phi_i$ differentiable) 3 subgradient, cutting-plane, or ellipsoid method

each

Algorithm sketch

Given $y^{(0)}$, $k=0$;
Repeat
solve two subproblems to derive $x_1$, $x_2$
update $y^{(k)}$ to $y^{(k+1)}$ based on the algorithm to solve the master problem

Dual decomposition method#

Step 1:
Step 2: form dual problem

\[ \begin{align}\begin{aligned}L(x_1, y_1, x_2, y_2) = f_1(x_1,y_1) + f_2(x_2,y_2) + \nu^T (y_1 - y_2)\\separable; can minimize over $(x_1,y_1)$ and $(x_2,y_2)$ separately - $g_1(\mu) = \inf_{x_1,y_1} (f_1(x_1,y_1)+\nu^T y_1) = -f_1^*(0,-\nu)$ - $g_2(\mu) = \inf_{x_2,y_2} (f_2(x_2,y_2)-\nu^T y_2) = -f_2^*(0,\nu)$\\dual problem is: maximize $g(\nu) = g_1(\nu) + g_2(\nu)$\\- compuing $g_i(\nu)$ are the dual subproblems - can be done in parallel - a subgradient of $-g$ is $y_2-y_1$ (from solutions of sub)\end{aligned}\end{align} \]

Dual decomposition algorithm

(using subgradient algorithm for master)

repeat

solve the dual subproblems (in parallel) find
Update dual variables (prices). $\nu:=\nu - \alpha_k(y_2-y_1)$

step lenght $\alpha_k$ can be chosen in standard ways
at each step we have a lowe bound $g(\nu)$ on $p^*$
iterates are generally infeasible, i.e., $y_1\neq y_2$

Finding feasible iterate

reasonable guess of feasible point from $(x_1,y_1)$, $(x_2,y_2)$:
a better feasible point:

Problems with complicating constraints#

$\min f_1(x_1) + f_2(x_2)$
s.t. $x_1\in C_1$, $x_2\in C_2$, $h_1(x_1)+h_2(x_2)\preceq 0$
$f_i, h_i, C_i$ convex
can interpret coupling constaints as limits on resources shared between two subproblems

fix $t\in R^p$ and define

$t$ is the quantity of resources allocated to first subproblem ($-t$ is allocated to second subproblem)
master problem: minimize $\phi_1(t) + \phi_2(t)$ (optimal values of subproblems) over $t$
subproblems can be solve separately (in parallel) when $t$ is fixed

Lagrangian relaxation and subgradient method

$\alpha_k$ is an appropriate step size
iterates need not be feasible
can again construct feasible primal variables using projection

Example 42

Problem setup

$n$ flows, with fixed routes, in a network with $m$ links
variable $f_i\ge 0 $ denotes the rate of flow $j$
flow utility is $U_j: R\to R$, strictly concave, increasing
traffic $t_i$ on link $i$ is the sum of flows passing through it
$t=Rf$, where $R$ is the routing matrix $R_{ij} = 1$ flow j passes over link $i$, 0 otherwise
link capacity constraint: $t\preceq c$

Rate control problem:

maximize $U(f) = \sum_j^n U_j(f_j)$
s.t. $Rf \preceq c$
convex problem
dual decomposition gives decentralized method

Lagrangian: $L(f,\lambda) = -\sum_j^n U_j(f_j) + \lambda^T (Rf - c)$

Lagrangian dual

\[g(\lambda) = \inf_f(\sum_j^n -U_j(f_j) + \lambda^T (Rf-c)) = -\lambda^T c + \sum_j^n \inf_{f_j} (-U_j(f_j)+(r_j^T \lambda) f_j) = -\lambda&^T c - \sum_j^n(-U_j)^* (-r_j^T \lambda)\]

dual problem: $\max -\lambda^T c - \sum_{j=1}^n (-U_j)^* (-r_j^T \lambda)$
s.t. $\lambda \succeq 0$

a subgradient of $-g$ is given by $c-R\bar{f}$, where $\bar{f}_j$ is a solution of the subproblem $\min_{f_j} -U_j(f_j)+(r_j^T \lambda) f_j$

algorithm

given
repeat
Sum link prices along each route. Calculate $\Lambda_j = r_j^T \lambda$.
Optimize flows (separately) using flow prices. $f_j=argmax(U_j(f_j)-\Lambda_j f_j)$
Calculate link capacity margins: $s:=c- Rf$
update link prices $\lambda: = (\lambda - \alpha_k s)_+$

decentralized:

links only need to know the flows that pass through them
flows only need to know prices on link they pass through

Generating feasible flow rates:

iterates can be (and often are) infeasible, i.e., $Rf not \preceq c$ (but we do have $Rf \preceq c$ in the limit)
define $\eta_i = t_i/c_i = (Rf)_i /c_j$
$\eta_i < 1$ means link $i$ is under capacity
$\eta_i > 1$ means link $i$ is over capcacity
define $f^{feas}$ as

Convergence

$n=10$ flows, $m=12$ links: 3 or 4 links per flow
link capacities chosen randomly, uniform on [0,1, 1]
$U_j(f_j) = \log f_j$
optimal flow as a function of price: $\bar{f}_j=argmax ...$
initial prices: $\lambda = 1$
constant stepsize $\alpha_k = 3$

Methods for nonconvex optimization problems#

Convex optimization algorithms are in general

global (find global minimum)
fast (run in polynomial-time)

For general nonconvex problems, we have to give up one

local optimization methods are fast, but may not find global minimum (and even when they do, cannot certify it)
global optimization methods find global minimum (and certify it), but are often

Branch and bounds#

methods for global optimization for nonconvex problems
basic idea
partition feasible set into convex sets, and find lower/upper bounds for each
maintain global lower and upper bounds: quit if they are close enough to each other
else, refine partitino and repeat
Often slow; exponential

find global minimum of function $f: R^m \to R$ over a $m$-dimensional rectangle $Q_{init}$, to within some preset accuracy $\epsilon$

algorithm sketch:

computer lower and upper bounds on $f^*$
partition (split) $Q_{init}$ into two rectangles $Q_{init}=Q_1\cup Q_2$
compute
update lower and upper bounds on $f^*$ ….
refine partition, by splitting $Q_1$ or $Q_2$, and repeat step 3 and 4

Local search#

A heuristic method to solve computationally hard problems

moves from solution to solution in the space of candidate solutions (i.e., search space) by applying local changes, until a solution deemed optimal is found or a bound on the number of steps /time taken is reached

Problems that local search has been applied for

minimum vertex cover problem
TSP
Boolean SAT problem

how to choose a good neighbourhoood for the problem

guided by intuition
very little theory avaiable as a guide

which solution in the neighborhood to move to

decided using only information in the neighborhood
can get stuck in a local optimal point

Sequential convex programming (SCP)#

A local optimization method for nonconvex problems based on solving a sequence of convex problems
SCP is a heuristic
SCP often works well, i.e., finds a feasible point with good, if not optimal, objective value

Basic idea: trust region

For differentiable functions

Use first-order Taylor expansion for the affine approximation
Use (convex part of) second-order Taylor expansion for the convex approximation