Algorithms#

Interior-Point methods#

basic ideas#

move inequality constraint to objective function via indicator functions

  • \(\min f_0(x) + \sum_{i=1}^m I_- (f_i(x))\)

  • s.t. \(Ax=b\)

where \(I_-(u)=0\) if \(u\le 0\), \(I_-(u)=\infty\) otherwise (indicator function of \(R_-\)).

Logarithmic barrier function#

approximation via logarithmic barrier: fix some \(t>0\)

  • \(\min f_0(x) - (1/t) \sum_{i=1}^m \log (-f_i(x))\)

  • s.t. \(Ax=b\)

\(\phi(x)=-\sum_{i=1}^m \log(-f_i(x))\), \(dom \phi = \{x\mid f_1(x) \le 0, \dots, f_m(x)<0 \}\)

  • convex (follows from composition rules)

  • twice continuously differentiable, with gradient Hessian

becomes

  • \(\min f_0(x) - (1/t)\sum \log(-f_i(x))\)

  • s.t. \(Ax=b\)

  • difficult to minimize using Newton’s method (from a random starting point) when t is large

because Hessian varies rapidly near boundary of feasibility set

  • can be circumvented by solving a sequence of problems with increasing \(t\)

startnig each Newton minmization from the solution to the problem with previous \(t\)

Central path#

  • central paths: \(\{ x^*(t) \mid t>0 \}\)

  • \(x^*(t)\): central points

example: central path for an LP

  • \(\min c^T x\)

  • s.t. \(a_i^T x \le b_i\), \(i=1,\dots,6\)

hyperplane \(c^T x = c^T x^*(t)\) is tangent to level curve of \(\phi\) through \(x^*(t)\)

  • take the central path through interior of the feasible set

analytic center of a set of convex inequalities and linear equations \(f_i(x) \le 0\), \(i=1,\dots,m\), \(Fx=g\) is defined as the optimal point of

  • \(\min -\sum_i^m \log(-f_i(x))\)

  • s.t. \(Fx=g\) analytic center of linear inequalities \(a_i^T x\le b_i\), \(i=1,\dots,m\)

\(x_{ac}\) is minimizer of \(\phi(x) = -\sum_i^m \log(b_i-a_i^T x)\)

Barrier method (one interior-point method)#

  • given strictly feasible \(x\), \(t:=t^{(0)}>0\), \(\mu>1\), tolerance \(\epsilon > 0\). repeat

  1. Centering step. Compute \(x^*(t)\) by minimizing \(t f_0 + \phi\), subject to \(Ax=b\)

  2. Update \(x:=x^*(t)\)

  3. Stopping criterion. quit if

  4. increase \(t\). \(t:=\mu t\)

choice of \(\mu\) involves a trade-off: large \(\mu\) means fewer outer interations, more inner (Newton) iterations; typical values: \(\mu=10-20\) (for more practical choices of parameters, pp.570, textbook)

dual points from central path#

every \(x^*(t)\) corresponds to a dual feasible point (of the original inequality constrained problem) \(\lambda_i^*(t) = 1/(-tf_i(x^*(t)))\) and \(\nu^*(t) = w/t\)

verification: \(x^*(t)\) solves

  • \(\min t f_0(x) + \phi(x)\)

  • s.t. \(Ax=b\)

implies

  • \(Ax^* = b\), \(f_i(x^*) < 0\), \(i=1, \dots, m\)

  • \(\exists w\), \(t\nabla f_0(x^*) + \sum \frac{1}{-f_i(x^*)} \nabla f_i(x^*) + A^T w=0\)

implies

  • \(x^*(t)\) minimizes the Lagrangian (of the original problem)

  • \(L(x,\lambda^*(t), \nu^*(t))=f_0(x) + \sum \lambda_i^*(t) f_i(x) + \nu^*(t)^T (Ax-b)\)

  • at: \(\lambda_i^*(t) = 1/(-t f_i(x^*(t)))\) and \(\nu^*(t) = w/t\)

Duality gap \(m/t\): \(f_0(x^*(t)) \ge p^* \ge d^* \ge g(\lambda^*(t), \nu^*(t)) = L(x^*(t), \lambda^*(t),\nu^*(t))=f_0(x^*(t))-m/t\)

Interpretation via KKT condition#

\(x=x^*(t)\), \(\lambda=\lambda^*(t)\), \(\nu=\nu^*(t)\) satsify

  1. primal constraints: \(f_i(x)\le 0\), \(i=1,\dots,m\), \(Ax=b\)

  2. dual constraints: \(\lambda\succeq 0\)

  3. approximate complementary slackness:

Convergnce#

The number of steps to converge within tolerance \(\epsilon\):

plus the initial centering step (to compute \(x^*(t^{(0)})\))

Example: geometric program

Basic phase I method#

  • \(\min (over x,s) s\)

  • s.t. \(f_i(x)\le s, i=1,\dots,m\)

  • \(Ax=b\)

  • if \(x\), \(s\) feasible, with \(s<0\), then \(x\) is strictly feasible for (1)

  • if optimal value \(p^*\) of (2) is positive, then problem (1) is feasible

  • if \(p^* =0\) and attained, then problem (1) is feasible (but not strictly); if \(p^* =0\) and not attained, then problem (1) is infeasible

Sum of infeasibilities phase I method

  • \(\min 1^T s\)

  • s.t. \(s\succeq 0\), \(f_i(x)\le s_i\), \(i=1,\dots,m\)

  • \(Ax=b\)

  • The optimal value of (3) is zero and achieved iff problem (1) is feasible

Phase I via Newton method#

Primal-dual interior-point method#

Update both primal and dual variables in the Newton method to solve the modified KKT conditions

  • no distinction between inner and outer iterations

  • the primal iterates are not necessarily feasible

  • \(\hat{\eta}(x,\lambda) = -f(x)^T \lambda\): surrogate duality gap

  • backtracking line search based on the norm of the residuals

Subgradient Method#

\(g\) is a subgradient of \(f\) (not necessarily convex) at \(x\) if \(f(y)\ge f(x) + g^T (y-x) \forall y\)

  • if \(f\) is convex and differentiable, \(\nabla f(x)\)

Example 38 (subgradient)

\(f=\max\{f_1,f_2\}\) with \(f_1\), \(f_2\) convex and differentiable

  • \(f_1(x_0)> f_2(x_0)\): unique subgradient \(g=\nabla f_1(x_0)\)

  • \(f_2(x_0)> f_1(x_0)\): unique subgradient \(g=\nabla f_2(x_0)\)

  • \(f_1(x_0) = f_2(x_0)\): subgradient form a line segment \([\nablaf_1(x_0),\nabla f_2(x_0)]\)

Subdifferential#

  • set of all subgradients of \(f\) at \(x\) is called the subdifferential of \(f\) at \(x\), denoted \(\partial f(x)\)

  • \(\partial f(x)\) is a closed convex set (can be empty)

if \(f\) is convex,

  • \(\partial f(x)\) is nonempty, for \(x\in dom f\)

  • \(\partial f(x) = \{ \nabla f(x) \}\), if \(f\) is differentiable at \(x\)

  • if \(\partial f(x) = \{ g \}\), …

recall for \(f\) convex, differentiable \(f(x^*)=\inf_x f(x) \Longleftrightarrow 0 = \nabla f(x^*)\)

Optimality condition for unconstrained problem#

  • \(\min f_0(x)\)

  • s.t. \(f_i(x) \le 0, i\in [m\)

we assume

  • \(f_i\) convex, defined on \(R^n\) (hence subdiffereentiable)

  • strict feaisbility (Slater’s condition)

\(x^*\) is primal optimal (\(\lambda^*\) is dual optimal) iff

  • \(f_i(x^*)\le 0\), \(\lambda^*_1 \le 0\)

  • ..

  • ..

the algorithm is similar to …

Step size#

step size rules (step sizes are fixed before algorithm execution)

  • constant stetp size

  • constant step length

  • square summable

Convergence#

assumption

  • \(f^*= \inf_x f(x) > -\infty\) , with \(f(x^*)=f^*\)

  • \(||g||_2 \le G \forall g\in \partial f\)

  • \(R\ge ||x^{(1)} - x^*||_2\)

convergence results: define \(\bar{f}=\lim_{k\to \infty} f_{best}^{(k)}\)

  • constant step size: \(\bar{h} - f^* \le G^2 \alpha/2\), i.e., converges to \(G^2\alpha/2\)-suboptimal (converges to \(f^*\) if \(\alpha\) small enough)

  • constant step length: \(G\gamma /2\)-suboptimal

Example 39

minimize \(f(x) = \max_{i=1,\dots,m} (a_i^T x + b_i)\)

Subgradient method for constrained problems#

Projection#

projection: \(s=argmin_{s\in C} ||x-s||_s\)

example: linear equality constrained problem

  • \(\min f(x)\)

  • s.t. \(Ax=b\)

projection of \(z\) onto \(\{ x\mid Ax=b \}\) is

\[P(z) = z- A^T(AA^T)^{-1} (Az-b) = (I - A^T (AA^T)^{-1} A) z + A^T(AA^T)^{-1}b)\]

Example 40 (Linear equality constrained problem)

  • minimize \(||x||_1\)

  • s.t. \(Ax = b\)

subgradient of objective is \(g= sign(x)\)

Projected subgradient method for dual problem#

(convex) primal: (Slater’s condition holds)

  • minimize \(f_0(x)\)

  • s.t. \(f_i(x) \le 0 \)

solve dual problem

  • maximize \(g(\lambda)\)

  • s.t. \(\lambda \succeq 0\)

via projected subgradient method: \(\lambda^{(k+1)} =... \)

Given a starting point \(\lambda\)…. Repeat

iterpretation:

  • \(\lambda_i\) is price for resource \(f_i(x)\)

  • price update \(\lambda_i^{(k+1)}=\qty(\lambda_i^{(k)}+\alpha_k f_i(x^{(k)}))_+\)

  • increase price \(\lambda_i\) if resource \(i\) is over-utilized (i.e., \(f_i(x) >0\))

  • decrease price \(\lambda_i\) if … is under-utilizedo

  • not negative

Example 41

minimize strictly convex quadratic \((P\succ 0)\) over unit box:

  • minimize \((1/2)x^T P x -q^T x\)

  • s.t. \(x_i^2 \le 1\)

  • \(L(x,\lambda) = (1/2)x^T (P+diag(2\lambda))x - q^T x - 1^T \lambda\)

  • \(x^*(\lambda) = (P+diag(2\lambda))^{-1} q\)

  • projected

Cutting plane oracle#

Oracle: a device that can answer question for us; we make no assumption on how an answer is found by the device

Cutting plane oracle (also called separation oracle), once queried at \(x\), either

  • asserts \(x\in X\) (\(X\subseteq R^n\))

  • or return a separating hyperplane between \(x\) and \(X\): \(a\neq 0\), \(a^T z \le b\) for \(z\in X\), \(a^T x \ge b\) \((a,b)\) called a cutting-plane, or cut, since it eliminates the halfspace \(\{ z\mid a^T z > b \}\) from our search for a point in \(X\)

  • neutral cut

  • deep cut

unconstrained optimization

  • \(f_0:R^n\to R\), convex; \(x^*\) is the optimal solution; \(X\) is the set of optimal points

  • Given \(x\), find \(g\in \partial f_0 (x)\)

  • \(x^*\) belongs to the halfspace: \(\{ z\mid g^T(z-x) \le 0 \},\forall x\)

  • \(g^T(z-x)\le 0\) defines a cutting plane at \(x (a=g,b=g^Tx)\)

inequality constrained optimization#

  • If \(x\) is feasible, we have a (neural) objective cut \(g_0^T(z-x)\le 0\), \(g_0\in\partial f_0(x)\)

  • If \(x\) is not feasible, e.g., \(f_j(x)>0\), we have a (deep) feasibility cut \(f_j(x) + g_j^T (z-x) \le 0\), \(g_j \in \partial f_j(x)\)

basic (conceptual) cutting-plane/localzation algorithm

choosing the center point (specific localization methods)#

  • center of gravity

  • analytic centering

  • Chebyshev center

  • Center of the maximum volume epplisoid (MVE)

Bisection method on R#

  • minimize convex \(f:R\to R\)

  • \(P_k\) is an interval

  • obvious choice for query point: \(x^{(k+1)}:=midpoint(P_k)\)

Bisection algorithm for one-dimensional search

given an initial interval \([l,u]\) known to contain \(x^*\); a required tolerance \(r>0\) repeat - \(x:=(l+u)/2\). - query the oracle at \(x\). - if the oracle determines that \(x^*\le x, u:=x\) - if the oracle determines that \(x^*\ge x, l:=x\) until \(u-l\e 2r\)

for differentiable \(f\): evaluate \(f'(x)\) if \(f'(x)<0\), \(l:=x\); else \(u:=x\)

Ellipsoid method (for unconstrained convex problems)#

Idea: Use ellipsoids to approximate polyhedron, find \(x^*\) in an ellipsoid instead of a polyhedron

Algorithm sketch:

  1. at iteration \(k\) we know

  2. set

  3. hence we know

Basic ellipsoid algorithm (for unconstrained convex problems)

  • given ellipsoid \(E(x,P)\) containing

  • \(\sqrt{g^T P g}\)

For inequaltiy constrained problems

  • If \(x^{(k)}\) feasible, update ellipsoid with objective cut

a (deep) cut

  • If \(x^{(k)}\) infeasible, update ellipsoid with feasible cut

minimum volume ellipsoid containing ellipsoid intersected with halfspace

where \(\tilde{g} = \frac{g}{\sqrt{g^T P g}}\).

Stopping criteria

  • if \(x^{(k)}\) is feasible and \(\sqrt{g_0^{(k)T} P^{(k)} g_0^{(k)} }\le \epsilon\)

  • if \(f_j(x^{(k)}) - \sqrt{g_j^{(k)T} P^{(k)} g_j^{(k)} } > 0\)

Decomposition methods#

Break a problem into smaller ones and solve each of the smaller ones separably, either in parallel or sequentially

Separable problems#

  • minimize \(f_1(x_1)+ f_2(x_2)\)

  • s.t. \(x_1\in C_1\), \(x_2\in C_2\)

we can solve for \(x_1 and \)x_2$ separately (in parallel)

consider uncontrained problem,

  • minimize \(f(x) = f_1(x_1,y) + f_2(x_2,y)\), \(x=(x_1,x_2,y)\)

\(y\) is the complicating variable or coupling variable; when it is fixed the problem is separable in \(x_1\) and \(x_2\)

Primal decomposition method#

fix \(y\) and define

  • subproblem1: \(\min_{x_1} f_1(x_1,y)\)

  • subproblem2: \(\min_{x_2} f_1(x_2,y)\)

with optimal values \(\phi_1(y)\) and \(\phi_2(y)\)

master problme \(\min_{y} \phi_1(y) + \phi_2(y)\)

  • can solve master problem using

  1. bisection (if \(y\) is scalar) 2 gradient or Newton method (if \(\phi_i\) differentiable) 3 subgradient, cutting-plane, or ellipsoid method

  • each

Algorithm sketch

  • Given \(y^{(0)}\), \(k=0\);

  • Repeat

  • solve two subproblems to derive \(x_1\), \(x_2\)

  • update \(y^{(k)}\) to \(y^{(k+1)}\) based on the algorithm to solve the master problem

Dual decomposition method#

  • Step 1:

  • Step 2: form dual problem

\[ \begin{align}\begin{aligned}L(x_1, y_1, x_2, y_2) = f_1(x_1,y_1) + f_2(x_2,y_2) + \nu^T (y_1 - y_2)\\separable; can minimize over $(x_1,y_1)$ and $(x_2,y_2)$ separately - $g_1(\mu) = \inf_{x_1,y_1} (f_1(x_1,y_1)+\nu^T y_1) = -f_1^*(0,-\nu)$ - $g_2(\mu) = \inf_{x_2,y_2} (f_2(x_2,y_2)-\nu^T y_2) = -f_2^*(0,\nu)$\\dual problem is: maximize $g(\nu) = g_1(\nu) + g_2(\nu)$\\- compuing $g_i(\nu)$ are the dual subproblems - can be done in parallel - a subgradient of $-g$ is $y_2-y_1$ (from solutions of sub)\end{aligned}\end{align} \]

Dual decomposition algorithm

(using subgradient algorithm for master)

  • repeat

  1. solve the dual subproblems (in parallel) find

  2. Update dual variables (prices). \(\nu:=\nu - \alpha_k(y_2-y_1)\)

  • step lenght \(\alpha_k\) can be chosen in standard ways

  • at each step we have a lowe bound \(g(\nu)\) on \(p^*\)

  • iterates are generally infeasible, i.e., \(y_1\neq y_2\)

Finding feasible iterate

  • reasonable guess of feasible point from \((x_1,y_1)\), \((x_2,y_2)\):

  • a better feasible point:

Problems with complicating constraints#

  • \(\min f_1(x_1) + f_2(x_2)\)

  • s.t. \(x_1\in C_1\), \(x_2\in C_2\), \(h_1(x_1)+h_2(x_2)\preceq 0\)

  • \(f_i, h_i, C_i\) convex

  • can interpret coupling constaints as limits on resources shared between two subproblems

fix \(t\in R^p\) and define

  • \(t\) is the quantity of resources allocated to first subproblem (\(-t\) is allocated to second subproblem)

  • master problem: minimize \(\phi_1(t) + \phi_2(t)\) (optimal values of subproblems) over \(t\)

  • subproblems can be solve separately (in parallel) when \(t\) is fixed

Lagrangian relaxation and subgradient method

  • \(\alpha_k\) is an appropriate step size

  • iterates need not be feasible

  • can again construct feasible primal variables using projection

Example 42

Problem setup

  • \(n\) flows, with fixed routes, in a network with \(m\) links

  • variable \(f_i\ge 0 \) denotes the rate of flow \(j\)

  • flow utility is \(U_j: R\to R\), strictly concave, increasing

  • traffic \(t_i\) on link \(i\) is the sum of flows passing through it

  • \(t=Rf\), where \(R\) is the routing matrix \(R_{ij} = 1\) flow j passes over link \(i\), 0 otherwise

  • link capacity constraint: \(t\preceq c\)

Rate control problem:

  • maximize \(U(f) = \sum_j^n U_j(f_j)\)

  • s.t. \(Rf \preceq c\)

  • convex problem

  • dual decomposition gives decentralized method

Lagrangian: \(L(f,\lambda) = -\sum_j^n U_j(f_j) + \lambda^T (Rf - c)\)

Lagrangian dual

\[g(\lambda) = \inf_f(\sum_j^n -U_j(f_j) + \lambda^T (Rf-c)) = -\lambda^T c + \sum_j^n \inf_{f_j} (-U_j(f_j)+(r_j^T \lambda) f_j) = -\lambda&^T c - \sum_j^n(-U_j)^* (-r_j^T \lambda)\]
  • dual problem: \(\max -\lambda^T c - \sum_{j=1}^n (-U_j)^* (-r_j^T \lambda)\)

  • s.t. \(\lambda \succeq 0\)

a subgradient of \(-g\) is given by \(c-R\bar{f}\), where \(\bar{f}_j\) is a solution of the subproblem \(\min_{f_j} -U_j(f_j)+(r_j^T \lambda) f_j\)

algorithm

  • given

  • repeat

  • Sum link prices along each route. Calculate \(\Lambda_j = r_j^T \lambda\).

  • Optimize flows (separately) using flow prices. \(f_j=argmax(U_j(f_j)-\Lambda_j f_j)\)

  • Calculate link capacity margins: \(s:=c- Rf\)

  • update link prices \(\lambda: = (\lambda - \alpha_k s)_+\)

decentralized:

  • links only need to know the flows that pass through them

  • flows only need to know prices on link they pass through

Generating feasible flow rates:

  • iterates can be (and often are) infeasible, i.e., \(Rf not \preceq c\) (but we do have \(Rf \preceq c\) in the limit)

  • define \(\eta_i = t_i/c_i = (Rf)_i /c_j\)

  • \(\eta_i < 1\) means link \(i\) is under capacity

  • \(\eta_i > 1\) means link \(i\) is over capcacity

  • define \(f^{feas}\) as

Convergence

  • \(n=10\) flows, \(m=12\) links: 3 or 4 links per flow

  • link capacities chosen randomly, uniform on [0,1, 1]

  • \(U_j(f_j) = \log f_j\)

  • optimal flow as a function of price: \(\bar{f}_j=argmax ...\)

  • initial prices: \(\lambda = 1\)

  • constant stepsize \(\alpha_k = 3\)


Methods for nonconvex optimization problems#

Convex optimization algorithms are in general

  • global (find global minimum)

  • fast (run in polynomial-time)

For general nonconvex problems, we have to give up one

  • local optimization methods are fast, but may not find global minimum (and even when they do, cannot certify it)

  • global optimization methods find global minimum (and certify it), but are often

Branch and bounds#

  • methods for global optimization for nonconvex problems

  • basic idea

  • partition feasible set into convex sets, and find lower/upper bounds for each

  • maintain global lower and upper bounds: quit if they are close enough to each other

  • else, refine partitino and repeat

  • Often slow; exponential

find global minimum of function \(f: R^m \to R\) over a \(m\)-dimensional rectangle \(Q_{init}\), to within some preset accuracy \(\epsilon\)

algorithm sketch:

  • computer lower and upper bounds on \(f^*\)

  • partition (split) \(Q_{init}\) into two rectangles \(Q_{init}=Q_1\cup Q_2\)

  • compute

  • update lower and upper bounds on \(f^*\) ….

  • refine partition, by splitting \(Q_1\) or \(Q_2\), and repeat step 3 and 4

Sequential convex programming (SCP)#

  • A local optimization method for nonconvex problems based on solving a sequence of convex problems

  • SCP is a heuristic

  • SCP often works well, i.e., finds a feasible point with good, if not optimal, objective value

Basic idea: trust region

For differentiable functions

  • Use first-order Taylor expansion for the affine approximation

  • Use (convex part of) second-order Taylor expansion for the convex approximation

Affine and convex approximation#