Algorithms#
Interior-Point methods#
basic ideas#
move inequality constraint to objective function via indicator functions
\(\min f_0(x) + \sum_{i=1}^m I_- (f_i(x))\)
s.t. \(Ax=b\)
where \(I_-(u)=0\) if \(u\le 0\), \(I_-(u)=\infty\) otherwise (indicator function of \(R_-\)).
Logarithmic barrier function#
approximation via logarithmic barrier: fix some \(t>0\)
\(\min f_0(x) - (1/t) \sum_{i=1}^m \log (-f_i(x))\)
s.t. \(Ax=b\)
\(\phi(x)=-\sum_{i=1}^m \log(-f_i(x))\), \(dom \phi = \{x\mid f_1(x) \le 0, \dots, f_m(x)<0 \}\)
convex (follows from composition rules)
twice continuously differentiable, with
gradient Hessian
becomes
\(\min f_0(x) - (1/t)\sum \log(-f_i(x))\)
s.t. \(Ax=b\)
difficult to minimize using Newton’s method (from a random starting point) when t is large
because Hessian varies rapidly near boundary of feasibility set
can be circumvented by solving a sequence of problems with increasing \(t\)
startnig each Newton minmization from the solution to the problem with previous \(t\)
Central path#
central paths: \(\{ x^*(t) \mid t>0 \}\)
\(x^*(t)\): central points
example: central path for an LP
\(\min c^T x\)
s.t. \(a_i^T x \le b_i\), \(i=1,\dots,6\)
hyperplane \(c^T x = c^T x^*(t)\) is tangent to level curve of \(\phi\) through \(x^*(t)\)
take the central path through interior of the feasible set
analytic center of a set of convex inequalities and linear equations
\(f_i(x) \le 0\), \(i=1,\dots,m\), \(Fx=g\)
is defined as the optimal point of
\(\min -\sum_i^m \log(-f_i(x))\)
s.t. \(Fx=g\) analytic center of linear inequalities \(a_i^T x\le b_i\), \(i=1,\dots,m\)
\(x_{ac}\) is minimizer of \(\phi(x) = -\sum_i^m \log(b_i-a_i^T x)\)
Barrier method (one interior-point method)#
given strictly feasible \(x\), \(t:=t^{(0)}>0\), \(\mu>1\), tolerance \(\epsilon > 0\). repeat
Centering step. Compute \(x^*(t)\) by minimizing \(t f_0 + \phi\), subject to \(Ax=b\)
Update \(x:=x^*(t)\)
Stopping criterion. quit if
increase \(t\). \(t:=\mu t\)
choice of \(\mu\) involves a trade-off: large \(\mu\) means fewer outer interations, more inner (Newton) iterations; typical values: \(\mu=10-20\) (for more practical choices of parameters, pp.570, textbook)
dual points from central path#
every \(x^*(t)\) corresponds to a dual feasible point (of the original inequality constrained problem)
\(\lambda_i^*(t) = 1/(-tf_i(x^*(t)))\) and \(\nu^*(t) = w/t\)
verification: \(x^*(t)\) solves
\(\min t f_0(x) + \phi(x)\)
s.t. \(Ax=b\)
implies
\(Ax^* = b\), \(f_i(x^*) < 0\), \(i=1, \dots, m\)
\(\exists w\), \(t\nabla f_0(x^*) + \sum \frac{1}{-f_i(x^*)} \nabla f_i(x^*) + A^T w=0\)
implies
\(x^*(t)\) minimizes the Lagrangian (of the
original problem)\(L(x,\lambda^*(t), \nu^*(t))=f_0(x) + \sum \lambda_i^*(t) f_i(x) + \nu^*(t)^T (Ax-b)\)
at: \(\lambda_i^*(t) = 1/(-t f_i(x^*(t)))\) and \(\nu^*(t) = w/t\)
Duality gap \(m/t\): \(f_0(x^*(t)) \ge p^* \ge d^* \ge g(\lambda^*(t), \nu^*(t)) = L(x^*(t), \lambda^*(t),\nu^*(t))=f_0(x^*(t))-m/t\)
Interpretation via KKT condition#
\(x=x^*(t)\), \(\lambda=\lambda^*(t)\), \(\nu=\nu^*(t)\) satsify
primal constraints: \(f_i(x)\le 0\), \(i=1,\dots,m\), \(Ax=b\)
dual constraints: \(\lambda\succeq 0\)
approximate complementary slackness:
Convergnce#
The number of steps to converge within tolerance \(\epsilon\):
plus the initial centering step (to compute \(x^*(t^{(0)})\))
Example: geometric program
Basic phase I method#
\(\min (over x,s) s\)
s.t. \(f_i(x)\le s, i=1,\dots,m\)
\(Ax=b\)
if \(x\), \(s\) feasible, with \(s<0\), then \(x\) is strictly feasible for (1)
if optimal value \(p^*\) of (2) is positive, then problem (1) is feasible
if \(p^* =0\) and attained, then problem (1) is feasible (but not strictly); if \(p^* =0\) and not attained, then problem (1) is infeasible
Sum of infeasibilities phase I method
\(\min 1^T s\)
s.t. \(s\succeq 0\), \(f_i(x)\le s_i\), \(i=1,\dots,m\)
\(Ax=b\)
The optimal value of (3) is zero and achieved iff problem (1) is feasible
Phase I via Newton method#
Primal-dual interior-point method#
Update both primal and dual variables in the Newton method to solve the modified KKT conditions
no distinction between inner and outer iterations
the primal iterates are not necessarily feasible
\(\hat{\eta}(x,\lambda) = -f(x)^T \lambda\): surrogate duality gap
backtracking line search based on the norm of the residuals
backtracking line search#
based on the norm of the residuals, modified to ensure \(f(x)\prec 0\), \(\lambda \succeq 0\).
compute the largest positive step length that does not exceed one and gives \(\lambda^+\succeq 0\):
Subgradient Method#
\(g\) is a subgradient of \(f\) (not necessarily convex) at \(x\) if \(f(y)\ge f(x) + g^T (y-x) \forall y\)
if \(f\) is convex and differentiable, \(\nabla f(x)\)
Example 38 (subgradient)
\(f=\max\{f_1,f_2\}\) with \(f_1\), \(f_2\) convex and differentiable
\(f_1(x_0)> f_2(x_0)\): unique subgradient \(g=\nabla f_1(x_0)\)
\(f_2(x_0)> f_1(x_0)\): unique subgradient \(g=\nabla f_2(x_0)\)
\(f_1(x_0) = f_2(x_0)\): subgradient form a line segment \([\nablaf_1(x_0),\nabla f_2(x_0)]\)
Subdifferential#
set of all subgradients of \(f\) at \(x\) is called the subdifferential of \(f\) at \(x\), denoted \(\partial f(x)\)
\(\partial f(x)\) is a closed convex set (can be empty)
if \(f\) is convex,
\(\partial f(x)\) is nonempty, for \(x\in dom f\)
\(\partial f(x) = \{ \nabla f(x) \}\), if \(f\) is differentiable at \(x\)
if \(\partial f(x) = \{ g \}\), …
recall for \(f\) convex, differentiable \(f(x^*)=\inf_x f(x) \Longleftrightarrow 0 = \nabla f(x^*)\)
Optimality condition for unconstrained problem#
\(\min f_0(x)\)
s.t. \(f_i(x) \le 0, i\in [m\)
we assume
\(f_i\) convex, defined on \(R^n\) (hence subdiffereentiable)
strict feaisbility (Slater’s condition)
\(x^*\) is primal optimal (\(\lambda^*\) is dual optimal) iff
\(f_i(x^*)\le 0\), \(\lambda^*_1 \le 0\)
..
..
the algorithm is similar to …
Step size#
step size rules (step sizes are fixed before algorithm execution)
constant stetp size
constant step length
square summable
Convergence#
assumption
\(f^*= \inf_x f(x) > -\infty\) , with \(f(x^*)=f^*\)
\(||g||_2 \le G \forall g\in \partial f\)
\(R\ge ||x^{(1)} - x^*||_2\)
convergence results: define \(\bar{f}=\lim_{k\to \infty} f_{best}^{(k)}\)
constant step size: \(\bar{h} - f^* \le G^2 \alpha/2\), i.e., converges to \(G^2\alpha/2\)-suboptimal (converges to \(f^*\) if \(\alpha\) small enough)
constant step length: \(G\gamma /2\)-suboptimal
Example 39
minimize \(f(x) = \max_{i=1,\dots,m} (a_i^T x + b_i)\)
Subgradient method for constrained problems#
Projection#
projection: \(s=argmin_{s\in C} ||x-s||_s\)
example: linear equality constrained problem
\(\min f(x)\)
s.t. \(Ax=b\)
projection of \(z\) onto \(\{ x\mid Ax=b \}\) is
Example 40 (Linear equality constrained problem)
minimize \(||x||_1\)
s.t. \(Ax = b\)
subgradient of objective is \(g= sign(x)\)
Projected subgradient method for dual problem#
(convex) primal: (Slater’s condition holds)
minimize \(f_0(x)\)
s.t. \(f_i(x) \le 0 \)
solve dual problem
maximize \(g(\lambda)\)
s.t. \(\lambda \succeq 0\)
via projected subgradient method: \(\lambda^{(k+1)} =... \)
Given a starting point \(\lambda\)…. Repeat
iterpretation:
\(\lambda_i\) is price for resource \(f_i(x)\)
price update \(\lambda_i^{(k+1)}=\qty(\lambda_i^{(k)}+\alpha_k f_i(x^{(k)}))_+\)
increase price \(\lambda_i\) if resource \(i\) is over-utilized (i.e., \(f_i(x) >0\))
decrease price \(\lambda_i\) if … is under-utilizedo
not negative
Example 41
minimize strictly convex quadratic \((P\succ 0)\) over unit box:
minimize \((1/2)x^T P x -q^T x\)
s.t. \(x_i^2 \le 1\)
\(L(x,\lambda) = (1/2)x^T (P+diag(2\lambda))x - q^T x - 1^T \lambda\)
\(x^*(\lambda) = (P+diag(2\lambda))^{-1} q\)
projected
Cutting plane oracle#
Oracle: a device that can answer question for us; we make no assumption on how an answer is found by the device
Cutting plane oracle (also called separation oracle), once queried at \(x\), either
asserts \(x\in X\) (\(X\subseteq R^n\))
or return a separating hyperplane between \(x\) and \(X\): \(a\neq 0\), \(a^T z \le b\) for \(z\in X\), \(a^T x \ge b\) \((a,b)\) called a cutting-plane, or cut, since it eliminates the halfspace \(\{ z\mid a^T z > b \}\) from our search for a point in \(X\)
neutral cut
deep cut
unconstrained optimization
\(f_0:R^n\to R\), convex; \(x^*\) is the optimal solution; \(X\) is the set of optimal points
Given \(x\), find \(g\in \partial f_0 (x)\)
\(x^*\) belongs to the halfspace: \(\{ z\mid g^T(z-x) \le 0 \},\forall x\)
\(g^T(z-x)\le 0\) defines a
cutting planeat \(x (a=g,b=g^Tx)\)
inequality constrained optimization#
If \(x\) is feasible, we have a (neural) objective cut \(g_0^T(z-x)\le 0\), \(g_0\in\partial f_0(x)\)
If \(x\) is not feasible, e.g., \(f_j(x)>0\), we have a (deep) feasibility cut \(f_j(x) + g_j^T (z-x) \le 0\), \(g_j \in \partial f_j(x)\)
basic (conceptual) cutting-plane/localzation algorithm
choosing the center point (specific localization methods)#
center of gravity
analytic centering
Chebyshev center
Center of the maximum volume epplisoid (MVE)
Bisection method on R#
minimize convex \(f:R\to R\)
\(P_k\) is an interval
obvious choice for query point: \(x^{(k+1)}:=midpoint(P_k)\)
Bisection algorithm for one-dimensional search
given an initial interval \([l,u]\) known to contain \(x^*\); a required tolerance \(r>0\) repeat - \(x:=(l+u)/2\). - query the oracle at \(x\). - if the oracle determines that \(x^*\le x, u:=x\) - if the oracle determines that \(x^*\ge x, l:=x\) until \(u-l\e 2r\)
for differentiable \(f\): evaluate \(f'(x)\) if \(f'(x)<0\), \(l:=x\); else \(u:=x\)
Ellipsoid method (for unconstrained convex problems)#
Idea: Use ellipsoids to approximate polyhedron, find \(x^*\) in an ellipsoid instead of a polyhedron
Algorithm sketch:
at iteration \(k\) we know
set
hence we know
Basic ellipsoid algorithm (for unconstrained convex problems)
given ellipsoid \(E(x,P)\) containing
\(\sqrt{g^T P g}\)
For inequaltiy constrained problems
If \(x^{(k)}\) feasible, update ellipsoid with
objectivecut
a (deep) cut
If \(x^{(k)}\) infeasible, update ellipsoid with
feasiblecut
minimum volume ellipsoid containing ellipsoid intersected with halfspace
where \(\tilde{g} = \frac{g}{\sqrt{g^T P g}}\).
Stopping criteria
if \(x^{(k)}\) is feasible and \(\sqrt{g_0^{(k)T} P^{(k)} g_0^{(k)} }\le \epsilon\)
if \(f_j(x^{(k)}) - \sqrt{g_j^{(k)T} P^{(k)} g_j^{(k)} } > 0\)
Decomposition methods#
Break a problem into smaller ones and solve each of the smaller ones separably, either in parallel or sequentially
Separable problems#
minimize \(f_1(x_1)+ f_2(x_2)\)
s.t. \(x_1\in C_1\), \(x_2\in C_2\)
we can solve for \(x_1 and \)x_2$ separately (in parallel)
consider uncontrained problem,
minimize \(f(x) = f_1(x_1,y) + f_2(x_2,y)\), \(x=(x_1,x_2,y)\)
\(y\) is the complicating variable or coupling variable; when it is fixed the problem is separable in \(x_1\) and \(x_2\)
Primal decomposition method#
fix \(y\) and define
subproblem1: \(\min_{x_1} f_1(x_1,y)\)
subproblem2: \(\min_{x_2} f_1(x_2,y)\)
with optimal values \(\phi_1(y)\) and \(\phi_2(y)\)
master problme \(\min_{y} \phi_1(y) + \phi_2(y)\)
can solve master problem using
bisection (if \(y\) is scalar) 2 gradient or Newton method (if \(\phi_i\) differentiable) 3 subgradient, cutting-plane, or ellipsoid method
each
Algorithm sketch
Given \(y^{(0)}\), \(k=0\);
Repeat
solve two subproblems to derive \(x_1\), \(x_2\)
update \(y^{(k)}\) to \(y^{(k+1)}\) based on the algorithm to solve the master problem
Dual decomposition method#
Step 1:
Step 2: form dual problem
Dual decomposition algorithm
(using subgradient algorithm for master)
repeat
solve the dual subproblems (in parallel) find
Update dual variables (prices). \(\nu:=\nu - \alpha_k(y_2-y_1)\)
step lenght \(\alpha_k\) can be chosen in standard ways
at each step we have a lowe bound \(g(\nu)\) on \(p^*\)
iterates are generally infeasible, i.e., \(y_1\neq y_2\)
Finding feasible iterate
reasonable guess of feasible point from \((x_1,y_1)\), \((x_2,y_2)\):
a better feasible point:
Problems with complicating constraints#
\(\min f_1(x_1) + f_2(x_2)\)
s.t. \(x_1\in C_1\), \(x_2\in C_2\), \(h_1(x_1)+h_2(x_2)\preceq 0\)
\(f_i, h_i, C_i\) convex
can interpret coupling constaints as
limits on resourcesshared between two subproblems
fix \(t\in R^p\) and define
\(t\) is the quantity of resources allocated to first subproblem (\(-t\) is allocated to second subproblem)
master problem: minimize \(\phi_1(t) + \phi_2(t)\) (optimal values of subproblems) over \(t\)
subproblems can be solve separately (in parallel) when \(t\) is fixed
Lagrangian relaxation and subgradient method
\(\alpha_k\) is an appropriate step size
iterates need not be feasible
can again construct feasible primal variables using projection
Example 42
Problem setup
\(n\) flows, with fixed routes, in a network with \(m\) links
variable \(f_i\ge 0 \) denotes the rate of flow \(j\)
flow utility is \(U_j: R\to R\), strictly concave, increasing
traffic \(t_i\) on link \(i\) is the sum of flows passing through it
\(t=Rf\), where \(R\) is the routing matrix \(R_{ij} = 1\) flow j passes over link \(i\), 0 otherwise
link capacity constraint: \(t\preceq c\)
Rate control problem:
maximize \(U(f) = \sum_j^n U_j(f_j)\)
s.t. \(Rf \preceq c\)
convex problem
dual decomposition gives decentralized method
Lagrangian: \(L(f,\lambda) = -\sum_j^n U_j(f_j) + \lambda^T (Rf - c)\)
Lagrangian dual
dual problem: \(\max -\lambda^T c - \sum_{j=1}^n (-U_j)^* (-r_j^T \lambda)\)
s.t. \(\lambda \succeq 0\)
a subgradient of \(-g\) is given by \(c-R\bar{f}\), where \(\bar{f}_j\) is a solution of the subproblem \(\min_{f_j} -U_j(f_j)+(r_j^T \lambda) f_j\)
algorithm
given
repeat
Sum link prices along each route. Calculate \(\Lambda_j = r_j^T \lambda\).
Optimize flows (separately) using flow prices. \(f_j=argmax(U_j(f_j)-\Lambda_j f_j)\)
Calculate link capacity margins: \(s:=c- Rf\)
update link prices \(\lambda: = (\lambda - \alpha_k s)_+\)
decentralized:
links only need to know the flows that pass through them
flows only need to know prices on link they pass through
Generating feasible flow rates:
iterates can be (and often are) infeasible, i.e., \(Rf not \preceq c\) (but we do have \(Rf \preceq c\) in the limit)
define \(\eta_i = t_i/c_i = (Rf)_i /c_j\)
\(\eta_i < 1\) means link \(i\) is under capacity
\(\eta_i > 1\) means link \(i\) is over capcacity
define \(f^{feas}\) as
Convergence
\(n=10\) flows, \(m=12\) links: 3 or 4 links per flow
link capacities chosen randomly, uniform on [0,1, 1]
\(U_j(f_j) = \log f_j\)
optimal flow as a function of price: \(\bar{f}_j=argmax ...\)
initial prices: \(\lambda = 1\)
constant stepsize \(\alpha_k = 3\)
Methods for nonconvex optimization problems#
Convex optimization algorithms are in general
global (find global minimum)
fast (run in polynomial-time)
For general nonconvex problems, we have to give up one
local optimization methods are fast, but may not find global minimum (and even when they do, cannot certify it)
global optimization methods find global minimum (and certify it), but are often
Branch and bounds#
methods for global optimization for nonconvex problems
basic idea
partition feasible set into convex sets, and find lower/upper bounds for each
maintain global lower and upper bounds: quit if they are close enough to each other
else, refine partitino and repeat
Often slow; exponential
find global minimum of function \(f: R^m \to R\) over a \(m\)-dimensional rectangle \(Q_{init}\), to within some preset accuracy \(\epsilon\)
algorithm sketch:
computer lower and upper bounds on \(f^*\)
partition (split) \(Q_{init}\) into two rectangles \(Q_{init}=Q_1\cup Q_2\)
compute
update lower and upper bounds on \(f^*\) ….
refine partition, by splitting \(Q_1\) or \(Q_2\), and repeat step 3 and 4
Local search#
A heuristic method to solve computationally hard problems
moves from solution to solution in the space of candidate solutions (i.e., search space) by applying local changes, until a solution deemed optimal is found or a bound on the number of steps /time taken is reached
Problems that local search has been applied for
minimum vertex cover problem
TSP
Boolean SAT problem
how to choose a good neighbourhoood for the problem
guided by intuition
very little theory avaiable as a guide
which solution in the neighborhood to move to
decided using only information in the neighborhood
can get stuck in a local optimal point
Sequential convex programming (SCP)#
A local optimization method for nonconvex problems based on solving a sequence of convex problems
SCP is a heuristic
SCP often works well, i.e., finds a feasible point with good, if not optimal, objective value
Basic idea:
trust region
For differentiable functions
Use first-order Taylor expansion for the affine approximation
Use (convex part of) second-order Taylor expansion for the convex approximation