SDP and Duality#
Duality#
Optimization in standard form (not necessarily convex)
minimize \(f_0(x)\)
s.t. \(f_i\le 0\), \(h_i(x)=0\)
Lagrangian#
\(L: R^n\times R^m \times R^p \to R\) with \(dom (L) \ D \times R^m \times R^p\) \(L(x,\lambda,\nu)= f_0(x) + \sum_i^m \lambda_i f_i(x) + \sum_i^p \nu_i h_i\)
Lagrange multipliers (Lagrangian dual variables): \(\lambda_i\), \(\nu_i\)
augment \(f_0(x)\) by weighted sum of contraint functions (penalty functions)
Lagrange dual function: \(g:R^m\times R^p \to R\), $\( g(\lambda,\nu) = \inf L(x,\lambda,\nu) \)$
Lagrange dual problem
\(\max g(\lambda, \nu)\)
\(\lambda \prec 0\)
LP and its dual#
standard form LP#
minimize \(c^T x\)
s.t. \(Ax = b\), \(x \succeq 0\)
Lagrangian#
\(L(x,\lambda,\nu) = c^T x + v^T (Ax-b) - \lambda^T x = -b^T v + (c+A^T v- \lambda)^T x\)
Lagrange dual function#
\(g(\lambda,\nu) = \inf_{x} L(x,\lambda,\nu)=-b^T v, A^Tv-\lambda+c=0; -\infty, otherwise\)
Dual problem#
maximize \(-b^T v\)
s.t. \(A^T v + c \preceq 0\)
Lagrange dual and conjugate function#
optimization with affine inequality and equality constraints
minimize \(f_0(x)\)
s.t. \(Ax \preceq b\), \(C x = d\)
Lagrange dual function \(g(\lambda,v) = \inf_x f_0(x) + (A^T \lambda + C^T v)^T x - b^T \lambda -d^T v=-f_0^*(-A^T \lambda - C^T v) - b^T \lambda -d^T v\)
recall definition of conjugate \(f^*(y)=\sup_x (y^T x - f(x))\)
simplifies derivation of dual if conjugate of \(f_0\) is known \(f_0(x) = \sum_i^n x_i \log x_i\), \(f_0^*(y) = \sum_i^n e^{y_i-1}\)
Example 27 (driving the dual problem)
\(\min x_1^2 + x_2^2\)
s.t \(3x_1 + 5x_2 = 7\), \(6x_1 + 4x_2 \le 9\)
Lagrangian: \(L(x,\lambda,\mu) = x_1^2 + x_2^2 + \lambda(6x_1 + 4x_2-9)+\mu(3x_1+5x_2 -7)\)
Lagrange dual function:
\(g(\lambda,\mu) = \inf_{x\in R^2} x_1^2 + x_2^2 + \lambda(6x_1 + 4x_2 - 9) + \mu (3x_1 + 5x_2 - 7) = -13 \lambda^2 - 8.5 \mu^2 - 19\lambda \mu -9\lambda -7\mu\)
dual problem: \(\max-13\lambda^2 - 8.5 \nu^2 ...\)
Primal and Lagrange dual problems#
Primal problem (convex or non-convex)
Lagrange dual problem (is a
convexoptimization problem)
Property 1: \(g(\lambda ,\nu)\) is concave on \((\lambda,\nu)\) even if the original problem is not convex
Proof: For each fixed \(x, g(\lambda, \nu)\) is … pointwise infimum … concave
Example 28 (Max-Cut)
variable:
adjacency matrix of the graph \(Q=1, \{i,j\}\in E; 0, OW\)
a cut decided by a vector \(x\in R^n\), \(x_i = 1, i\in S; -1, OW\)
capacity of the cut \(c(x) = \frac{1}{4} \sum_i^n\sum_j^n (1-x_ix_j)Q_{ij}\) (\(1-x_ix_j=2\) if \(\{i,j\}\) is in the cut set) \((x_i-x_j/2)^2\)
primal problem (NP Complete even if \(Q\succeq 0\)):
minimize \(x^T Q x\)
s.t. \(x\in \{-1, 1\} \forall i=1, \dots, n\)
The maximum cut is \(c_\max = \frac{1}{4} \sum_i\sum_j Q_{ij} - \frac{1}{4} p^*\)
Dual problem(SDP) let \(\Lambda=diag(\lambda_1,\dots, \lambda_n)\), the Lagrangian is \(L(x,\lambda) = x^T Q x - \sum_i^n \lambda_i (x_i^2 - 1) = x^T (Q-\Lambda) x + Tr(A)\). the dual ismaximize \(tr(\Lambda)\)
s.t. \(Q-\Lambda\succeq 0\)
Property 2: \(g(\lambda,\nu)\le f_0(x)\) for every primal feasible x and dual feasible \((\lambda,\nu)\)
Remarks:
\(f_0(x) - g(\lambda,\nu)\):
duality gapfor \((x,\lambda,\nu)\), which is always non-negative \(p^*-d^*\): optimal duality gap\(d^*\le p^*\):
weak dualityfor any optimization problem: convex or not can be used to find a non-trivial lower bound on \(p^*\), for difficult primal problems
any dual feaisble solution gives a lower bound on primal objective value:
for any feasible \(x\)
\(x^T Qx \ge x^T \Lambda x = \sum_i \Lambda_{ii} x_i^2 = Tr(\Lambda)\)
Strong duality#
strong duality: \(d^* = p^*\)
does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications
Slater’s constraint qualification#
strong duality holds for a convex problem
minimize \(f_0(x)\)
s.t. \(f_i(x)\le 0\), \(Ax=b\)
if it is strictly feasible, i.e., \(\exists x \in int(D): f_i(x) < 0, i=1,\dots,m, Ax = b\).
Can be refined: linear inequalities do not need to hold with strict inequality
If an LP (in any form) has an optimal solution \(x^*\), then the dual also has an optimal solution \(y^*\) and \(C^T x^* = b^T y^*\)
Geometric interpretation#
interpretation of dual function: \(g(\lambda) = \inf_{(u,t)\in\mathcal{G}} (t + \lambda u)\), where \(\mathcal{G}=\{(f_1(x), f_0(x)) \}\)
\(\lambda u + t = g(\lambda)\) is (non-vertical) supporting hyperplane to \(\mathcal{G}\) which intersects t-axis at \(t=g(\lambda)\)
epigraph variation: same interpretation if \(\mathcal{G}\) is replaced with \(\mathcal{A}=\{ (u,t) \mid f_1(x)\le u, f_0(x)\le t \text{ for some } x\in D \}\)
strong duality
holds if there is a non-vertical suporting hyperplane to \(\mathcal{A}\) at \((0,p^*)\)
for convex problem, \(\mathcal{A}\) is convex, hence has supp. hyperplane at \((0,p^*)\)
Salter’s condition: if there exist \((\tilde{u},\tilde{t})\in\mathcal{A}\) with \(\tilde{u}<0\), then supporting hyperplanes at \((0,p^*)\) must be non-vertical
Sensitivity interpretation#
(unperturbed) optimization problem and its dual
minimize \(f_0(x)\)
s.t. \(f_i(x)\le 0\), \(h_i(x)=0\)
dual: \(\max g(\lambda,\nu)\) s.t. \(\lambda \succeq 0\)
perturbed problem and its dual
\(\min f_0(x)\)
s.t. \(f_i(x)\le u_i\), \(h_i(x) = v_i\)
dual: \(\max g(\lambda, \nu) - u^T\lambda - v^T \nu\) s.t. \(\lambda \succeq 0\)
\(p^*(u,v)\) is optimal value as a function of \(u\), \(v\)
local sensitivity: if strong duality holds and \(p^*(u,v)\) is differentiable at \((0,0)\) \(\lambda^*_i = - \frac{\partial p^*(0,0)}{\partial u_i}\), \(\nu^*_i = - \frac{\partial p^*(0,0)}{\partial v_i}\)
Saddle-point interpretation#
Assume no equality constraints (results can be easily extended) \(p^*=\inf_x \sup_{\lambda\succeq 0} L(x,\lambda)\), \(d^*=\sup_{\lambda\succeq 0} \inf_x L(x,\lambda)\)
weak duality: \(\sup_{\lambda\succeq 0} \inf_x L(x,\lambda) \le \inf_x \sup_{\lambda\succeq 0} L(x,\lambda)\)
strong duality: \(\sup_{\lambda\succeq 0} \inf_x L(x,\lambda) = \inf_x \sup_{\lambda\succeq 0} L(x,\lambda)\)
Max-min inequalitygenerally holds: \(\sup_{z\in Z} \inf_{w\in W} f(w,z) \le \inf_{w\in W} \sup_{z\in Z} f(w,z) \forall f,W,Z\)Strong max-min property (
saddle point)
saddle-pint for \(f\): a pair \(\tilde{w} \in W\), \(\tilde{z}\in Z\) that satisfy \(f(\tilde{w},z) \le f(\tilde{w},\tilde{z})\le f(w, \tilde{z}), \forall w\in W, z\in Z\) \(\Longleftrightarrow f(\tilde{w},\tilde{z}) = \inf_{w\in W} f(w,\tilde{z}), f(\tilde{w},\tilde{z}) = \sup_{z\in Z} f(\tilde{w}, z)\) \(\Longleftrightarrow strong max-min property holds\)
\(x^*, \lambda^*\) are primal and dual optimal points and strong duality holds iff \((x^*,\lambda^*)\) form a saddle-point for the Lagrangian
Game interpretation#
player x choose strategy from \(1,2,\dots, n\)
player y choose strategy from \(1,2,\dots, n\)
\(P_{ij}\) is the amount x pays to y (payoff) when \(x\) plays strategy i, y plays startegy \(j\)
mixed strategy:
\(u_i\): prob (player x chooses strategy \(i\))
\(y_i\): prob (player y chooses strategy \(i\))
expected payoff: \(u^T P v\)
Suppose \(x\) fixes strategy \(u\), then \(y\) plays (decides \(v\)) to maximize expected payoff:
so x must choose u to minimize \(\max_i (P^T u)_i\): \(\min_u \max_i (P^T u)_i\) s.t. \(\sum_i^n u_i = 1\) and \(u\succeq 0\) equivalent to? \(\min t\) s.t. \(P^T u \preceq t 1\), \(u^T 1 = 1\) and \(u\preceq 0\)
LP(1) with optimal value: \(p_1^*\)
Suppose \(y\) plays first (v given), then x choose u to minimize expected payoff: \(\min u^T P v\), s.t. \(\sum_i^n u_i =1\) and \(u \preceq 0\)
LP(2) with optimal value: \(p_2^*\)
LP(1) and LP(2) are duals of each other and thus have the same optimal values: \(p_1^* = p_2^*\)
Therefore, there is no advantage to play second …
Consider payoff function \(f(u,v) = u^T P v\), the optimum \(u^*\) for LP(1) and the optimum \(v^*\) for LP(2) form a saddle-point for \(f(u,v)\) \(f(u^*,v) \le f(u^*,v^*) \le f(u,v^*)\) equivalent to \(f(u^*,v^*)=\inf_u f(u,v^*)\) and \(f(u^*,v^*) = \sup_v f(u^*,v)\)
Nash equilibrium of the game: \((u^*,v^*)\) such that
\(u^*\) is the best response of player x with respect to \(v^*\)
\(v^*\) is the best response of player y with respect to \(u^*\)
Usage example of duality#
Duality gives a way to analytically solve an optimization problem
Example 29
\(\min ||x||_2^2\) s.t. \(Ax = b\)
Lagrangian: \(L(x,\nu)=x^T x + \nu^T (Ax - b)\) Lagrange dual function: \(g(v)=\min x^T x + \nu^T A x - \nu^T b\)
then \(g(\nu)=-\frac{1}{4} \nu^T A A^T \nu - \nu^T b\)
Dual problem: \(\max_\nu - \frac{1}{4} \nu^T A A^T \nu - \nu^T b\) \(\mu = -2 (AA^T)^{-1} b\) and \(x=A^T ...\)
derivative is the transpose of gradient
Duality is useful to find non-trivial lower bounds to non-convex minimization problems
Example 30
\(\min x^T W x, W \succeq\) s.t. \(x_i \in \{ -1, 1 \}\).
Relaxation: \(\min x^T W x\) s.t. \(-1 \le x_i \le 1\)
use duality t oobtain better lower bound (as on page15 of )
Non-convex problem with strong duality#
strong duality not implies convex problem
Example 31 (Non-convex problem with strong duality)
\(\min x^T A x\), \(A \in S^n\), s.t. \(x^T x = 1\).
solution: \(\min_{||x||_2 = 1} x^T A x = \text{min eigval of } A \)
let \(x=\sum_i^n \alpha_i v_i\) (linear combination of orthogonal eigenvectors of \(A\)) \(v_i\) is an eigenvector corresponding to eigenvalue \(\lambda_i\): \(A v_i = \lambda_i v_i, ||v_i||_2 = 1\). we have \(x^T A x = (\sum_i \alpha_i v_i)^T A (\sum_i \alpha_i v_i) = \sum_i \alpha_i^2 \lambda_i ||v_i||_2^2 = \sum_i^n \alpha_i^2 \lambda_i \) since \(x^T x = 1\), we know \(\sum_i \alpha_i^2 = 1\).
Therefore, \(x^T A x\) is minimized when \(\alpha_j =1\) for \(\lambda_j = \min(\lambda_1,\dots,\lambda_j)= ...\)
Derive the dual problem
Lagrangian: \(L(x,\nu)=x^T A x + \nu (x^T x - 1)\)
Lagrangian dual function: \(g(\nu) = \min_x L(x,\nu)= \min_x x^T (A+\nu I) x - \nu = -\nu, \text{if } A+\nu I \succeq 0; -\infty, \text{ otherwise } \)
dual problem: \(\max -\nu\) s.t. \(A + \nu I \succeq 0\)
Therefore, the largest \(-\nu\) to make \(A + \nu I \succeq 0\) is \(\min\{\lambda_1, \dots, \lambda_n\}\)
Complementary slackness for LP#
Primal: \(\min c^T x\) s.t. \(Ax \succeq b\), \(x\succeq 0\)
Dual: \(\max b^T y\) s.t. \(A^T y \preceq c\), \(y \preceq 0\)
Theorem 9 (Complementary slackness theorem for LP)
A pair of feasible solutions \(x\in R^n\) and \(y\in R^m\) for primal and dual LP problems is optimal iff \(y_i (b_i - (Ax)_i)=0, \forall i = 1, \dots, m\) \(x_i (c_j - (A^T y)_j)=0, \forall j = 1, \dots, m\)
Proof of sufficiency: Given \(x, y\) satisfy complementary slackness
Karush_Kuhn-Tucker (KKT) conditions#
the following four conditions are called KKT conditions (for a problem with differentiable \(f_i\), \(h_i\)): \(\min f_0(x)\) s.t. \(f_i(0)\)
primal constraints:
dual constraints:
complementary slackness: \(\lambda_i f_i(x)=0\), \(i=1,\dots,m\)
gradient of Lagrangian with respect to \(x\) vanishes:
For any optimization problem with differentiable objective and constraint functions, if strong duality holds, any pair of primal and dual optimal points \(x^*, \lambda^*, \nu^*\) must satisfy the KKT conditions \(g(\lambda^*, \nu^*) = \inf_x L(x,\lambda^*,\nu^*) = \inf_x f_0(x) + \sum_i^m \lambda_i^* f_i(x) + \sum_i^p \nu_i^* h_i(x) = f_0(x^*)\)
KKT conditions for convex programs#
For a convex program, any points \(\tilde{x}\), \(\tilde{\lambda}\), \(\tilde{\nu}\) that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap \(\tilde{\lambda_i} \ge 0\), \(\forall i \to L(x,\tilde{\lambda},\tilde{\nu})\) is convex in \(x\).
gradient of Lagrangian with respect to \(\tilde{x}\) vanishes \(\implies\) \(\tilde{x}\) minimizes \(L(x,\tilde{\lambda}, \tilde{\nu})\), then \(g(\tilde{\nu},\tilde{\nu})=\inf_x L(x,\tilde{\lambda},\tilde{\nu}=...\) since \(g(\tilde{\lambda},\)
If a convex program with differentiable objective and constraint functions satisfies Salter’s conditions, then KKT condition is necessary and sufficient for optimality
recall that Slater implies strong duality
Significance of KKT
solve KKT to derive
Example 32
\(\min x_1^2 + 2x_2^2\) s.t. \(x_1 + x_2 \ge 3\)
To solve this, try to find \((x^*,\lambda^*)\) that satisfy the KKT conditions
Example 33 (Water-filling)
\(\min \min - \sum_i^k \log (1 + p_i/N_i)\) s.t. \(\sum_i^k p_i \le P\), \(p_i\ge 0, \forall i = 1,\dots, k\)
the algorithm takes the intuition from KKT
Generalization: maxmimally allocate resource to pi with the current smallest marginal utility, until all resource P is used up.
equivalent formulation of a problem can lead to very different duals
reformulating
dual
dual
Example 34
Introducing new variables and equality constraints
minimize \(f_0(Ax + b)\)
dual function is constant: \(g=\inf_x L(x) = \inf_x f_0(Ax_b) = p^*\)
we have strong duality, but dual is quire useless
reformulated problem and its dual
\(\min f_0(y)\) s.t. \(Ax + b - y = 0\)
\(b^T \nu - f_0^*(\nu)\) s.t. \(\ge\)
Example 35
Making explicit constraint implicit
LP with box constraints: primal and dual problem:
\(\min c^T x\) s.t. \(Ax=b, -1\preceq x \preceq 1\)
\(\max -b^T \nu - 1^T \lambda_1 - 1^T \lambda_2\) s.t. \(c+A^T \nu + \lambda_1 - \lambda_2 =0, \lambda_1 \succeq 0\), \(\lambda_2 \succeq 0\)
reformulation with box constraints made implicit:
\(\minimize f_0(x) = c^Tx; \infty\) s.t.
dual function \(g(\nu) = \inf_{-1\preceq x \preceq 1} (c^T x + \nu^T(Ax - b))=-b^T\nu - \norm{A^T \nu + c}_1\)
Partial Lagrangian relaxation#
One can take Lagrangian with respect to only a subset of the constraints, e.g.,
If convex problem and Slater’s conditino is satisfied: \(\max_{\lambda \succeq 0} \tilde{g}(\lambda) = \min_{f_i(x)\le 0} f_0(x)\) let \(x^*\) be the primal optimal point, \((\lambda^*_1, \lambda^*_2)\) be dual optimal point for \(g(\lambda_1,\lambda_2)\) and \(\lambda_1'\) be dual optimal point for \(\tilde{g}(\lambda)\)
The aobve property extensible to general case, when the partial Lagrangian is derived by relaxing
Example 36
Transform objective function
\(\min ||Ax-b||\)
reformulation \(\min 1/2 ||y||^2\) s.t. \(y=Ax-b\) dual problem on page 257, textbook
Generalized inequality#
dual problem
\(\max g(\lambda_1, \dots. \lambda_m, \nu)\) s.t. \(\lambda_i \succeq_{K_i^*} 0, i=1,\dots,m\)
Properties:
lower bound property: if \(\lambda_i \succeq_{K_i^*} 0\), then \(g\)
weak duality …
strong duality …
KKT
Example 37 (Semidefinite program)
primal SDP \((F_i,G\in S^k)\):
\(\min c^T x\) s.t. \(x_1 F_1 + \cdots + x_n F_n \preceq G\)
Lagrange multiplier is matrix \(Z\in S^k\), define \(<A,B>=tr(AB)\)
Lagrangian \(L(x,Z) = c^T x + Tr(Z(x_1 F_1 + \cdots + x_n F_n -G) )\)
dual function \(g(Z)=\inf_x L(x,Z)=-tr(GZ), tr(F_i Z) + c_i = 0, \; i=1, \dots ,n; -\infty, otherwise\)
dual problem:
\(\max -tr(GZ)\) s.t \(Z\succeq 0\), \(tr(F_i Z) + c_i = 0\)