SDP and Duality#

Duality#

Optimization in standard form (not necessarily convex)

  • minimize \(f_0(x)\)

  • s.t. \(f_i\le 0\), \(h_i(x)=0\)

Lagrangian#

\(L: R^n\times R^m \times R^p \to R\) with \(dom (L) \ D \times R^m \times R^p\) \(L(x,\lambda,\nu)= f_0(x) + \sum_i^m \lambda_i f_i(x) + \sum_i^p \nu_i h_i\)

  • Lagrange multipliers (Lagrangian dual variables): \(\lambda_i\), \(\nu_i\)

  • augment \(f_0(x)\) by weighted sum of contraint functions (penalty functions)

Lagrange dual function: \(g:R^m\times R^p \to R\), $\( g(\lambda,\nu) = \inf L(x,\lambda,\nu) \)$

Lagrange dual problem

  • \(\max g(\lambda, \nu)\)

  • \(\lambda \prec 0\)

LP and its dual#

standard form LP#

  • minimize \(c^T x\)

  • s.t. \(Ax = b\), \(x \succeq 0\)

Lagrangian#

\(L(x,\lambda,\nu) = c^T x + v^T (Ax-b) - \lambda^T x = -b^T v + (c+A^T v- \lambda)^T x\)

Lagrange dual function#

\(g(\lambda,\nu) = \inf_{x} L(x,\lambda,\nu)=-b^T v, A^Tv-\lambda+c=0; -\infty, otherwise\)

Dual problem#

  • maximize \(-b^T v\)

  • s.t. \(A^T v + c \preceq 0\)

Lagrange dual and conjugate function#

optimization with affine inequality and equality constraints

  • minimize \(f_0(x)\)

  • s.t. \(Ax \preceq b\), \(C x = d\)

Lagrange dual function \(g(\lambda,v) = \inf_x f_0(x) + (A^T \lambda + C^T v)^T x - b^T \lambda -d^T v=-f_0^*(-A^T \lambda - C^T v) - b^T \lambda -d^T v\)

  • recall definition of conjugate \(f^*(y)=\sup_x (y^T x - f(x))\)

  • simplifies derivation of dual if conjugate of \(f_0\) is known \(f_0(x) = \sum_i^n x_i \log x_i\), \(f_0^*(y) = \sum_i^n e^{y_i-1}\)

Example 27 (driving the dual problem)

  • \(\min x_1^2 + x_2^2\)

  • s.t \(3x_1 + 5x_2 = 7\), \(6x_1 + 4x_2 \le 9\)

Lagrangian: \(L(x,\lambda,\mu) = x_1^2 + x_2^2 + \lambda(6x_1 + 4x_2-9)+\mu(3x_1+5x_2 -7)\)

Lagrange dual function: \(g(\lambda,\mu) = \inf_{x\in R^2} x_1^2 + x_2^2 + \lambda(6x_1 + 4x_2 - 9) + \mu (3x_1 + 5x_2 - 7) = -13 \lambda^2 - 8.5 \mu^2 - 19\lambda \mu -9\lambda -7\mu\)

dual problem: \(\max-13\lambda^2 - 8.5 \nu^2 ...\)

Primal and Lagrange dual problems#

  • Primal problem (convex or non-convex)

  • Lagrange dual problem (is a convex optimization problem)

Property 1: \(g(\lambda ,\nu)\) is concave on \((\lambda,\nu)\) even if the original problem is not convex

Proof: For each fixed \(x, g(\lambda, \nu)\) is … pointwise infimum … concave

Example 28 (Max-Cut)

variable:

  • adjacency matrix of the graph \(Q=1, \{i,j\}\in E; 0, OW\)

  • a cut decided by a vector \(x\in R^n\), \(x_i = 1, i\in S; -1, OW\)

  • capacity of the cut \(c(x) = \frac{1}{4} \sum_i^n\sum_j^n (1-x_ix_j)Q_{ij}\) (\(1-x_ix_j=2\) if \(\{i,j\}\) is in the cut set) \((x_i-x_j/2)^2\)

primal problem (NP Complete even if \(Q\succeq 0\)):

  • minimize \(x^T Q x\)

  • s.t. \(x\in \{-1, 1\} \forall i=1, \dots, n\)

The maximum cut is \(c_\max = \frac{1}{4} \sum_i\sum_j Q_{ij} - \frac{1}{4} p^*\)

  • Dual problem (SDP) let \(\Lambda=diag(\lambda_1,\dots, \lambda_n)\), the Lagrangian is \(L(x,\lambda) = x^T Q x - \sum_i^n \lambda_i (x_i^2 - 1) = x^T (Q-\Lambda) x + Tr(A)\). the dual is

  • maximize \(tr(\Lambda)\)

  • s.t. \(Q-\Lambda\succeq 0\)

Property 2: \(g(\lambda,\nu)\le f_0(x)\) for every primal feasible x and dual feasible \((\lambda,\nu)\)

Remarks:

  • \(f_0(x) - g(\lambda,\nu)\): duality gap for \((x,\lambda,\nu)\), which is always non-negative \(p^*-d^*\): optimal duality gap

  • \(d^*\le p^*\): weak duality for any optimization problem: convex or not can be used to find a non-trivial lower bound on \(p^*\), for difficult primal problems

any dual feaisble solution gives a lower bound on primal objective value: for any feasible \(x\) \(x^T Qx \ge x^T \Lambda x = \sum_i \Lambda_{ii} x_i^2 = Tr(\Lambda)\)

Strong duality#

strong duality: \(d^* = p^*\)

  • does not hold in general

  • (usually) holds for convex problems

  • conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater’s constraint qualification#

strong duality holds for a convex problem

  • minimize \(f_0(x)\)

  • s.t. \(f_i(x)\le 0\), \(Ax=b\)

if it is strictly feasible, i.e., \(\exists x \in int(D): f_i(x) < 0, i=1,\dots,m, Ax = b\).

Can be refined: linear inequalities do not need to hold with strict inequality

If an LP (in any form) has an optimal solution \(x^*\), then the dual also has an optimal solution \(y^*\) and \(C^T x^* = b^T y^*\)

Geometric interpretation#

interpretation of dual function: \(g(\lambda) = \inf_{(u,t)\in\mathcal{G}} (t + \lambda u)\), where \(\mathcal{G}=\{(f_1(x), f_0(x)) \}\)

  • \(\lambda u + t = g(\lambda)\) is (non-vertical) supporting hyperplane to \(\mathcal{G}\) which intersects t-axis at \(t=g(\lambda)\)

epigraph variation: same interpretation if \(\mathcal{G}\) is replaced with \(\mathcal{A}=\{ (u,t) \mid f_1(x)\le u, f_0(x)\le t \text{ for some } x\in D \}\)

strong duality

  • holds if there is a non-vertical suporting hyperplane to \(\mathcal{A}\) at \((0,p^*)\)

  • for convex problem, \(\mathcal{A}\) is convex, hence has supp. hyperplane at \((0,p^*)\)

  • Salter’s condition: if there exist \((\tilde{u},\tilde{t})\in\mathcal{A}\) with \(\tilde{u}<0\), then supporting hyperplanes at \((0,p^*)\) must be non-vertical

Sensitivity interpretation#

(unperturbed) optimization problem and its dual

  • minimize \(f_0(x)\)

  • s.t. \(f_i(x)\le 0\), \(h_i(x)=0\)

  • dual: \(\max g(\lambda,\nu)\) s.t. \(\lambda \succeq 0\)

perturbed problem and its dual

  • \(\min f_0(x)\)

  • s.t. \(f_i(x)\le u_i\), \(h_i(x) = v_i\)

  • dual: \(\max g(\lambda, \nu) - u^T\lambda - v^T \nu\) s.t. \(\lambda \succeq 0\)

  • \(p^*(u,v)\) is optimal value as a function of \(u\), \(v\)

  • local sensitivity: if strong duality holds and \(p^*(u,v)\) is differentiable at \((0,0)\) \(\lambda^*_i = - \frac{\partial p^*(0,0)}{\partial u_i}\), \(\nu^*_i = - \frac{\partial p^*(0,0)}{\partial v_i}\)

Saddle-point interpretation#

Assume no equality constraints (results can be easily extended) \(p^*=\inf_x \sup_{\lambda\succeq 0} L(x,\lambda)\), \(d^*=\sup_{\lambda\succeq 0} \inf_x L(x,\lambda)\)

  • weak duality: \(\sup_{\lambda\succeq 0} \inf_x L(x,\lambda) \le \inf_x \sup_{\lambda\succeq 0} L(x,\lambda)\)

  • strong duality: \(\sup_{\lambda\succeq 0} \inf_x L(x,\lambda) = \inf_x \sup_{\lambda\succeq 0} L(x,\lambda)\)

  • Max-min inequality generally holds: \(\sup_{z\in Z} \inf_{w\in W} f(w,z) \le \inf_{w\in W} \sup_{z\in Z} f(w,z) \forall f,W,Z\)

  • Strong max-min property (saddle point)

saddle-pint for \(f\): a pair \(\tilde{w} \in W\), \(\tilde{z}\in Z\) that satisfy \(f(\tilde{w},z) \le f(\tilde{w},\tilde{z})\le f(w, \tilde{z}), \forall w\in W, z\in Z\) \(\Longleftrightarrow f(\tilde{w},\tilde{z}) = \inf_{w\in W} f(w,\tilde{z}), f(\tilde{w},\tilde{z}) = \sup_{z\in Z} f(\tilde{w}, z)\) \(\Longleftrightarrow strong max-min property holds\)

\(x^*, \lambda^*\) are primal and dual optimal points and strong duality holds iff \((x^*,\lambda^*)\) form a saddle-point for the Lagrangian

Game interpretation#

  • player x choose strategy from \(1,2,\dots, n\)

  • player y choose strategy from \(1,2,\dots, n\)

  • \(P_{ij}\) is the amount x pays to y (payoff) when \(x\) plays strategy i, y plays startegy \(j\)

mixed strategy:

  • \(u_i\): prob (player x chooses strategy \(i\))

  • \(y_i\): prob (player y chooses strategy \(i\))

expected payoff: \(u^T P v\)

Suppose \(x\) fixes strategy \(u\), then \(y\) plays (decides \(v\)) to maximize expected payoff:

\[\max u^T P v s.t. \sum_{i=1}^m v_i= 1, v \succeeq 0\]

so x must choose u to minimize \(\max_i (P^T u)_i\): \(\min_u \max_i (P^T u)_i\) s.t. \(\sum_i^n u_i = 1\) and \(u\succeq 0\) equivalent to? \(\min t\) s.t. \(P^T u \preceq t 1\), \(u^T 1 = 1\) and \(u\preceq 0\)

LP(1) with optimal value: \(p_1^*\)

Suppose \(y\) plays first (v given), then x choose u to minimize expected payoff: \(\min u^T P v\), s.t. \(\sum_i^n u_i =1\) and \(u \preceq 0\)

LP(2) with optimal value: \(p_2^*\)

LP(1) and LP(2) are duals of each other and thus have the same optimal values: \(p_1^* = p_2^*\)

Therefore, there is no advantage to play second …

Consider payoff function \(f(u,v) = u^T P v\), the optimum \(u^*\) for LP(1) and the optimum \(v^*\) for LP(2) form a saddle-point for \(f(u,v)\) \(f(u^*,v) \le f(u^*,v^*) \le f(u,v^*)\) equivalent to \(f(u^*,v^*)=\inf_u f(u,v^*)\) and \(f(u^*,v^*) = \sup_v f(u^*,v)\)

Nash equilibrium of the game: \((u^*,v^*)\) such that

  • \(u^*\) is the best response of player x with respect to \(v^*\)

  • \(v^*\) is the best response of player y with respect to \(u^*\)

Usage example of duality#

Duality gives a way to analytically solve an optimization problem

Example 29

\(\min ||x||_2^2\) s.t. \(Ax = b\)

Lagrangian: \(L(x,\nu)=x^T x + \nu^T (Ax - b)\) Lagrange dual function: \(g(v)=\min x^T x + \nu^T A x - \nu^T b\)

\[\frac{\partial (x^T x + \nu^T A x)}{\partial x} = 2 x^T + \nu^T A = 0 \implies x = -\frac{A^T \nu}{x}\]

then \(g(\nu)=-\frac{1}{4} \nu^T A A^T \nu - \nu^T b\)

Dual problem: \(\max_\nu - \frac{1}{4} \nu^T A A^T \nu - \nu^T b\) \(\mu = -2 (AA^T)^{-1} b\) and \(x=A^T ...\)

derivative is the transpose of gradient

Duality is useful to find non-trivial lower bounds to non-convex minimization problems

Example 30

\(\min x^T W x, W \succeq\) s.t. \(x_i \in \{ -1, 1 \}\).

Relaxation: \(\min x^T W x\) s.t. \(-1 \le x_i \le 1\)

use duality t oobtain better lower bound (as on page15 of )

Non-convex problem with strong duality#

strong duality not implies convex problem

Example 31 (Non-convex problem with strong duality)

\(\min x^T A x\), \(A \in S^n\), s.t. \(x^T x = 1\).

solution: \(\min_{||x||_2 = 1} x^T A x = \text{min eigval of } A \)

let \(x=\sum_i^n \alpha_i v_i\) (linear combination of orthogonal eigenvectors of \(A\)) \(v_i\) is an eigenvector corresponding to eigenvalue \(\lambda_i\): \(A v_i = \lambda_i v_i, ||v_i||_2 = 1\). we have \(x^T A x = (\sum_i \alpha_i v_i)^T A (\sum_i \alpha_i v_i) = \sum_i \alpha_i^2 \lambda_i ||v_i||_2^2 = \sum_i^n \alpha_i^2 \lambda_i \) since \(x^T x = 1\), we know \(\sum_i \alpha_i^2 = 1\).

Therefore, \(x^T A x\) is minimized when \(\alpha_j =1\) for \(\lambda_j = \min(\lambda_1,\dots,\lambda_j)= ...\)

Derive the dual problem

  • Lagrangian: \(L(x,\nu)=x^T A x + \nu (x^T x - 1)\)

  • Lagrangian dual function: \(g(\nu) = \min_x L(x,\nu)= \min_x x^T (A+\nu I) x - \nu = -\nu, \text{if } A+\nu I \succeq 0; -\infty, \text{ otherwise } \)

  • dual problem: \(\max -\nu\) s.t. \(A + \nu I \succeq 0\)

Therefore, the largest \(-\nu\) to make \(A + \nu I \succeq 0\) is \(\min\{\lambda_1, \dots, \lambda_n\}\)

Complementary slackness for LP#

Primal: \(\min c^T x\) s.t. \(Ax \succeq b\), \(x\succeq 0\)

Dual: \(\max b^T y\) s.t. \(A^T y \preceq c\), \(y \preceq 0\)

Theorem 9 (Complementary slackness theorem for LP)

A pair of feasible solutions \(x\in R^n\) and \(y\in R^m\) for primal and dual LP problems is optimal iff \(y_i (b_i - (Ax)_i)=0, \forall i = 1, \dots, m\) \(x_i (c_j - (A^T y)_j)=0, \forall j = 1, \dots, m\)

Proof of sufficiency: Given \(x, y\) satisfy complementary slackness

Karush_Kuhn-Tucker (KKT) conditions#

the following four conditions are called KKT conditions (for a problem with differentiable \(f_i\), \(h_i\)): \(\min f_0(x)\) s.t. \(f_i(0)\)

  • primal constraints:

  • dual constraints:

  • complementary slackness: \(\lambda_i f_i(x)=0\), \(i=1,\dots,m\)

  • gradient of Lagrangian with respect to \(x\) vanishes:

\[\nabla f_0(x) + \sum_i^m \lambda_i \nabla f_i(x) + \sum_i^p \nu_i \nabla h_i(x) = 0\]

For any optimization problem with differentiable objective and constraint functions, if strong duality holds, any pair of primal and dual optimal points \(x^*, \lambda^*, \nu^*\) must satisfy the KKT conditions \(g(\lambda^*, \nu^*) = \inf_x L(x,\lambda^*,\nu^*) = \inf_x f_0(x) + \sum_i^m \lambda_i^* f_i(x) + \sum_i^p \nu_i^* h_i(x) = f_0(x^*)\)

KKT conditions for convex programs#

For a convex program, any points \(\tilde{x}\), \(\tilde{\lambda}\), \(\tilde{\nu}\) that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap \(\tilde{\lambda_i} \ge 0\), \(\forall i \to L(x,\tilde{\lambda},\tilde{\nu})\) is convex in \(x\).

gradient of Lagrangian with respect to \(\tilde{x}\) vanishes \(\implies\) \(\tilde{x}\) minimizes \(L(x,\tilde{\lambda}, \tilde{\nu})\), then \(g(\tilde{\nu},\tilde{\nu})=\inf_x L(x,\tilde{\lambda},\tilde{\nu}=...\) since \(g(\tilde{\lambda},\)

If a convex program with differentiable objective and constraint functions satisfies Salter’s conditions, then KKT condition is necessary and sufficient for optimality

  • recall that Slater implies strong duality

Significance of KKT

  • solve KKT to derive

Example 32

\(\min x_1^2 + 2x_2^2\) s.t. \(x_1 + x_2 \ge 3\)

To solve this, try to find \((x^*,\lambda^*)\) that satisfy the KKT conditions

Example 33 (Water-filling)

\(\min \min - \sum_i^k \log (1 + p_i/N_i)\) s.t. \(\sum_i^k p_i \le P\), \(p_i\ge 0, \forall i = 1,\dots, k\)

the algorithm takes the intuition from KKT

Generalization: maxmimally allocate resource to pi with the current smallest marginal utility, until all resource P is used up.

  • equivalent formulation of a problem can lead to very different duals

  • reformulating

  • dual

  • dual

Example 34

Introducing new variables and equality constraints

minimize \(f_0(Ax + b)\)

  • dual function is constant: \(g=\inf_x L(x) = \inf_x f_0(Ax_b) = p^*\)

  • we have strong duality, but dual is quire useless

reformulated problem and its dual

  • \(\min f_0(y)\) s.t. \(Ax + b - y = 0\)

  • \(b^T \nu - f_0^*(\nu)\) s.t. \(\ge\)

Example 35

Making explicit constraint implicit

LP with box constraints: primal and dual problem:

  • \(\min c^T x\) s.t. \(Ax=b, -1\preceq x \preceq 1\)

  • \(\max -b^T \nu - 1^T \lambda_1 - 1^T \lambda_2\) s.t. \(c+A^T \nu + \lambda_1 - \lambda_2 =0, \lambda_1 \succeq 0\), \(\lambda_2 \succeq 0\)

reformulation with box constraints made implicit:

  • \(\minimize f_0(x) = c^Tx; \infty\) s.t.

dual function \(g(\nu) = \inf_{-1\preceq x \preceq 1} (c^T x + \nu^T(Ax - b))=-b^T\nu - \norm{A^T \nu + c}_1\)

Partial Lagrangian relaxation#

One can take Lagrangian with respect to only a subset of the constraints, e.g.,

If convex problem and Slater’s conditino is satisfied: \(\max_{\lambda \succeq 0} \tilde{g}(\lambda) = \min_{f_i(x)\le 0} f_0(x)\) let \(x^*\) be the primal optimal point, \((\lambda^*_1, \lambda^*_2)\) be dual optimal point for \(g(\lambda_1,\lambda_2)\) and \(\lambda_1'\) be dual optimal point for \(\tilde{g}(\lambda)\)

\[f_0(x^*) \ge \tilde{g}(\lambda_1') \ge \tilde{g} (\lambda_1^*) = \min f_0 (x) + \lambda_1^* f_1 (x) \le \min f_0(x) + \lambda_1^* f_1(x) + \lambda_2^* f_2(x) = g(\lambda_1^*, \lambda_2^*) = f_0(x^*)\]

The aobve property extensible to general case, when the partial Lagrangian is derived by relaxing

Example 36

Transform objective function

\(\min ||Ax-b||\)

reformulation \(\min 1/2 ||y||^2\) s.t. \(y=Ax-b\) dual problem on page 257, textbook

Generalized inequality#

dual problem

  • \(\max g(\lambda_1, \dots. \lambda_m, \nu)\) s.t. \(\lambda_i \succeq_{K_i^*} 0, i=1,\dots,m\)

Properties:

  • lower bound property: if \(\lambda_i \succeq_{K_i^*} 0\), then \(g\)

  • weak duality …

  • strong duality …

  • KKT

Example 37 (Semidefinite program)

primal SDP \((F_i,G\in S^k)\): \(\min c^T x\) s.t. \(x_1 F_1 + \cdots + x_n F_n \preceq G\)

  • Lagrange multiplier is matrix \(Z\in S^k\), define \(<A,B>=tr(AB)\)

  • Lagrangian \(L(x,Z) = c^T x + Tr(Z(x_1 F_1 + \cdots + x_n F_n -G) )\)

  • dual function \(g(Z)=\inf_x L(x,Z)=-tr(GZ), tr(F_i Z) + c_i = 0, \; i=1, \dots ,n; -\infty, otherwise\)

dual problem:

  • \(\max -tr(GZ)\) s.t \(Z\succeq 0\), \(tr(F_i Z) + c_i = 0\)