SDP and Duality

SDP and Duality#

Duality#

Optimization in standard form (not necessarily convex)

minimize $f_0(x)$
s.t. $f_i\le 0$, $h_i(x)=0$

Lagrangian#

$L: R^n\times R^m \times R^p \to R$ with $dom (L) \ D \times R^m \times R^p$ $L(x,\lambda,\nu)= f_0(x) + \sum_i^m \lambda_i f_i(x) + \sum_i^p \nu_i h_i$

Lagrange multipliers (Lagrangian dual variables): $\lambda_i$, $\nu_i$
augment $f_0(x)$ by weighted sum of contraint functions (penalty functions)

Lagrange dual function: $g:R^m\times R^p \to R$, $$ g(\lambda,\nu) = \inf L(x,\lambda,\nu) $$

Lagrange dual problem

$\max g(\lambda, \nu)$
$\lambda \prec 0$

LP and its dual#

standard form LP#

minimize $c^T x$
s.t. $Ax = b$, $x \succeq 0$

Lagrangian#

$L(x,\lambda,\nu) = c^T x + v^T (Ax-b) - \lambda^T x = -b^T v + (c+A^T v- \lambda)^T x$

Lagrange dual function#

$g(\lambda,\nu) = \inf_{x} L(x,\lambda,\nu)=-b^T v, A^Tv-\lambda+c=0; -\infty, otherwise$

Dual problem#

maximize $-b^T v$
s.t. $A^T v + c \preceq 0$

Lagrange dual and conjugate function#

optimization with affine inequality and equality constraints

minimize $f_0(x)$
s.t. $Ax \preceq b$, $C x = d$

Lagrange dual function $g(\lambda,v) = \inf_x f_0(x) + (A^T \lambda + C^T v)^T x - b^T \lambda -d^T v=-f_0^*(-A^T \lambda - C^T v) - b^T \lambda -d^T v$

recall definition of conjugate $f^*(y)=\sup_x (y^T x - f(x))$
simplifies derivation of dual if conjugate of $f_0$ is known $f_0(x) = \sum_i^n x_i \log x_i$, $f_0^*(y) = \sum_i^n e^{y_i-1}$

Example 27 (driving the dual problem)

$\min x_1^2 + x_2^2$
s.t $3x_1 + 5x_2 = 7$, $6x_1 + 4x_2 \le 9$

Lagrangian: $L(x,\lambda,\mu) = x_1^2 + x_2^2 + \lambda(6x_1 + 4x_2-9)+\mu(3x_1+5x_2 -7)$

Lagrange dual function: $g(\lambda,\mu) = \inf_{x\in R^2} x_1^2 + x_2^2 + \lambda(6x_1 + 4x_2 - 9) + \mu (3x_1 + 5x_2 - 7) = -13 \lambda^2 - 8.5 \mu^2 - 19\lambda \mu -9\lambda -7\mu$

dual problem: $\max-13\lambda^2 - 8.5 \nu^2 ...$

Primal and Lagrange dual problems#

Primal problem (convex or non-convex)
Lagrange dual problem (is a convex optimization problem)

Property 1: $g(\lambda ,\nu)$ is concave on $(\lambda,\nu)$ even if the original problem is not convex

Proof: For each fixed $x, g(\lambda, \nu)$ is … pointwise infimum … concave

Example 28 (Max-Cut)

variable:

adjacency matrix of the graph $Q=1, \{i,j\}\in E; 0, OW$
a cut decided by a vector $x\in R^n$, $x_i = 1, i\in S; -1, OW$
capacity of the cut $c(x) = \frac{1}{4} \sum_i^n\sum_j^n (1-x_ix_j)Q_{ij}$ ($1-x_ix_j=2$ if $\{i,j\}$ is in the cut set) $(x_i-x_j/2)^2$

primal problem (NP Complete even if $Q\succeq 0$):

minimize $x^T Q x$
s.t. $x\in \{-1, 1\} \forall i=1, \dots, n$

The maximum cut is $c_\max = \frac{1}{4} \sum_i\sum_j Q_{ij} - \frac{1}{4} p^*$

Dual problem (SDP) let $\Lambda=diag(\lambda_1,\dots, \lambda_n)$, the Lagrangian is $L(x,\lambda) = x^T Q x - \sum_i^n \lambda_i (x_i^2 - 1) = x^T (Q-\Lambda) x + Tr(A)$. the dual is
maximize $tr(\Lambda)$
s.t. $Q-\Lambda\succeq 0$

Property 2: $g(\lambda,\nu)\le f_0(x)$ for every primal feasible x and dual feasible $(\lambda,\nu)$

Remarks:

$f_0(x) - g(\lambda,\nu)$: duality gap for $(x,\lambda,\nu)$, which is always non-negative $p^*-d^*$: optimal duality gap
$d^*\le p^*$: weak duality for any optimization problem: convex or not can be used to find a non-trivial lower bound on $p^*$, for difficult primal problems

any dual feaisble solution gives a lower bound on primal objective value: for any feasible $x$ $x^T Qx \ge x^T \Lambda x = \sum_i \Lambda_{ii} x_i^2 = Tr(\Lambda)$

Strong duality#

strong duality: $d^* = p^*$

does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater’s constraint qualification#

strong duality holds for a convex problem

minimize $f_0(x)$
s.t. $f_i(x)\le 0$, $Ax=b$

if it is strictly feasible, i.e., $\exists x \in int(D): f_i(x) < 0, i=1,\dots,m, Ax = b$.

Can be refined: linear inequalities do not need to hold with strict inequality

If an LP (in any form) has an optimal solution $x^*$, then the dual also has an optimal solution $y^*$ and $C^T x^* = b^T y^*$

Geometric interpretation#

interpretation of dual function: $g(\lambda) = \inf_{(u,t)\in\mathcal{G}} (t + \lambda u)$, where $\mathcal{G}=\{(f_1(x), f_0(x)) \}$

$\lambda u + t = g(\lambda)$ is (non-vertical) supporting hyperplane to $\mathcal{G}$ which intersects t-axis at $t=g(\lambda)$

epigraph variation: same interpretation if $\mathcal{G}$ is replaced with $\mathcal{A}=\{ (u,t) \mid f_1(x)\le u, f_0(x)\le t \text{ for some } x\in D \}$

strong duality

holds if there is a non-vertical suporting hyperplane to $\mathcal{A}$ at $(0,p^*)$
for convex problem, $\mathcal{A}$ is convex, hence has supp. hyperplane at $(0,p^*)$
Salter’s condition: if there exist $(\tilde{u},\tilde{t})\in\mathcal{A}$ with $\tilde{u}<0$, then supporting hyperplanes at $(0,p^*)$ must be non-vertical

Sensitivity interpretation#

(unperturbed) optimization problem and its dual

minimize $f_0(x)$
s.t. $f_i(x)\le 0$, $h_i(x)=0$
dual: $\max g(\lambda,\nu)$ s.t. $\lambda \succeq 0$

perturbed problem and its dual

$\min f_0(x)$
s.t. $f_i(x)\le u_i$, $h_i(x) = v_i$
dual: $\max g(\lambda, \nu) - u^T\lambda - v^T \nu$ s.t. $\lambda \succeq 0$
$p^*(u,v)$ is optimal value as a function of $u$, $v$
local sensitivity: if strong duality holds and $p^*(u,v)$ is differentiable at $(0,0)$ $\lambda^*_i = - \frac{\partial p^*(0,0)}{\partial u_i}$, $\nu^*_i = - \frac{\partial p^*(0,0)}{\partial v_i}$

Saddle-point interpretation#

Assume no equality constraints (results can be easily extended) $p^*=\inf_x \sup_{\lambda\succeq 0} L(x,\lambda)$, $d^*=\sup_{\lambda\succeq 0} \inf_x L(x,\lambda)$

weak duality: $\sup_{\lambda\succeq 0} \inf_x L(x,\lambda) \le \inf_x \sup_{\lambda\succeq 0} L(x,\lambda)$
strong duality: $\sup_{\lambda\succeq 0} \inf_x L(x,\lambda) = \inf_x \sup_{\lambda\succeq 0} L(x,\lambda)$
Max-min inequality generally holds: $\sup_{z\in Z} \inf_{w\in W} f(w,z) \le \inf_{w\in W} \sup_{z\in Z} f(w,z) \forall f,W,Z$
Strong max-min property (saddle point)

saddle-pint for $f$: a pair $\tilde{w} \in W$, $\tilde{z}\in Z$ that satisfy $f(\tilde{w},z) \le f(\tilde{w},\tilde{z})\le f(w, \tilde{z}), \forall w\in W, z\in Z$ $\Longleftrightarrow f(\tilde{w},\tilde{z}) = \inf_{w\in W} f(w,\tilde{z}), f(\tilde{w},\tilde{z}) = \sup_{z\in Z} f(\tilde{w}, z)$ $\Longleftrightarrow strong max-min property holds$

$x^*, \lambda^*$ are primal and dual optimal points and strong duality holds iff $(x^*,\lambda^*)$ form a saddle-point for the Lagrangian

Game interpretation#

player x choose strategy from $1,2,\dots, n$
player y choose strategy from $1,2,\dots, n$
$P_{ij}$ is the amount x pays to y (payoff) when $x$ plays strategy i, y plays startegy $j$

mixed strategy:

$u_i$: prob (player x chooses strategy $i$)
$y_i$: prob (player y chooses strategy $i$)

expected payoff: $u^T P v$

Suppose $x$ fixes strategy $u$, then $y$ plays (decides $v$) to maximize expected payoff:

\[\max u^T P v s.t. \sum_{i=1}^m v_i= 1, v \succeeq 0\]

so x must choose u to minimize $\max_i (P^T u)_i$: $\min_u \max_i (P^T u)_i$ s.t. $\sum_i^n u_i = 1$ and $u\succeq 0$ equivalent to? $\min t$ s.t. $P^T u \preceq t 1$, $u^T 1 = 1$ and $u\preceq 0$

LP(1) with optimal value: $p_1^*$

Suppose $y$ plays first (v given), then x choose u to minimize expected payoff: $\min u^T P v$, s.t. $\sum_i^n u_i =1$ and $u \preceq 0$

LP(2) with optimal value: $p_2^*$

LP(1) and LP(2) are duals of each other and thus have the same optimal values: $p_1^* = p_2^*$

Therefore, there is no advantage to play second …

Consider payoff function $f(u,v) = u^T P v$, the optimum $u^*$ for LP(1) and the optimum $v^*$ for LP(2) form a saddle-point for $f(u,v)$ $f(u^*,v) \le f(u^*,v^*) \le f(u,v^*)$ equivalent to $f(u^*,v^*)=\inf_u f(u,v^*)$ and $f(u^*,v^*) = \sup_v f(u^*,v)$

Nash equilibrium of the game: $(u^*,v^*)$ such that

$u^*$ is the best response of player x with respect to $v^*$
$v^*$ is the best response of player y with respect to $u^*$

Usage example of duality#

Duality gives a way to analytically solve an optimization problem

Example 29

$\min ||x||_2^2$ s.t. $Ax = b$

Lagrangian: $L(x,\nu)=x^T x + \nu^T (Ax - b)$ Lagrange dual function: $g(v)=\min x^T x + \nu^T A x - \nu^T b$

\[\frac{\partial (x^T x + \nu^T A x)}{\partial x} = 2 x^T + \nu^T A = 0 \implies x = -\frac{A^T \nu}{x}\]

then $g(\nu)=-\frac{1}{4} \nu^T A A^T \nu - \nu^T b$

Dual problem: $\max_\nu - \frac{1}{4} \nu^T A A^T \nu - \nu^T b$ $\mu = -2 (AA^T)^{-1} b$ and $x=A^T ...$

derivative is the transpose of gradient

Duality is useful to find non-trivial lower bounds to non-convex minimization problems

Example 30

$\min x^T W x, W \succeq$ s.t. $x_i \in \{ -1, 1 \}$.

Relaxation: $\min x^T W x$ s.t. $-1 \le x_i \le 1$

use duality t oobtain better lower bound (as on page15 of )

Non-convex problem with strong duality#

strong duality not implies convex problem

Example 31 (Non-convex problem with strong duality)

$\min x^T A x$, $A \in S^n$, s.t. $x^T x = 1$.

solution: $\min_{||x||_2 = 1} x^T A x = \text{min eigval of } A $

let $x=\sum_i^n \alpha_i v_i$ (linear combination of orthogonal eigenvectors of $A$) $v_i$ is an eigenvector corresponding to eigenvalue $\lambda_i$: $A v_i = \lambda_i v_i, ||v_i||_2 = 1$. we have $x^T A x = (\sum_i \alpha_i v_i)^T A (\sum_i \alpha_i v_i) = \sum_i \alpha_i^2 \lambda_i ||v_i||_2^2 = \sum_i^n \alpha_i^2 \lambda_i $ since $x^T x = 1$, we know $\sum_i \alpha_i^2 = 1$.

Therefore, $x^T A x$ is minimized when $\alpha_j =1$ for $\lambda_j = \min(\lambda_1,\dots,\lambda_j)= ...$

Derive the dual problem

Lagrangian: $L(x,\nu)=x^T A x + \nu (x^T x - 1)$
Lagrangian dual function: $g(\nu) = \min_x L(x,\nu)= \min_x x^T (A+\nu I) x - \nu = -\nu, \text{if } A+\nu I \succeq 0; -\infty, \text{ otherwise } $
dual problem: $\max -\nu$ s.t. $A + \nu I \succeq 0$

Therefore, the largest $-\nu$ to make $A + \nu I \succeq 0$ is $\min\{\lambda_1, \dots, \lambda_n\}$

Complementary slackness for LP#

Primal: $\min c^T x$ s.t. $Ax \succeq b$, $x\succeq 0$

Dual: $\max b^T y$ s.t. $A^T y \preceq c$, $y \preceq 0$

Theorem 9 (Complementary slackness theorem for LP)

A pair of feasible solutions $x\in R^n$ and $y\in R^m$ for primal and dual LP problems is optimal iff $y_i (b_i - (Ax)_i)=0, \forall i = 1, \dots, m$ $x_i (c_j - (A^T y)_j)=0, \forall j = 1, \dots, m$

Proof of sufficiency: Given $x, y$ satisfy complementary slackness

Karush_Kuhn-Tucker (KKT) conditions#

the following four conditions are called KKT conditions (for a problem with differentiable $f_i$, $h_i$): $\min f_0(x)$ s.t. $f_i(0)$

primal constraints:
dual constraints:
complementary slackness: $\lambda_i f_i(x)=0$, $i=1,\dots,m$
gradient of Lagrangian with respect to $x$ vanishes:

\[\nabla f_0(x) + \sum_i^m \lambda_i \nabla f_i(x) + \sum_i^p \nu_i \nabla h_i(x) = 0\]

For any optimization problem with differentiable objective and constraint functions, if strong duality holds, any pair of primal and dual optimal points $x^*, \lambda^*, \nu^*$ must satisfy the KKT conditions $g(\lambda^*, \nu^*) = \inf_x L(x,\lambda^*,\nu^*) = \inf_x f_0(x) + \sum_i^m \lambda_i^* f_i(x) + \sum_i^p \nu_i^* h_i(x) = f_0(x^*)$

KKT conditions for convex programs#

For a convex program, any points $\tilde{x}$, $\tilde{\lambda}$, $\tilde{\nu}$ that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap $\tilde{\lambda_i} \ge 0$, $\forall i \to L(x,\tilde{\lambda},\tilde{\nu})$ is convex in $x$.

gradient of Lagrangian with respect to $\tilde{x}$ vanishes $\implies$ $\tilde{x}$ minimizes $L(x,\tilde{\lambda}, \tilde{\nu})$, then $g(\tilde{\nu},\tilde{\nu})=\inf_x L(x,\tilde{\lambda},\tilde{\nu}=...$ since $g(\tilde{\lambda},$

If a convex program with differentiable objective and constraint functions satisfies Salter’s conditions, then KKT condition is necessary and sufficient for optimality

recall that Slater implies strong duality

Significance of KKT

solve KKT to derive

Example 32

$\min x_1^2 + 2x_2^2$ s.t. $x_1 + x_2 \ge 3$

To solve this, try to find $(x^*,\lambda^*)$ that satisfy the KKT conditions

Example 33 (Water-filling)

$\min \min - \sum_i^k \log (1 + p_i/N_i)$ s.t. $\sum_i^k p_i \le P$, $p_i\ge 0, \forall i = 1,\dots, k$

the algorithm takes the intuition from KKT

Generalization: maxmimally allocate resource to pi with the current smallest marginal utility, until all resource P is used up.

equivalent formulation of a problem can lead to very different duals
reformulating
dual
dual

Example 34

Introducing new variables and equality constraints

minimize $f_0(Ax + b)$

dual function is constant: $g=\inf_x L(x) = \inf_x f_0(Ax_b) = p^*$
we have strong duality, but dual is quire useless

reformulated problem and its dual

$\min f_0(y)$ s.t. $Ax + b - y = 0$
$b^T \nu - f_0^*(\nu)$ s.t. $\ge$

Example 35

Making explicit constraint implicit

LP with box constraints: primal and dual problem:

$\min c^T x$ s.t. $Ax=b, -1\preceq x \preceq 1$
$\max -b^T \nu - 1^T \lambda_1 - 1^T \lambda_2$ s.t. $c+A^T \nu + \lambda_1 - \lambda_2 =0, \lambda_1 \succeq 0$, $\lambda_2 \succeq 0$

reformulation with box constraints made implicit:

$\minimize f_0(x) = c^Tx; \infty$ s.t.

dual function $g(\nu) = \inf_{-1\preceq x \preceq 1} (c^T x + \nu^T(Ax - b))=-b^T\nu - \norm{A^T \nu + c}_1$

Partial Lagrangian relaxation#

One can take Lagrangian with respect to only a subset of the constraints, e.g.,

If convex problem and Slater’s conditino is satisfied: $\max_{\lambda \succeq 0} \tilde{g}(\lambda) = \min_{f_i(x)\le 0} f_0(x)$ let $x^*$ be the primal optimal point, $(\lambda^*_1, \lambda^*_2)$ be dual optimal point for $g(\lambda_1,\lambda_2)$ and $\lambda_1'$ be dual optimal point for $\tilde{g}(\lambda)$

\[f_0(x^*) \ge \tilde{g}(\lambda_1') \ge \tilde{g} (\lambda_1^*) = \min f_0 (x) + \lambda_1^* f_1 (x) \le \min f_0(x) + \lambda_1^* f_1(x) + \lambda_2^* f_2(x) = g(\lambda_1^*, \lambda_2^*) = f_0(x^*)\]

The aobve property extensible to general case, when the partial Lagrangian is derived by relaxing

Example 36

Transform objective function

$\min ||Ax-b||$

reformulation $\min 1/2 ||y||^2$ s.t. $y=Ax-b$ dual problem on page 257, textbook

Generalized inequality#

dual problem

$\max g(\lambda_1, \dots. \lambda_m, \nu)$ s.t. $\lambda_i \succeq_{K_i^*} 0, i=1,\dots,m$

Properties:

lower bound property: if $\lambda_i \succeq_{K_i^*} 0$, then $g$
weak duality …
strong duality …
KKT

Example 37 (Semidefinite program)

primal SDP $(F_i,G\in S^k)$: $\min c^T x$ s.t. $x_1 F_1 + \cdots + x_n F_n \preceq G$

Lagrange multiplier is matrix $Z\in S^k$, define $<A,B>=tr(AB)$
Lagrangian $L(x,Z) = c^T x + Tr(Z(x_1 F_1 + \cdots + x_n F_n -G) )$
dual function $g(Z)=\inf_x L(x,Z)=-tr(GZ), tr(F_i Z) + c_i = 0, \; i=1, \dots ,n; -\infty, otherwise$

dual problem:

$\max -tr(GZ)$ s.t $Z\succeq 0$, $tr(F_i Z) + c_i = 0$

SDP and Duality

Contents

SDP and Duality#

Duality#

Lagrangian#

LP and its dual#

standard form LP#

Lagrangian#

Lagrange dual function#

Dual problem#

Lagrange dual and conjugate function#

Primal and Lagrange dual problems#

Strong duality#

Slater’s constraint qualification#

Geometric interpretation#

Sensitivity interpretation#

Saddle-point interpretation#

Game interpretation#

Usage example of duality#

Non-convex problem with strong duality#

Complementary slackness for LP#

Karush_Kuhn-Tucker (KKT) conditions#

KKT conditions for convex programs#

Partial Lagrangian relaxation#

Generalized inequality#