Duality Theorems in Optimization

Duality theorem is the duality concept in optimization first conceived by John von Neumann and proved by George Dantzig.

Dual descriptions:

Sets: union of points - intersection of halfspaces; (convex set)
Functions: function - conjugate; (convex function)
Optimization: min-common - max-crossing;

Concepts

The min-common / max-crossing duality is a dual pair of fundamental optimization problems. Given a nonempty set $M \in \mathbb{R}^{n+1}$, the min common point problem is to find the point in $M$ on the (n+1)-st axis that has the minimum (n+1)-st component $w^∗$. The max cross point problem is to find the non-vertical hyperplane containing $M$ in its upper ((n+1)-st component) closed half-space that has the maximum crossing point of the (n+1)-st axis $q^∗$.

When the set is the epigraph of the primal function $M = \text{epi}(p)$, then $w^∗ = p(0)$ and $q^∗ = p^{\star\star}(0)$, so $q^∗ \le w^∗$ with equality when $p$ is closed and convex.

Three general theorems (MC/MC Theorems): 1. strong duality; 1. existence of dual optimal solutions; 1. polyhedral refinements;

linear programming duality theorem; optimality conditions.

convex programming duality theorem; optimality conditions.

Fenchel duality theorem

conic duality

constrained optimality condition

Subdifferentiability

For a convex function, the "slope" of every supporting hyperplane is a subgradient (subderivative, 次梯度/次导数) of it at the supporting point. The subdifferential (次微分) of a convex function at a point is the set of all its subgradients at that point.

For convex functions: [@Bertsekas2003, 4.1.1]

Finite directional derivatives exist;
Left derivatives are no greater than right derivatives, along any direction;
Right derivatives are no greater than left derivatives at a point on the right;
Left derivatives and right derivatives are non-decreasing;
Right derivative is right continuous, left derivative is left continuous;

Convex functions are subdifferentiable.

The direction of steepest ascent of a concave function is the subgradient with the smallest Euclidean norm.

Conjugate subgradient theorem

Perturbed Problems

Given an optimization problem, we don't obtain a dual problem until we specify how to perturb the optimization problem.

Perturbed problem: $$\begin{align} \min \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le u_i && i = 1,\dots,m \\ \end{align}$$ Denote the perturbed objective function with implicit constraints as $p(x;u,v)$ and perturbed optimal value as $p^∗ (u,v)$, then the primal (objective function) is $p(x;0,0)$ and the primal optimal is $p^∗ (0,0)$.

Define $p^∗ (u,v)$ as the optimal value of the perturbed problem:

When the original problem is convex, the function $p^∗ (u,v)$ is convex.
If strong duality holds for the original problem, then $(\lambda^∗ ,\nu^∗ )$ is a subgradient of the perturbed optimal values at the primal optimal. $p^∗ (u, v) \ge p^∗ (0, 0) - \langle (\lambda^∗ , \nu^∗ ), (u, v) \rangle$

Sensitivity interpretations of the dual optimal (given strong duality):

If $\lambda^*$ is large, tightening the inequality constraints ($u < 0$) is guaranteed to increase the optimal value $p^∗ (u,v)$ greatly.
If $\lambda^*$ is small, loosening the inequality constraints ($u > 0$) will not decrease the optimal value $p^* (u,v)$ greatly.
If change in equality constraints gives $\langle \nu^∗ , v \rangle > 0$ the optimal value $p^∗ (u,v)$ is guaranteed to increase, where the absolute value of $\nu^∗$ reflects the sensitivity of increment.
If the optimal value $p^∗ (u, v)$ is differentiable at the origin, its local sensitivity (gradient) is the negative dual optimal $-(\lambda^∗ ,\nu^∗ )$.

Dual Problem

Denote convex conjugate as $\star$, then the biconjugate of perturbed optimal values is $(p^∗ )^{\star\star}(u,v)$.

Since biconjugate is the closed convex hull, $(p^∗ )^{\star\star}(0,0) \le p^∗ (0,0)$.
By Fenchel–Moreau theorem, if the primal $p(x;0,0)$ is convex and closed, $(p^∗ )^{\star\star}(0,0) = p^∗ (0,0)$.
Since convex conjugates are convex functions, $(p^∗ )^{\star\star}(0,0) = \max_{u^\star,v^\star} -(p^∗ )^\star (u^\star,v^\star)$ is a convex program.

In summary, $(p^∗ )^{\star\star}(0,0)$ is a convex program whose optimal value gives a lower bound of the primal optimal; if the primal is itself convex, the new convex program typically gives the same optimal.

Due to its good properties, we call $(p^∗ )^{\star\star}(0,0)$ the dual problem of the primal $p^∗ (0,0)$. Equivalent formulations of an optimization problem can have different perturbations, leading to different dual problems.

Lagrange Dual Problem

This is the Lagrangian narrative of the dual problem.

The Lagrangian $L : D \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is the objective function added by a weighted sum of the constraint functions, whose weights are called the Lagrange multiplier associated with that (in)equality constraint, aka the dual variables associated with the problem. $L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$

The Lagrange dual function $g : \mathbb{R}^m \times \mathbb{R}^p \to R$ is the minimum value of the Lagrangian over the optimization variables, at specific values of the dual variables: $g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu)$ Lagrange multipliers are in the dual space of optimization variable space.

Lagrange dual problem of a primal problem: $$\begin{align} \max \quad & g(\lambda, \nu) \\ \text{s.t.} \quad & \lambda \succeq 0 \end{align}$$

The feasible set of the dual problem: $\text{dom}(g) ∩ (\mathbb{R}_+^m \times \mathbb{R}^p)$.

Dual optimal (optimal Lagrange multipliers) is the optimal point $(\lambda^∗ , \nu^∗ )$ of the Lagrange dual problem. Optimal value of the Lagrange dual problem $d^∗ = \sup\{g(\lambda, \nu)\}$, where $d$ is for dual.

Weak duality: The optimal value of the dual (the best lower bound) is no greater than the optimal value of the primal: $d^∗ \le p^∗$. Strong duality: The optimal value of the dual (the best lower bound) equals the optimal value of the primal: $d^∗ = p^∗$. The duality gap associated with a primal feasible point $x$ and a dual feasible point $(\lambda,\nu)$ is $f_0(x) − g(\lambda,\nu)$. Sandwiching: $g(\lambda,\nu) \le d^∗ \le p^∗ \le f_0(x)$. The optimal duality gap is the gap between the optimal value of the primal problem and the greatest lower bound on it that can be obtained from the Lagrange dual function: $p^∗ - d^∗$.

Properties of the dual function:

The dual function is (always) concave, because it is the point-wise infimum of a family of affine functions (which are concave).
The dual function is no greater than the optimal value: $\forall \lambda \succeq 0, \nu: g(\lambda, \nu) \le p^∗$.
For all $x \in g^{-1}(\lambda, \nu)$, $(f_i, h_i)(x)$ is a subgradient of the dual function at $(\lambda, \nu)$.
The dual function $g(\lambda, \nu)$ is differentiable at $(\lambda, \nu)$ if primal domain $D$ is compact, $f_i$ are continuous, and $g^{-1}(\lambda, \nu)$ is a singleton $\{x\}$; additionally $\nabla g(\lambda, \nu) = (f_i, h_i)(x)$.
The direction of steepest ascent of a dual function is the subgradient with the smallest Euclidean norm.

Properties of the dual problem:

The Lagrange dual problem is (always) a convex optimization problem, since the objective to be maximized is concave and the constraint is convex.
The Lagrange dual problem can be used to find a lower bound on the optimal value of a problem that is difficult to solve, since the dual problem is always convex, which in many cases can be solved efficiently.
If the primal problem (5.1) is convex, strong duality usually (but not always) holds.

Techniques

The Lagrange dual of the standard form LP is an inequality form LP.

$$\begin{align} \max \quad & −\langle b,\nu\rangle \\ \text{s.t.} \quad & \langle A,\nu\rangle + c \succeq 0 \end{align}$$

The Lagrange dual of the inequality form LP is a standard form LP.

$$\begin{align} \max \quad & −\langle b,\lambda \rangle \\ \text{s.t.} \quad & \langle A,\lambda \rangle + c = 0 \\ & \lambda \succeq 0 \end{align}$$

An optimality criterion for convex optimization problems with differentiable objective functions

A point on the boundary of the feasible set is optimal, if the negative gradient of the objective function at this point is an outer-pointing normal of the feasible set. $\langle \nabla f_0(x), y−x \rangle \ge 0, \forall y \in X$

A convex optimization problem with differentiable objective function is trivial if its optimal set does not touch the boundary of its feasible set, in which case you may simply solve the gradient equation and ignore the constraints.

This criterion is equivalent to KKT condition in simple cases.

For problems with only equality constraints $Ax=b$, the optimality criterion reduces to $\nabla f_0(x) \perp \mathcal{N}(A)$. Since the orthogonal complement of the null space is the row space $\mathcal{N}(A)^\perp = \mathcal{R}(A^\text{T} )$, the optimization problem becomes a Lagrange multiplier problem: $\nabla f_0(x) + A^\text{T} \nu = 0, Ax=b$, where $v \in \mathbb{R}^p$

For problems with feasible set being the nonnegative orthant $\mathbb{R}_+^n$, the optimality condition can be expressed as: $x \ge 0, \nabla f_0(x) \ge 0, x \circ \nabla f_0(x)=0$, where $\circ$ is the Hadamard product.

Constraint qualifications (CQ)

Constraint qualifications are conditions under which strong duality holds. A constraint qualification guarantees the KKT conditions to hold at a local minimizer.

Slater’s condition: There exists a strictly feasible point: $\exists x \in \text{ri}(D): f_i(x) < 0, i = 1,\dots,m, Ax = b$. Affine inequalities do not need to hold with strict inequality. In a sense, this is the strongest constraint qualification.

Slater’s theorem: If a convex optimization problem meets Slater’s condition, it has strong duality. When the constraints are all linear equalities and inequalities, and $\text{dom} f_0$ is open, Slater's condition holds if the problem is feasible.

The tangent cone for an optimization problem at a feasible point is the cone generated by all the feasible directions from that point: $T(x) = \{d~|~\exists {x^k} \subseteq \Omega, {t_k} \to 0+ : x^k \to x, \frac{x^k−x}{t_k} \to d \}$.

The linearizing cone for an optimization problem at a feasible point is the cone generated by all the linearized feasible directions from that point. Linearized feasible directions are the feasible directions from a point when all constraints are linearized: $L(x) = \{d~|~\langle \nabla f_i(x) 1_{\{f_i(x) = 0\}}, d\rangle \le 0, \langle \nabla h_j(x), d\rangle = 0 \}$.

Tangent cones are closed cones; linearizing cone are polyhedral cones. Tangent cone is a subset of linearizing cone: $T(x) \subseteq L(x)$.

Guignard constraint qualification (GCQ): The tangent cone and linearizing cone at a feasible point have the same dual cones: $T^∗ (x) = L^∗ (x)$. (Proved with Farka's lemma) In a sense, this is the weakest constraint qualification.

Complementary slackness (CS)

When strong duality holds, for any primal optimal point $x^*$ and any dual optimal point $(\lambda^∗ ,\nu^∗ )$, complementary slackness holds: $\lambda_i^∗ f_i( x^∗ ) = 0$, $i = 1, \dots, m$.

Karush-Kuhn-Tucker optimality conditions (KKT)

Saddle point condition: For optimization problems with strong duality and differentiable (primal) objectives and constraints, the gradient of the Lagrangian at dual optimal points vanishes at primal optimal points: $\nabla_x L( x^∗ , \lambda^∗ , \nu^∗ ) = 0$.

KKT conditions generalize the method of Lagrange multipliers to optimization problems with inequality constraints: saddle point + primal constraints + dual constraints + complementary slackness. $$\begin{aligned} \nabla f_0( x^∗ ) & + \sum_i \lambda_i^∗ \nabla f_i( x^∗ ) + \sum_i \nu_i^∗ \nabla h_i( x^∗ ) = 0 && \text{(saddle point)} \\ f_i( x^∗ ) &\le 0, \quad i = 1,\dots,m && \text{(primal constraints)} \\ \lambda_i^∗ &\ge 0, \quad i = 1,\dots,m && \text{(dual constraints)} \\ \lambda_i^∗ f_i( x^∗ ) &= 0, \quad i = 1,\dots,m && \text{(complementary slackness)} \\ h_i( x^∗ ) &= 0, \quad i = 1,\dots,p && \text{(primal constraints)} \end{aligned}$$

Primal and dual optimal points are not checked for domain conditions, which have been guaranteed to hold.

KKT theorem [@Karush1939; @Kuhn+Tucker1951]: For optimization problems with strong duality and differentiable (primal) objectives and constraints,

(Necessity) The KKT conditions must hold for any pair of primal and dual optimal points.
(Sufficiency) If the problem is convex, any points that satisfy the KKT conditions are primal and dual optimal.

Saddle point theorem: Point $( x^∗ , \lambda^∗ , \nu^∗ )$ is a pair of primal and dual optimal points with strong duality iff it satisfies saddle point criteria: primal constraints, dual constraints, complementary slackness, and $L( x^∗ , \lambda^∗ , \nu^∗ ) = \min_x L(x, \lambda^∗ , \nu^∗ )$.

If the primal is locally convex at a point, KKT conditions implies saddle point. Interior saddle point implies KKT conditions.

Theorems of alternatives

Two systems of inequalities (and equalities) are weak alternatives if at most one of the two is feasible.

Let $g(\lambda, \nu) = \inf_{x \in D} \left\{ \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right\}$, the following two systems of inequalities and equalities are weak alternatives:

System with non-strict inequality: $$\begin{align} f_i(x) \le 0 && i = 1,\dots,m \\ h_i(x) = 0 && i = 1,\dots,p \end{align}$$
(convex system): $$\begin{align} g(\lambda, \nu) > 0 \\ \lambda \succeq 0 \end{align}$$

The following two systems of inequalities and equalities are weak alternatives:

System with strict inequality: $$\begin{align} f_i(x) < 0 && i = 1,\dots,m \\ h_i(x) = 0 && i = 1,\dots,p \end{align}$$
(convex system): $$\begin{align} g(\lambda, \nu) \ge 0 \\ \lambda \succeq 0 \\ \lambda \ne 0 \end{align}$$

Two systems of inequalities (and equalities) are strong alternatives if exactly one of the two is feasible.

Given $f_i$ is convex and $\exists x \in \text{ri}(D): Ax=b$, let $g(\lambda, \nu) = \inf_{x \in D} \left\{ \sum_i \lambda_i f_i(x) + \langle \nu,Ax-b \rangle \right\}$, the following two systems of inequalities and equalities are strong alternatives:

Convex system with strict inequality: $$\begin{align} f_i(x) < 0 && i = 1,\dots,m \\ Ax = b && A \in \mathbb{R}^{p\times n} \end{align}$$
(convex system): $$\begin{align} g(\lambda, \nu) \ge 0 \\ \lambda \succeq 0 \\ \lambda \ne 0 \end{align}$$

If additionally, optimal value $p^∗$ is attained for the related optimization problem $$\begin{align} \min \quad & s \\ \text{s.t.} \quad & f_i(x) - s \le 0 && i = 1,\dots,m \\ & A x = b && A \in \mathbb{R}^{p\times n} \end{align}$$ The following two systems of inequalities and equalities are strong alternatives:

Convex system with non-strict inequality: $$\begin{align} f_i(x) \le 0 && i = 1,\dots,m \\ Ax = b && A \in \mathbb{R}^{p\times n} \end{align}$$
(convex system): $$\begin{align} g(\lambda, \nu) > 0 \\ \lambda \succeq 0 \end{align}$$

Cutting plane / outer-linearization method

Lagrange dual problem is equivalent to an infinite LP: $$\begin{align} \max \quad & z \\ \text{s.t.} \quad & L(x, \lambda, \nu) \ge z && x \in D \\ & \lambda \succeq 0 \end{align}$$ And could be approximated with LP: $$\begin{align} \max \quad & z \\ \text{s.t.} \quad & L(x_i, \lambda, \nu) \ge z && x_i \in D, i = 1,\dots,k \\ & \lambda \succeq 0 \end{align}$$

Denote the optimal value and point of the above LP as $(z_k, \lambda_k, \nu_k)$. If $\inf_{x \in D} L(x, \lambda_k, \nu_k) < z_k$ with optimal point $x_{k+1}$, then add $L(x_{k+1}, \lambda, \nu) \ge z$ to the constraint set and repeat. Note that ${z_k}$ is a strictly decreasing sequence, and you may stop when satisfied.

🏷 Category=Optimization