The duality theorem is the central duality concept in optimization, first conceived by John von Neumann and first proved by George Dantzig.
Dual descriptions:
The min common / max crossing (MC/MC) duality is a pair of fundamental dual optimization problems. Given a nonempty set $M \subseteq \mathbb{R}^{n+1}$, the min common point problem is to find the point in $M$ on the $(n+1)$-st axis that has the minimum $(n+1)$-st component $w^∗$. The max crossing point problem is to find, among all non-vertical hyperplanes containing $M$ in their upper (in the $(n+1)$-st component) closed half-space, the one with the maximum crossing point $q^∗$ of the $(n+1)$-st axis.
When the set is the epigraph of the primal function $M = \text{epi}(p)$, then $w^∗ = p(0)$ and $q^∗ = p^{\star\star}(0)$, so $q^∗ \le w^∗$ with equality when $p$ is closed and convex.
Three general theorems (MC/MC theorems): (1) strong duality; (2) existence of dual optimal solutions; (3) polyhedral refinements.
linear programming duality theorem; optimality conditions.
convex programming duality theorem; optimality conditions.
Fenchel duality theorem
conic duality
constrained optimality condition
For a convex function, the "slope" of every supporting hyperplane is a subgradient (subderivative) of it at the supporting point. The subdifferential of a convex function at a point is the set of all its subgradients at that point.
For convex functions: [@Bertsekas2003, 4.1.1]
Convex functions are subdifferentiable on the relative interior of their domains.
The direction of steepest ascent of a concave function is the subgradient with the smallest Euclidean norm.
Conjugate subgradient theorem
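A tiny numeric sketch of the minimum-norm idea (a toy example, not from the source): for the convex function $f(x) = \max(x_1, x_2)$, the subdifferential at the origin is the segment between the gradients of the two active pieces, $(1,0)$ and $(0,1)$, and the negative of its minimum-norm element is the steepest descent direction (for a concave function the signs flip and one speaks of ascent).

```python
import numpy as np

# f(x) = max(x1, x2) is convex; at the origin both pieces are active, so the
# subdifferential is the segment conv{(1,0), (0,1)} (gradients of active pieces).
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Minimum-norm element of the segment theta*g1 + (1-theta)*g2, theta in [0,1]
thetas = np.linspace(0, 1, 1001)
subgrads = np.outer(thetas, g1) + np.outer(1 - thetas, g2)
g_min = subgrads[np.argmin(np.linalg.norm(subgrads, axis=1))]

# g_min ~ (0.5, 0.5); moving along -g_min decreases f, so -g_min is a descent direction
f = lambda x: max(x[0], x[1])
t = 0.1
assert f(-t * g_min) < f(np.zeros(2))
```

Any other subgradient in the segment, e.g. $(1,0)$ itself, has larger norm; the minimum-norm element is the one whose negation gives steepest descent.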
Given an optimization problem, we don't obtain a dual problem until we specify how to perturb the optimization problem.
Perturbed problem: $$\begin{align} \min \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le u_i && i = 1,\dots,m \\ & h_i(x) = v_i && i = 1,\dots,p \\ \end{align}$$ Denote the perturbed objective function with implicit constraints as $p(x;u,v)$ and the perturbed optimal value as $p^∗ (u,v)$; then the primal (objective function) is $p(x;0,0)$ and the primal optimal value is $p^∗ (0,0)$.
Define $p^∗ (u,v)$ as the optimal value of the perturbed problem. Sensitivity interpretations of the dual optimal (given strong duality): if $p^∗$ is differentiable at $(0,0)$, then $\lambda_i^∗ = -\frac{\partial p^∗(0,0)}{\partial u_i}$ and $\nu_i^∗ = -\frac{\partial p^∗(0,0)}{\partial v_i}$; that is, the optimal multipliers measure how sensitive the optimal value is to tightening or loosening each constraint.
Denote convex conjugate as $\star$, then the biconjugate of perturbed optimal values is $(p^∗ )^{\star\star}(u,v)$.
In summary, $(p^∗ )^{\star\star}(0,0)$ is the optimal value of a convex program, and it is a lower bound on the primal optimal value; if the primal is itself convex, this convex program typically attains the same optimal value.
Due to its good properties, we call $(p^∗ )^{\star\star}(0,0)$ the dual problem of the primal $p^∗ (0,0)$. Equivalent formulations of an optimization problem can have different perturbations, leading to different dual problems.
This is the Lagrangian narrative of the dual problem.
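The two narratives agree; here is a quick sanity-check derivation for the inequality-only perturbation, with $g$ the Lagrange dual function defined below (a sketch, not from the source). For $\lambda \succeq 0$,
$$(p^∗)^\star(-\lambda) = \sup_u \left\{ -\langle \lambda, u \rangle - p^∗(u) \right\} = \sup_{x \in D} \left\{ -\langle \lambda, f(x) \rangle - f_0(x) \right\} = -g(\lambda),$$
since for fixed $x$ the supremum of $-\langle \lambda, u \rangle$ over $u \succeq f(x)$ is attained at $u = f(x)$; if some $\lambda_i < 0$, the supremum is $+\infty$. Hence
$$(p^∗)^{\star\star}(0) = \sup_y \left\{ -(p^∗)^\star(y) \right\} = \sup_{\lambda \succeq 0} g(\lambda) = d^∗,$$
so the biconjugate of the perturbed optimal value at the origin is exactly the Lagrange dual optimal value.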
The Lagrangian $L : D \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is the objective function plus a weighted sum of the constraint functions; the weights are called the Lagrange multipliers associated with the (in)equality constraints, aka the dual variables associated with the problem. $L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$
The Lagrange dual function $g : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R} \cup \{-\infty\}$ is the minimum value of the Lagrangian over the optimization variables, at specific values of the dual variables: $g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu)$. Lagrange multipliers live in the dual space of the optimization variable space.
Lagrange dual problem of a primal problem: $$\begin{align} \max \quad & g(\lambda, \nu) \\ \text{s.t.} \quad & \lambda \succeq 0 \end{align}$$
The feasible set of the dual problem: $\text{dom}(g) ∩ (\mathbb{R}_+^m \times \mathbb{R}^p)$.
The dual optimal (the optimal Lagrange multipliers) is an optimal point $(\lambda^∗ , \nu^∗ )$ of the Lagrange dual problem. The optimal value of the Lagrange dual problem is $d^∗ = \sup\{g(\lambda, \nu) : \lambda \succeq 0\}$, where $d$ stands for dual.
Weak duality: The optimal value of the dual (the best lower bound) is no greater than the optimal value of the primal: $d^∗ \le p^∗$. Strong duality: The optimal value of the dual (the best lower bound) equals the optimal value of the primal: $d^∗ = p^∗$. The duality gap associated with a primal feasible point $x$ and a dual feasible point $(\lambda,\nu)$ is $f_0(x) − g(\lambda,\nu)$. Sandwiching: $g(\lambda,\nu) \le d^∗ \le p^∗ \le f_0(x)$. The optimal duality gap is the gap between the optimal value of the primal problem and the greatest lower bound on it that can be obtained from the Lagrange dual function: $p^∗ - d^∗$.
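The sandwich can be checked concretely on a toy problem (hand-picked, not from the source): minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 \ge 1$. Here $p^∗ = 1/2$ at $x = (1/2, 1/2)$, and minimizing the Lagrangian over $x$ gives $g(\lambda) = \lambda - \lambda^2/2$, maximized at $\lambda^∗ = 1$.

```python
import numpy as np

# Toy primal: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0;  p* = 1/2 at (1/2, 1/2).
# Lagrangian L(x, lam) = x1^2 + x2^2 + lam*(1 - x1 - x2); the inner inf is
# attained at x1 = x2 = lam/2, giving the dual function g(lam) = lam - lam^2/2.
def g(lam):
    x = lam / 2.0
    return 2 * x**2 + lam * (1 - 2 * x)

lams = np.linspace(0.0, 3.0, 3001)
d_star = max(g(l) for l in lams)      # dual optimal value, ~1/2 at lam = 1
p_star = 0.5                          # primal optimal value (by hand)

x_feas = np.array([0.7, 0.6])         # some primal feasible point
lam_feas = 0.4                        # some dual feasible point
# Sandwich: g(lam) <= d* <= p* <= f0(x)
assert g(lam_feas) <= d_star <= p_star + 1e-9 <= x_feas @ x_feas + 1e-9
```

Here strong duality holds ($d^∗ = p^∗ = 1/2$) since the primal is convex and Slater's condition is met.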
Properties of the dual function: $g$ is concave even when the primal is not convex, because it is a pointwise infimum of affine functions of $(\lambda, \nu)$; and for any $\lambda \succeq 0$ and any $\nu$, $g(\lambda, \nu) \le p^∗$ (lower bound property).
Properties of the dual problem: the Lagrange dual problem is always a convex optimization problem (maximizing a concave function over a convex set), regardless of whether the primal is convex.
The Lagrange dual of the standard form LP is an inequality form LP.
$$\begin{align} \max \quad & −\langle b,\nu\rangle \\ \text{s.t.} \quad & A^\text{T}\nu + c \succeq 0 \end{align}$$
The Lagrange dual of the inequality form LP is a standard form LP.
$$\begin{align} \max \quad & −\langle b,\lambda \rangle \\ \text{s.t.} \quad & A^\text{T}\lambda + c = 0 \\ & \lambda \succeq 0 \end{align}$$
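Both dual pairs can be sanity-checked numerically. A minimal sketch with hand-picked data and hand-computed optima (the problem instance is an assumption, not from the source): primal $\min \langle c, x \rangle$ s.t. $Ax = b$, $x \succeq 0$ with $c = (1, 2)$, $A = [1\ 1]$, $b = 1$.

```python
import numpy as np

# Standard-form LP with hand-picked data: min <c,x> s.t. Ax = b, x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x_star = np.array([1.0, 0.0])    # primal optimum, found by inspection
nu_star = np.array([-1.0])       # dual optimum of: max -<b,nu> s.t. A^T nu + c >= 0

assert np.allclose(A @ x_star, b) and np.all(x_star >= 0)   # primal feasible
assert np.all(A.T @ nu_star + c >= -1e-12)                  # dual feasible

p_star = c @ x_star              # primal optimal value = 1
d_star = -(b @ nu_star)          # dual optimal value = 1
assert np.isclose(p_star, d_star)

# Weak duality: every dual feasible nu yields a lower bound on p*
for t in np.linspace(-1.0, 5.0, 61):
    nu = np.array([t])
    if np.all(A.T @ nu + c >= 0):
        assert -(b @ nu) <= p_star + 1e-12
```

For LPs, strong duality holds whenever the primal is feasible and bounded, so the two hand-computed optimal values agree.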
A point on the boundary of the feasible set is optimal if the negative gradient of the objective function at this point is an outward-pointing normal of the feasible set: $\langle \nabla f_0(x), y−x \rangle \ge 0, \forall y \in X$.
A convex optimization problem with differentiable objective function is trivial if its optimal set does not touch the boundary of its feasible set, in which case you may simply solve the gradient equation and ignore the constraints.
This criterion is equivalent to KKT condition in simple cases.
For problems with only equality constraints $Ax=b$, the optimality criterion reduces to $\nabla f_0(x) \perp \mathcal{N}(A)$. Since the orthogonal complement of the null space is the row space, $\mathcal{N}(A)^\perp = \mathcal{R}(A^\text{T} )$, the optimization problem becomes a Lagrange multiplier problem: $\nabla f_0(x) + A^\text{T} \nu = 0, Ax=b$, where $\nu \in \mathbb{R}^p$.
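When $f_0$ is quadratic, this Lagrange multiplier system is linear and can be solved directly. A minimal sketch (the instance, minimizing $\|x\|^2$ subject to $Ax=b$ with random data, is an assumption for illustration):

```python
import numpy as np

# Minimize ||x||^2 subject to A x = b. The optimality system
#   2 x + A^T nu = 0,  A x = b
# is linear, so solve the stacked (KKT) system directly.
rng = np.random.default_rng(0)
n, p = 4, 2
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

K = np.block([[2 * np.eye(n), A.T],
              [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
x, nu = sol[:n], sol[n:]

assert np.allclose(A @ x, b)             # primal feasibility
assert np.allclose(2 * x + A.T @ nu, 0)  # gradient of the Lagrangian vanishes
```

The coefficient matrix is nonsingular as long as $A$ has full row rank, which holds almost surely for the random data used here.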
For problems with feasible set being the nonnegative orthant $\mathbb{R}_+^n$, the optimality condition can be expressed as: $x \ge 0, \nabla f_0(x) \ge 0, x \circ \nabla f_0(x)=0$, where $\circ$ is the Hadamard product.
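A quick check of these three conditions on a toy objective (an assumption for illustration): for $f_0(x) = \|x - a\|^2$, the minimizer over the orthant is the projection $x^∗ = \max(a, 0)$.

```python
import numpy as np

# Minimize ||x - a||^2 over the nonnegative orthant; the minimizer is the
# projection x* = max(a, 0). Vector a is hand-picked.
a = np.array([1.5, -2.0, 0.0, 3.0])
x = np.maximum(a, 0.0)
grad = 2 * (x - a)                  # gradient of f0 at x*

assert np.all(x >= 0)               # x >= 0
assert np.all(grad >= 0)            # grad f0(x) >= 0
assert np.allclose(x * grad, 0.0)   # complementarity: x o grad f0(x) = 0
```

Componentwise, either $x_i > 0$ and the gradient component vanishes, or $x_i = 0$ and the gradient component is nonnegative; the Hadamard product condition encodes exactly this dichotomy.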
Constraint qualifications are conditions on the constraints under which strong duality holds for convex problems. More generally, a constraint qualification guarantees that the KKT conditions hold at a local minimizer.
Slater’s condition: There exists a strictly feasible point: $\exists x \in \text{ri}(D): f_i(x) < 0, i = 1,\dots,m, Ax = b$. Affine inequalities do not need to hold with strict inequality. In a sense, this is the strongest constraint qualification.
Slater’s theorem: If a convex optimization problem meets Slater’s condition, it has strong duality. When the constraints are all linear equalities and inequalities, and $\text{dom} f_0$ is open, Slater's condition holds if the problem is feasible.
The tangent cone for an optimization problem at a feasible point is the cone generated by all the feasible directions from that point: $T(x) = \{d \mid \exists \{x^k\} \subseteq \Omega, \{t_k\} \to 0^+ : x^k \to x, \frac{x^k−x}{t_k} \to d \}$.
The linearizing cone for an optimization problem at a feasible point is the cone generated by all the linearized feasible directions from that point, i.e., the feasible directions when all constraints are linearized; only the active inequality constraints matter: $L(x) = \{d \mid \langle \nabla f_i(x), d\rangle \le 0 \text{ for all } i \text{ with } f_i(x) = 0,\ \langle \nabla h_j(x), d\rangle = 0 \text{ for all } j \}$.
Tangent cones are closed cones; linearizing cones are polyhedral cones. The tangent cone is a subset of the linearizing cone: $T(x) \subseteq L(x)$.
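A standard example where the inclusion is strict (a sketch, not from the source): take the single equality constraint $h(x) = x_1^2 = 0$ in $\mathbb{R}^2$. The feasible set is the $x_2$-axis, so $T(0) = \{(0, d_2) \mid d_2 \in \mathbb{R}\}$; but $\nabla h(0) = (0, 0)$, so every direction is linearized-feasible and $L(0) = \mathbb{R}^2$. The dual cones also differ, $T^∗(0)$ being the $x_1$-axis while $L^∗(0) = \{0\}$, so even the Guignard constraint qualification fails at the origin for this formulation.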
Guignard constraint qualification (GCQ): The tangent cone and the linearizing cone at a feasible point have the same dual cones: $T^∗ (x) = L^∗ (x)$. (Proved with Farkas' lemma.) In a sense, this is the weakest constraint qualification.
When strong duality holds, for any primal optimal point $x^*$ and any dual optimal point $(\lambda^∗ ,\nu^∗ )$, complementary slackness holds: $\lambda_i^∗ f_i( x^∗ ) = 0$, $i = 1, \dots, m$.
Saddle point condition: For optimization problems with strong duality and differentiable (primal) objectives and constraints, the gradient of the Lagrangian at dual optimal points vanishes at primal optimal points: $\nabla_x L( x^∗ , \lambda^∗ , \nu^∗ ) = 0$.
KKT conditions generalize the method of Lagrange multipliers to optimization problems with inequality constraints: saddle point + primal constraints + dual constraints + complementary slackness. $$\begin{aligned} \nabla f_0( x^∗ ) & + \sum_i \lambda_i^∗ \nabla f_i( x^∗ ) + \sum_i \nu_i^∗ \nabla h_i( x^∗ ) = 0 && \text{(saddle point)} \\ f_i( x^∗ ) &\le 0, \quad i = 1,\dots,m && \text{(primal constraints)} \\ \lambda_i^∗ &\ge 0, \quad i = 1,\dots,m && \text{(dual constraints)} \\ \lambda_i^∗ f_i( x^∗ ) &= 0, \quad i = 1,\dots,m && \text{(complementary slackness)} \\ h_i( x^∗ ) &= 0, \quad i = 1,\dots,p && \text{(primal constraints)} \end{aligned}$$
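A numeric spot-check of all four conditions on a hand-computed example (the instance is an assumption, not from the source): minimize $(x_1-2)^2 + (x_2-1)^2$ subject to $x_1 + x_2 \le 1$. Projecting $(2,1)$ onto the half-plane gives $x^∗ = (1, 0)$, and stationarity then forces $\lambda^∗ = 2$.

```python
import numpy as np

# Hand-computed solution of: min (x1-2)^2 + (x2-1)^2  s.t.  x1 + x2 - 1 <= 0.
x = np.array([1.0, 0.0])
lam = 2.0

grad_f0 = 2 * (x - np.array([2.0, 1.0]))   # gradient of the objective at x*
grad_f1 = np.array([1.0, 1.0])             # gradient of the constraint
f1 = x.sum() - 1.0                         # constraint value at x*

assert np.allclose(grad_f0 + lam * grad_f1, 0)  # saddle point (stationarity)
assert f1 <= 1e-12                               # primal constraint
assert lam >= 0                                  # dual constraint
assert abs(lam * f1) <= 1e-12                    # complementary slackness
```

Since the problem is convex and differentiable, satisfying the KKT system certifies that $x^∗$ and $\lambda^∗$ are primal and dual optimal.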
Primal and dual optimal points need not be checked against the domain conditions, which are guaranteed to hold.
KKT theorem [@Karush1939; @Kuhn+Tucker1951]: For optimization problems with strong duality and differentiable (primal) objectives and constraints, any pair of primal and dual optimal points must satisfy the KKT conditions; conversely, for convex problems, any points satisfying the KKT conditions are primal and dual optimal with zero duality gap.
Saddle point theorem: Point $( x^∗ , \lambda^∗ , \nu^∗ )$ is a pair of primal and dual optimal points with strong duality iff it satisfies saddle point criteria: primal constraints, dual constraints, complementary slackness, and $L( x^∗ , \lambda^∗ , \nu^∗ ) = \min_x L(x, \lambda^∗ , \nu^∗ )$.
If the primal is locally convex at a point, the KKT conditions imply a saddle point. An interior saddle point implies the KKT conditions.
Two systems of inequalities (and equalities) are weak alternatives if at most one of the two is feasible.
Let $g(\lambda, \nu) = \inf_{x \in D} \left\{ \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right\}$; the following two systems of inequalities and equalities are weak alternatives: (1) $f_i(x) \le 0, i = 1,\dots,m$, $h_i(x) = 0, i = 1,\dots,p$; (2) $\lambda \succeq 0$, $g(\lambda, \nu) > 0$.
The following two systems of inequalities and equalities are also weak alternatives (strict version): (1) $f_i(x) < 0, i = 1,\dots,m$, $h_i(x) = 0, i = 1,\dots,p$; (2) $\lambda \succeq 0$, $\lambda \ne 0$, $g(\lambda, \nu) \ge 0$.
Two systems of inequalities (and equalities) are strong alternatives if exactly one of the two is feasible.
Given that each $f_i$ is convex and $\exists x \in \text{ri}(D): Ax=b$, let $g(\lambda, \nu) = \inf_{x \in D} \left\{ \sum_i \lambda_i f_i(x) + \langle \nu,Ax-b \rangle \right\}$; the following two systems of inequalities and equalities are strong alternatives: (1) $f_i(x) < 0, i = 1,\dots,m$, $Ax = b$; (2) $\lambda \succeq 0$, $\lambda \ne 0$, $g(\lambda, \nu) \ge 0$.
If, additionally, the optimal value $p^∗$ is attained for the related optimization problem $$\begin{align} \min \quad & s \\ \text{s.t.} \quad & f_i(x) - s \le 0 && i = 1,\dots,m \\ & A x = b && A \in \mathbb{R}^{p\times n} \end{align}$$ then the following two systems of inequalities and equalities are strong alternatives: (1) $f_i(x) \le 0, i = 1,\dots,m$, $Ax = b$; (2) $\lambda \succeq 0$, $g(\lambda, \nu) > 0$.
The Lagrange dual problem is equivalent to a semi-infinite LP (one constraint per $x \in D$): $$\begin{align} \max \quad & z \\ \text{s.t.} \quad & L(x, \lambda, \nu) \ge z && x \in D \\ & \lambda \succeq 0 \end{align}$$ And it can be approximated with an LP: $$\begin{align} \max \quad & z \\ \text{s.t.} \quad & L(x_i, \lambda, \nu) \ge z && x_i \in D, i = 1,\dots,k \\ & \lambda \succeq 0 \end{align}$$
Denote the optimal value and point of the above LP as $(z_k, \lambda_k, \nu_k)$. If $\inf_{x \in D} L(x, \lambda_k, \nu_k) < z_k$ with optimal point $x_{k+1}$, add the constraint $L(x_{k+1}, \lambda, \nu) \ge z$ and repeat. Note that $\{z_k\}$ is a strictly decreasing sequence of upper bounds on $d^∗$, so you may stop when the remaining gap is acceptable.
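The loop can be sketched as follows (a toy 1-D dual, with a grid search standing in for the outer LP; the problem instance and tolerances are assumptions): for $\min x^2$ s.t. $1 - x \le 0$, the dual optimal is $d^∗ = 1$ at $\lambda^∗ = 2$, and the cuts drive $z_k$ down toward it.

```python
import numpy as np

# Cutting-plane sketch for the dual of: min x^2  s.t.  1 - x <= 0.
# L(x, lam) = x^2 + lam*(1 - x); inf over x is at x = lam/2, so d* = 1 at lam* = 2.
def L(x, lam):
    return x**2 + lam * (1 - x)

lams = np.linspace(0.0, 10.0, 10001)    # grid stand-in for the outer LP over lam
xs = [0.0]                              # initial cut points x_i

for _ in range(20):
    # Outer problem: max z s.t. z <= L(x_i, lam) for all i, lam >= 0.
    # On the grid, z(lam) = min_i L(x_i, lam); pick the maximizing lam.
    z_vals = np.min([L(xi, lams) for xi in xs], axis=0)
    k = np.argmax(z_vals)
    lam_k, z_k = lams[k], z_vals[k]
    # Inner problem: inf_x L(x, lam_k) is attained at x = lam_k / 2.
    x_next = lam_k / 2.0
    if L(x_next, lam_k) >= z_k - 1e-9:  # no cut is violated anymore: done
        break
    xs.append(x_next)                   # add the new cut and repeat

# z_k decreases toward d* = 1 and lam_k approaches lam* = 2
```

Each iteration adds the most-violated constraint at the current $\lambda_k$, so the upper bounds $z_k$ shrink monotonically toward the dual optimal value.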