The **duality theorem** in optimization was first conceived by John von Neumann and proved by George Dantzig.

Dual descriptions:

- Sets: union of points ↔ intersection of half-spaces (convex sets);
- Functions: function ↔ conjugate (convex functions);
- Optimization: min-common ↔ max-crossing.

The **min-common / max-crossing duality** is a dual pair of fundamental optimization problems.
Given a nonempty set $M \subseteq \mathbb{R}^{n+1}$, the **min common point problem** is to find the point of $M$ on the $(n+1)$-st axis with the minimum $(n+1)$-st component $w^*$.
The **max crossing point problem** is to find, among the non-vertical hyperplanes that contain $M$ in their upper (with respect to the $(n+1)$-st component) closed half-space, the one with the maximum crossing point $q^*$ of the $(n+1)$-st axis.

When the set is the epigraph of the primal function $M = \text{epi}(p)$, then $w^* = p(0)$ and $q^* = p^{\star\star}(0)$, so $q^* \le w^*$ with equality when $p$ is closed and convex.
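The gap between $q^*$ and $w^*$ can be seen numerically. Below is a minimal sketch (assuming only NumPy; the perturbation function $p(u) = \min(|u-1|, |u+1|)$ is an arbitrary nonconvex example) that approximates the conjugate and biconjugate on a grid:

```python
import numpy as np

# Nonconvex perturbation function p(u) = min(|u - 1|, |u + 1|): p(0) = 1,
# but its closed convex hull at 0 is 0, so q* = p**(0) = 0 < w* = p(0) = 1.
u = np.linspace(-3.0, 3.0, 2001)
p = np.minimum(np.abs(u - 1.0), np.abs(u + 1.0))

y = np.linspace(-1.0, 1.0, 2001)        # slopes; |y| <= 1 keeps p* finite
p_conj = np.max(y[:, None] * u[None, :] - p[None, :], axis=1)   # p*(y) on a grid
q_star = np.max(-p_conj)                # p**(0) = sup_y (<0, y> - p*(y))
w_star = p[np.argmin(np.abs(u))]        # p(0)

print(w_star, q_star)                   # 1.0 and approximately 0.0: q* < w*
```

Replacing $p$ with a closed convex function (e.g. $p(u) = u^2$) closes the gap, matching the equality condition above.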

Three general theorems (MC/MC theorems):

1. strong duality;
2. existence of dual optimal solutions;
3. polyhedral refinements.

- linear programming duality theorem; optimality conditions.
- convex programming duality theorem; optimality conditions.
- Fenchel duality theorem.
- conic duality.
- constrained optimality condition.

For a convex function, the "slope" of every supporting hyperplane is a **subgradient** (subderivative, 次梯度/次导数) of it at the supporting point.
The **subdifferential** (次微分) of a convex function at a point is the set of all its subgradients at that point.

For convex functions: {Bertsekas2003, 4.1.1}

- Finite directional derivatives exist;
- Left derivatives are no greater than right derivatives, along any direction;
- Right derivatives are no greater than left derivatives at a point on the right;
- Left derivatives and right derivatives are non-decreasing;
- Right derivative is right continuous, left derivative is left continuous;

Convex functions are subdifferentiable at every point in the relative interior of their domain.

The direction of steepest ascent of a concave function is the subgradient with the smallest Euclidean norm.

Conjugate subgradient theorem

Given an optimization problem, we don't obtain a dual problem until we specify how to perturb the optimization problem.

Perturbed problem: $$\begin{align} \min \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le u_i && i = 1,\dots,m \\ & h_i(x) = v_i && i = 1,\dots,p \end{align}$$ Denote the perturbed objective function with implicit constraints by $p(x;u,v)$ and the perturbed optimal value by $p^* (u,v)$; then the primal (objective function) is $p(x;0,0)$ and the primal optimal value is $p^* (0,0)$.

Define $p^* (u,v)$ as the optimal value of the perturbed problem:

- When the original problem is convex, the function $p^* (u,v)$ is convex.
- If strong duality holds for the original problem, then $-(\lambda^* ,\nu^* )$ is a subgradient of the perturbed optimal value at the origin: $p^* (u, v) \ge p^* (0, 0) - \langle (\lambda^* , \nu^* ), (u, v) \rangle$

Sensitivity interpretations of the dual optimal (given strong duality):

- If $\lambda^*$ is large, tightening the inequality constraints ($u < 0$) is guaranteed to increase the optimal value $p^* (u,v)$ greatly.
- If $\lambda^*$ is small, loosening the inequality constraints ($u > 0$) will not decrease the optimal value $p^* (u,v)$ greatly.
- If change in equality constraints gives $\langle \nu^* , v \rangle > 0$ the optimal value $p^* (u,v)$ is guaranteed to increase, where the absolute value of $\nu^*$ reflects the sensitivity of increment.
- If the optimal value $p^* (u, v)$ is differentiable at the origin, its local sensitivity (gradient) is the negative dual optimal $-(\lambda^* ,\nu^* )$.
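These interpretations can be verified on a one-variable problem with a closed-form perturbation function. A minimal sketch (the problem $\min x^2$ s.t. $1 - x \le u$ is an illustrative choice, with $\lambda^* = 2$):

```python
# Primal: min x^2  s.t.  1 - x <= 0.  Optimal: x* = 1, p*(0) = 1, lambda* = 2.
# Perturbed: 1 - x <= u  =>  x >= 1 - u, so p*(u) = max(1 - u, 0)^2.

def p_star(u):
    """Perturbed optimal value of min x^2 s.t. x >= 1 - u."""
    return max(1.0 - u, 0.0) ** 2

lam_star = 2.0

# Global lower bound from the subgradient inequality: p*(u) >= p*(0) - lam* u.
for u in (-0.5, 0.0, 0.5, 2.0):
    assert p_star(u) >= p_star(0.0) - lam_star * u - 1e-12

# Local sensitivity: the gradient at the origin is the negative dual optimal.
eps = 1e-6
grad = (p_star(eps) - p_star(-eps)) / (2 * eps)   # numeric d p*/du at 0
print(grad)   # ≈ -2.0 = -lambda*
```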

Denote convex conjugate as $\star$, then the biconjugate of perturbed optimal values is $(p^* )^{\star\star}(u,v)$.

- Since biconjugate is the closed convex hull, $(p^* )^{\star\star}(0,0) \le p^* (0,0)$.
- By the Fenchel–Moreau theorem, if the perturbed optimal value $p^* (u,v)$ is closed and convex (as it is when the primal problem is convex, under mild regularity), $(p^* )^{\star\star}(0,0) = p^* (0,0)$.
- Since convex conjugates are convex functions, $(p^* )^{\star\star}(0,0) = \max_{u^\star,v^\star} -(p^* )^\star (u^\star,v^\star)$ is a convex program.

In summary, $(p^* )^{\star\star}(0,0)$ is a convex program whose optimal value gives a lower bound of the primal optimal; if the primal is itself convex, the new convex program typically gives the same optimal.

Due to its good properties, we call $(p^* )^{\star\star}(0,0)$ the **dual problem** of the primal $p^* (0,0)$.
Equivalent formulations of an optimization problem can have different perturbations, leading to different dual problems.

This is the Lagrangian narrative of the dual problem.

The **Lagrangian** $L : D \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is the objective function plus a weighted sum of the constraint functions; the weights are called the **Lagrange multipliers** associated with the inequality/equality constraints, aka the dual variables of the problem.
$L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$

The **Lagrange dual function** $g : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is the minimum value of the Lagrangian over the optimization variables, at given values of the dual variables: $g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu)$
Lagrange multipliers are in the dual space of optimization variable space.
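As a worked example (an illustrative problem, not from the text above): for $\min x^2$ subject to $1 - x \le 0$, minimizing the Lagrangian in closed form gives the dual function directly:

```python
# L(x, lam) = x^2 + lam*(1 - x); minimizing over x gives x = lam/2, so
# g(lam) = lam - lam^2/4, which is concave and maximized at lam = 2 with d* = 1.

def g(lam):
    x = lam / 2.0                      # argmin_x L(x, lam)
    return x**2 + lam * (1.0 - x)

# Weak duality: g(lam) <= p* = 1 for every lam >= 0.
assert all(g(lam) <= 1.0 + 1e-12 for lam in (0.0, 1.0, 2.0, 5.0))
print(g(2.0))   # 1.0, the dual optimal value, equal to p*
```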

**Lagrange dual problem** of a primal problem:
$$\begin{align}
\max \quad & g(\lambda, \nu) \\
\text{s.t.} \quad & \lambda \succeq 0
\end{align}$$

The feasible set of the dual problem: $\text{dom}(g) \cap (\mathbb{R}_+^m \times \mathbb{R}^p)$.

**Dual optimal** (optimal Lagrange multipliers) is the optimal point $(\lambda^* , \nu^* )$ of the Lagrange dual problem.
**Optimal value** of the Lagrange dual problem $d^* = \sup\{g(\lambda, \nu)\}$, where $d$ is for dual.

**Weak duality**: The optimal value of the dual (the best lower bound) is no greater than the optimal value of the primal: $d^* \le p^*$.
**Strong duality**: The optimal value of the dual (the best lower bound) equals the optimal value of the primal: $d^* = p^*$.
The **duality gap** associated with a primal feasible point $x$ and a dual feasible point $(\lambda,\nu)$ is $f_0(x) − g(\lambda,\nu)$.
Sandwiching: $g(\lambda,\nu) \le d^* \le p^* \le f_0(x)$.

Properties of the dual function:

- The dual function is (always) concave, because it is the point-wise infimum of a family of affine functions (which are concave).
- The dual function is no greater than the optimal value: $\forall \lambda \succeq 0, \nu: g(\lambda, \nu) \le p^*$.
- For any $x$ attaining the infimum in $g(\lambda, \nu)$, the vector of constraint values $(f_1,\dots,f_m,h_1,\dots,h_p)(x)$ is a subgradient of the dual function at $(\lambda, \nu)$.
- The dual function $g(\lambda, \nu)$ is differentiable at $(\lambda, \nu)$ if the primal domain $D$ is compact, the $f_i$ are continuous, and the Lagrangian minimizer is a singleton $\{x\}$; in that case $\nabla g(\lambda, \nu) = (f_1,\dots,f_m,h_1,\dots,h_p)(x)$.
- The direction of steepest ascent of a dual function is the subgradient with the smallest Euclidean norm.
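A sketch of the gradient property on an illustrative problem ($\min x^2$ s.t. $f_1(x) = 1 - x \le 0$), where the Lagrangian minimizer is unique:

```python
# Lagrangian minimizer: x(lam) = lam/2; dual function g(lam) = lam - lam^2/4.

def g(lam):
    x = lam / 2.0
    return x**2 + lam * (1.0 - x)

lam = 0.8
x_min = lam / 2.0                     # unique Lagrangian minimizer
f1 = 1.0 - x_min                      # constraint value at the minimizer
eps = 1e-6
num_grad = (g(lam + eps) - g(lam - eps)) / (2 * eps)   # numeric g'(lam)
print(f1, num_grad)                   # both equal 1 - lam/2 = 0.6
```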

Properties of the dual problem:

- The Lagrange dual problem is (always) a convex optimization problem, since the objective to be maximized is concave and the constraint is convex.
- The Lagrange dual problem can be used to find a lower bound on the optimal value of a problem that is difficult to solve, since the dual problem is always convex, which in many cases can be solved efficiently.
- If the primal problem is convex, strong duality usually (but not always) holds.

The Lagrange dual of the standard form LP is an inequality form LP.

$$\begin{align} \max \quad & −\langle b,\nu\rangle \\ \text{s.t.} \quad & A^\text{T} \nu + c \succeq 0 \end{align}$$

The Lagrange dual of the inequality form LP is a standard form LP.

$$\begin{align} \max \quad & −\langle b,\lambda \rangle \\ \text{s.t.} \quad & A^\text{T} \lambda + c = 0 \\ & \lambda \succeq 0 \end{align}$$
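A quick numeric check of the standard-form pair (a sketch using `scipy.optimize.linprog`; the problem data are arbitrary):

```python
from scipy.optimize import linprog

# Standard form LP: min <c,x>  s.t.  Ax = b, x >= 0.
c, A, b = [1.0, 2.0], [[1.0, 1.0]], [1.0]
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2)

# Its dual: max -<b,nu>  s.t.  A^T nu + c >= 0,
# rewritten for linprog as: min <b,nu>  s.t.  -A^T nu <= c, nu free.
dual = linprog(b, A_ub=[[-1.0], [-1.0]], b_ub=c, bounds=[(None, None)])

print(primal.fun, -dual.fun)   # equal optimal values: strong duality for LPs
```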

For a convex problem with differentiable objective, a feasible point $x$ is optimal if and only if the negative gradient of the objective at $x$ is an outward-pointing normal of the feasible set at $x$: $\langle \nabla f_0(x), y−x \rangle \ge 0, \forall y \in X$

A convex optimization problem with differentiable objective function is trivial if its optimal set does not touch the boundary of its feasible set, in which case you may simply solve the gradient equation and ignore the constraints.

This criterion is equivalent to KKT condition in simple cases.

For problems with only equality constraints $Ax=b$, the optimality criterion reduces to $\nabla f_0(x) \perp \mathcal{N}(A)$. Since the orthogonal complement of the null space is the row space $\mathcal{N}(A)^\perp = \mathcal{R}(A^\text{T} )$, the optimization problem becomes a Lagrange multiplier problem: $\nabla f_0(x) + A^\text{T} \nu = 0, Ax=b$, where $\nu \in \mathbb{R}^p$.
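For a quadratic objective this Lagrange multiplier system is linear and can be solved directly. A minimal sketch (the matrix $A$ and vector $b$ are illustrative):

```python
import numpy as np

# min (1/2)||x||^2  s.t.  Ax = b.  Optimality: x + A^T nu = 0, Ax = b.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
p, n = A.shape

# Assemble the KKT system  [[I, A^T], [A, 0]] [x; nu] = [0; b].
K = np.block([[np.eye(n), A.T],
              [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
x, nu = sol[:n], sol[n:]

print(np.allclose(A @ x, b), np.allclose(x + A.T @ nu, 0.0))   # True True
```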

For problems with feasible set being the nonnegative orthant $\mathbb{R}_+^n$, the optimality condition can be expressed as: $x \ge 0, \nabla f_0(x) \ge 0, x \circ \nabla f_0(x)=0$, where $\circ$ is the Hadamard product.
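A quick check of the three conditions on an illustrative objective whose minimizer lies on the boundary of $\mathbb{R}_+^2$:

```python
import numpy as np

# min f0(x) = (x1 - 1)^2 + (x2 + 1)^2 over x >= 0; minimizer x* = (1, 0).
x = np.array([1.0, 0.0])
grad = np.array([2 * (x[0] - 1.0), 2 * (x[1] + 1.0)])   # gradient of f0 at x

# x >= 0, grad f0(x) >= 0, and the Hadamard product x ∘ grad f0(x) vanishes.
print(np.all(x >= 0), np.all(grad >= 0), np.all(x * grad == 0))  # True True True
```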

A **constraint qualification** is a condition on the constraints under which the KKT conditions are guaranteed to hold at a local minimizer; for convex problems, constraint qualifications such as Slater's condition also guarantee strong duality.

**Slater’s condition**: There exists a strictly feasible point: $\exists x \in \text{ri}(D): f_i(x) < 0, i = 1,\dots,m, Ax = b$.
Affine inequalities do not need to hold with strict inequality.
In a sense, this is the strongest constraint qualification.

**Slater’s theorem**: If a convex optimization problem meets Slater’s condition, it has strong duality.
When the constraints are all linear equalities and inequalities, and $\text{dom} f_0$ is open, Slater's condition holds if the problem is feasible.

The **tangent cone** for an optimization problem at a feasible point is the cone generated by all the feasible directions from that point:
$T(x) = \{d~|~\exists \{x^k\} \subseteq \Omega, \{t_k\} \to 0^+ : x^k \to x, \frac{x^k−x}{t_k} \to d \}$.

The **linearizing cone** for an optimization problem at a feasible point is the cone generated by all the linearized feasible directions from that point.
Linearized feasible directions are the feasible directions from a point when all constraints are linearized:
$L(x) = \{d~|~\langle \nabla f_i(x), d\rangle \le 0 ~\forall i: f_i(x) = 0;~ \langle \nabla h_j(x), d\rangle = 0 ~\forall j \}$.

Tangent cones are closed cones; linearizing cones are polyhedral cones. The tangent cone is a subset of the linearizing cone: $T(x) \subseteq L(x)$.

**Guignard constraint qualification** (GCQ): The tangent cone and the linearizing cone at a feasible point have the same dual cones: $T^* (x) = L^* (x)$. (Proved with Farkas' lemma.)
In a sense, this is the weakest constraint qualification.

When strong duality holds,
for any primal optimal point $x^*$ and any dual optimal point $(\lambda^* ,\nu^* )$,
**complementary slackness** holds: $\lambda_i^* f_i( x^* ) = 0$, $i = 1, \dots, m$.

**Saddle point condition**: For optimization problems with strong duality and differentiable (primal) objectives and constraints, the gradient of the Lagrangian at dual optimal points vanishes at primal optimal points: $\nabla_x L( x^* , \lambda^* , \nu^* ) = 0$.

**KKT conditions** generalize the method of Lagrange multipliers to optimization problems with inequality constraints:
saddle point + primal constraints + dual constraints + complementary slackness.
$$\begin{aligned}
\nabla f_0( x^* ) & + \sum_i \lambda_i^* \nabla f_i( x^* ) + \sum_i \nu_i^* \nabla h_i( x^* ) = 0 && \text{(saddle point)} \\
f_i( x^* ) &\le 0, \quad i = 1,\dots,m && \text{(primal constraints)} \\
\lambda_i^* &\ge 0, \quad i = 1,\dots,m && \text{(dual constraints)} \\
\lambda_i^* f_i( x^* ) &= 0, \quad i = 1,\dots,m && \text{(complementary slackness)} \\
h_i( x^* ) &= 0, \quad i = 1,\dots,p && \text{(primal constraints)}
\end{aligned}$$
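The five conditions can be verified numerically on a small problem (a sketch; the QP data and the multiplier values are illustrative and derived by hand):

```python
import numpy as np

# min x1^2 + x2^2  s.t.  f1(x) = x1 - 0.3 <= 0,  h1(x) = x1 + x2 - 1 = 0.
# The inequality is active at the optimum: x* = (0.3, 0.7), lam* = 0.8, nu* = -1.4.
x = np.array([0.3, 0.7])
lam, nu = 0.8, -1.4

grad_f0 = 2 * x                       # gradient of the objective
grad_f1 = np.array([1.0, 0.0])        # gradient of the inequality constraint
grad_h1 = np.array([1.0, 1.0])        # gradient of the equality constraint

stationarity = grad_f0 + lam * grad_f1 + nu * grad_h1
print(np.allclose(stationarity, 0),   # saddle point
      x[0] - 0.3 <= 0,                # primal inequality constraint
      abs(x[0] + x[1] - 1) < 1e-12,   # primal equality constraint
      lam >= 0,                       # dual constraint
      lam * (x[0] - 0.3) == 0)        # complementary slackness
```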

Primal and dual optimal points are not checked for domain conditions, which have been guaranteed to hold.

**KKT theorem** {Karush1939, Kuhn+Tucker1951}:
For optimization problems with strong duality and differentiable (primal) objectives and constraints,

- (Necessity) The KKT conditions must hold for any pair of primal and dual optimal points.
- (Sufficiency) If the problem is convex, any points that satisfy the KKT conditions are primal and dual optimal.

**Saddle point theorem**: Point $( x^* , \lambda^* , \nu^* )$ is a pair of primal and dual optimal points with strong duality iff it satisfies **saddle point criteria**: primal constraints, dual constraints, complementary slackness, and $L( x^* , \lambda^* , \nu^* ) = \min_x L(x, \lambda^* , \nu^* )$.

If the primal is locally convex at a point, the KKT conditions imply a saddle point. An interior saddle point implies the KKT conditions.

Two systems of inequalities (and equalities) are **weak alternatives** if at most one of the two is feasible.

Let $g(\lambda, \nu) = \inf_{x \in D} \left\{ \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right\}$, the following two systems of inequalities and equalities are weak alternatives:

- System with non-strict inequality $$\begin{align} f_i(x) \le 0 && i = 1,\dots,m \\ h_i(x) = 0 && i = 1,\dots,p \end{align}$$
- (convex system) $$\begin{align} g(\lambda, \nu) > 0 \\ \lambda \succeq 0 \end{align}$$
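A feasible point of the second system acts as a certificate of infeasibility of the first. A minimal sketch (the constraint $x^2 + 1 \le 0$ is an illustrative infeasible system):

```python
# System 1: f1(x) = x^2 + 1 <= 0 has no real solution.
# System 2: g(lam) = inf_x lam*(x^2 + 1) = lam (for lam >= 0), so any lam > 0
# is feasible for system 2 and certifies the infeasibility of system 1.

lam = 1.0
g = lam * 1.0            # inf_x lam*(x^2 + 1), attained at x = 0
assert lam >= 0 and g > 0
print("system 1 certified infeasible")
```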

The following two systems of inequalities and equalities are weak alternatives:

- System with strict inequality $$\begin{align} f_i(x) < 0 && i = 1,\dots,m \\ h_i(x) = 0 && i = 1,\dots,p \end{align}$$
- (convex system) $$\begin{align} g(\lambda, \nu) \ge 0 \\ \lambda \succeq 0 \\ \lambda \ne 0 \end{align}$$

Two systems of inequalities (and equalities) are **strong alternatives** if exactly one of the two is feasible.

Given that the $f_i$ are convex and $\exists x \in \text{ri}(D): Ax=b$, let $g(\lambda, \nu) = \inf_{x \in D} \left\{ \sum_i \lambda_i f_i(x) + \langle \nu,Ax-b \rangle \right\}$; the following two systems of inequalities and equalities are strong alternatives:

- Convex system with strict inequality $$\begin{align} f_i(x) < 0 && i = 1,\dots,m \\ Ax = b && A \in \mathbb{R}^{p\times n} \end{align}$$
- (convex system) $$\begin{align} g(\lambda, \nu) \ge 0 \\ \lambda \succeq 0 \\ \lambda \ne 0 \end{align}$$

If additionally, optimal value $p^*$ is attained for the related optimization problem $$\begin{align} \min \quad & s \\ \text{s.t.} \quad & f_i(x) - s \le 0 && i = 1,\dots,m \\ & A x = b && A \in \mathbb{R}^{p\times n} \end{align}$$ The following two systems of inequalities and equalities are strong alternatives:

- Convex system with non-strict inequality $$\begin{align} f_i(x) \le 0 && i = 1,\dots,m \\ Ax = b && A \in \mathbb{R}^{p\times n} \end{align}$$
- (convex system) $$\begin{align} g(\lambda, \nu) > 0 \\ \lambda \succeq 0 \end{align}$$

Lagrange dual problem is equivalent to an infinite LP: $$\begin{align} \max \quad & z \\ \text{s.t.} \quad & L(x, \lambda, \nu) \ge z && x \in D \\ & \lambda \succeq 0 \end{align}$$ And could be approximated with LP: $$\begin{align} \max \quad & z \\ \text{s.t.} \quad & L(x_i, \lambda, \nu) \ge z && x_i \in D, i = 1,\dots,k \\ & \lambda \succeq 0 \end{align}$$

Denote the optimal value and optimal point of the above LP by $(z_k, \lambda_k, \nu_k)$. If $\inf_{x \in D} L(x, \lambda_k, \nu_k) < z_k$ with minimizer $x_{k+1}$, add $L(x_{k+1}, \lambda, \nu) \ge z$ to the constraint set and repeat. Note that $\{z_k\}$ is a strictly decreasing sequence, and you may stop when satisfied.
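The scheme above is Kelley's cutting-plane method applied to the dual. A minimal sketch (using `scipy.optimize.linprog`; the primal $\min x^2$ s.t. $1 - x \le 0$ and the initial bound on $\lambda$ are illustrative assumptions):

```python
from scipy.optimize import linprog

# Primal: min x^2 s.t. 1 - x <= 0; L(x, lam) = x^2 + lam*(1 - x).
# Dual function g(lam) = lam - lam^2/4 is maximized at lam = 2 with d* = 1.
cuts = [0.0]        # sample points x_i defining the outer LP model of g
lam_hi = 10.0       # artificial upper bound on lam so the first LP is bounded
for k in range(50):
    # LP in (z, lam): max z  s.t.  z <= x_i^2 + lam*(1 - x_i), 0 <= lam <= lam_hi
    A_ub = [[1.0, -(1.0 - x)] for x in cuts]    # z - lam*(1 - x_i) <= x_i^2
    b_ub = [x * x for x in cuts]
    res = linprog([-1.0, 0.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (0.0, lam_hi)])
    z_k, lam_k = res.x
    x_next = lam_k / 2.0                        # argmin_x L(x, lam_k)
    g_k = x_next**2 + lam_k * (1.0 - x_next)    # exact dual value g(lam_k)
    if z_k - g_k < 1e-8:                        # LP model matches g: stop
        break
    cuts.append(x_next)                         # add the violated cut

print(lam_k, z_k)   # converges to lam* = 2 and d* = 1
```

Each $z_k$ is an upper bound on $d^*$ (the LP is an outer approximation of $g$), while each $g(\lambda_k)$ is a lower bound, so the gap $z_k - g(\lambda_k)$ gives a natural stopping criterion.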