Convexity

Definitions

Sets

The closure (闭包) of a set \(\text{cl}(C)\) is the collection of its limit points. The projection of a vector on a closed set is the set of vectors in the set closest to the vector: \( P_C(x) = \arg\min_{y \in C} | y - x | \). The distance of a vector to a set is the smallest distance from it to points in the set: \( d_C(x) = \min_{y \in C} | y - x | \).

A set in a linear space is convex, if any line segment between two vectors in the set is also in the set. {Bertsekas2003, 1.2.1} Dual definition of closed convex set: A set is closed and convex, if it is the intersection of all closed half-spaces containing the set. An extreme point of a (nonempty) convex set is a vector not between any two other vectors on the set. An extreme direction of a (nonempty) convex set is the direction of one of its extreme points. Convex hull (凸包) \(\text{conv}(C)\) is the smallest (intersection of all) convex set containing a set. Convex combination is a nonnegative weighted average of vectors. In comparison, nonnegative combination is a linear combination with nonnegative coefficients. Convexity is closely related to expectation: If \( P(X \in C) = 1 \) and \( C \subseteq \mathbb{R}^n \) is convex, then \( \mathbf{E}X \in C \).

Affine set is a shifted subspace. Affine hull (仿射包) \( \text{aff}(X) \) is the smallest (intersection of all) affine set containing a set. The affine dimension of a set is the dimension of its affine hull. A set of vectors are affinely independent if when picking one as the origin, the rest are linearly independent. The number of affinely independent vectors in a set cannot be more than one plus the affine dimension of these vectors, if finite. The relative interior of a set \( \text{ri}(C) \) is the intersection of its interior and its affine hull.

Generalized inequality

A set is a cone, if it is scale invariant. A cone is pointed if it contains no line. A cone is solid if it has nonempty interior. A proper cone (真锥) is a pointed, closed solid, convex cone. Generated cone \(\text{cone}(C)\) is the cone generated by a set is the nonnegative span of it. A cone is finitely generated, if it is generated by a finite set of vectors. A vector is a direction of recession of a convex set, if arbitrary shifts of the set along this vector are within the set. The recession cone of a set \(R_C\) is the cone generated by its recession directions. The lineality space of a set is the set of directions of recession whose oppose are also direction of recession: \( L_C = R_C \cap -R_C \).

A generalized inequality is the partial ordering on \(\mathbb{R}^n\) associated with a proper cone \(K\). Vector inequality is a special case of generalized inequality where the associated cone is the nonnegative cone.

\[\begin{align} x \preceq_K y, y \succeq_K x &\Leftrightarrow y−x \in K \ x \prec_K y, y \succ_K x &\Leftrightarrow y−x \in \text{int}(K) \end{align}\]

An element of a multidimensional set is the minimum element (最小元) w.r.t. a generalized inequality \(\preceq_K\), if it is no greater than any other element in the set in such partial ordering: \(S \subseteq \{x\} + K\). Similarly, An element of a multidimensional set is the maximum element w.r.t. a generalized inequality \(\succeq_K\), if it is no less than any other element in the set in such partial ordering: \(S \subseteq \{x\} - K\).

An element of a multidimensional set is a minimal element (极小元) w.r.t. a generalized inequality \(\preceq_K\), if it is no less than any element in the set in such partial ordering: \((\{x\} - K) \cap S = \emptyset\). Similarly, An element of a multidimensional set is a maximal element w.r.t. a generalized inequality \(\succeq_K\), if it is no greater than any element in the set in such partial ordering: \((\{x\} + K) \cap S = \emptyset\).

The polar cone of a set is its shade, which is a cone: \( C^* = \{ x | \langle x, y \rangle \le 0, \forall y \in C \} \). Dual cone of a set is the negative polar cone: \( \hat{C} = -C^* \). A set is polyhedral (多面的) if it is the intersection of finitely many closed half-spaces: \({x | A x \le b}\). A cone is polyhedral if it is the polar cone of some finite set.

The dual generalized inequality associated with a proper cone is the generalized inequality associated with its dual cone. This concept can be used for a dual characterization of the minimum/minimal elements. An element of a multidimensional set is the minimum element w.r.t. a generalized inequality \(\preceq_K\), if for all elements \(\lambda\) in the interior of the dual cone \(K^\) it is the unique minimizer for \(\langle \lambda, \cdot \rangle\) in the set. An element of a multidimensional set is a *minimal element** w.r.t. a generalized inequality \(\succeq_K\), if it minimizes \(\langle \lambda, \cdot \rangle\) over the set for an element \(\lambda\) in the interior of the dual cone.

Functions

An extended real-valued function is a function whose range is the extended real line. The extended-value extension \(\tilde{f}: \mathbb{R}^n → \mathbb{R} ∪ \{+\infty\}\) of a convex function \(f : C \to \mathbb{R}\) is the function defined in its containing (Euclidean) space that takes value \(+\infty\) outside its original domain.

The graph of a function is the collection of point-value pairs: \( \{ (x, f(x)) | x \in \text{dom}f \} \). The epigraph (supergraph) of a function is the collection of points on and above its graph: \( \text{epi}f = \{ (x, y) | x \in \text{dom}f, y \ge f(x) \} \).

Consistent definitions of convex functions:

(Jensen's inequality) A real function with a convex domain is convex, if every line segment of the function is dominated by its linear interpolation: \(\forall x_1, x_2 \in \text{dom}f, \forall \lambda \in [0,1], f(\boldsymbol\lambda \cdot \mathbf{x}) \le \boldsymbol\lambda \cdot f(\mathbf{x})\), where \(\boldsymbol\lambda = (\lambda, 1 - \lambda)\), \(\mathbf{x} = (x_1, x_2)\). {Bertsekas2003, 1.2.2}
(General Jensen's inequality) Given a convex set \(C\), function \(f: C \to \mathbb{R}\) is convex if for any probability distribution on the domain, function value at the expectation is dominated by the expected function value: \( \forall X, \text{supp}(X) \subseteq C: f(\mathbf{E}X) \le \mathbf{E}f(X) \).
(Extension) A real function is convex, if every line segment of its extended-value extension is dominated by its linear interpolation.
(Epigraph) An extended real function with a convex domain is convex, if its epigraph is convex. {Bertsekas2003, 1.2.4}

A function is concave if its negative is a convex function. While concave set is simply any non-convex set, concave functions exactly mirror convex functions.

A level curve is the feasible set of an equality constraint. A level set is the feasible set of an inequality constraint. A sub-level set is the feasible set of a less-than-or-equal-to constraint. A function is quasi-convex if all its sub-level sets (and thus its domain) are convex. A function is quasi-concave if its negative is quasi-convex. A function is unimodal if it is quasi-convex or quasi-concave. A function is quasilinear if it is both quasi-convex and quasi-concave.

A positive function is log-concave (log-concave) if its logarithm is concave (convex).

Given a proper cone \(K\) in its domain space, a real-valued function is \(K\)-nondecreasing if it is dominated by values in its \(K\)-cone: \( \forall x, y, x \preceq_K y : f(x) \le f(y) \). The function is \(K\)-increasing if it is strictly dominated by values in its K-cone other than the vertex: \( \forall x \ne y, x \preceq_K y : f(x) < f(y) \). \(K\)-nonincreasing and \(K\)-decreasing are similarly defined. As a special case, a real-valued function on the space of symmetric matrices is matrix increasing (decreasing) if it is increasing (decreasing) with respect to the positive semidefinite cone.

Given a proper cone \(K\) in its range space, a vector-valued function is \(K\)-convex if every line segment of the function is \(K\)-dominated by its linear interpolation: \( \mathbf{f}(\boldsymbol\lambda \cdot \mathbf{x}) \preceq_K \boldsymbol\lambda \cdot \mathbf{f}(\mathbf{x}) \). The function is strictly \(K\)-convex if the inequality is strict except on the endpoints. As a special case, a symmetric matrix valued function is matrix convex if it is convex with respect to the positive semidefinite cone.

Given the dual pair of vector spaces \(X, Y\) w.r.t a bilinear form \( \langle \cdot, \cdot \rangle \), functions \( f: X \to \mathbb{R} \) and \( f^\star: Y \to \mathbb{R} \) are conjugate if their sum is supported from below by the bilinear form pointwise in the dual space: \( f(x) + f^\star(y) \ge \langle y,x \rangle \) (Young's inequality), and \( \forall y \in Y, \exists x \in X: f(x) + f^\star(y) = \langle y,x \rangle \). Equivalently, \( f^\star(y) = \sup_{x \in X} \{ \langle y,x \rangle - f(x) \} \), also known as Fenchel transformation {Fenchel1949}. If a function is smooth, strictly convex, and increases at infinity faster than a linear function, its conjugate is equivalent to its Legendre transformation {Legendre1789}: \( f^\star(y) = \langle x^\star, \nabla f(x^\star) \rangle − f(x^\star) \), where \( y = \nabla f(x^\star) \).

Examples

Common convex sets:

Linear family
- Hyperplane: \({x | \langle a,x \rangle = b}\);
- Half-space: \({x | \langle a,x \rangle \le b}\);
- Polyhedral set \({x | A x \le b}\): polytope (多胞形), a bounded polyhedral set (a polyhedron is a three-dimensional polytope); simplex (单纯形), \(\text{conv}(v_0, \dots, v_k), k \in \mathbb{N}\);
Norm family
- Ellipsoids: \(x_c + A B_1(0) \), where \(x_c\) is center of ellipsoid, \(A\) is a linear transformation, \(B_1(0)\) is the unit ball; an alternative definition is \( \{x+x_c | \langle x, P^{−1} x\rangle \le 1\} \) where \(P\) is a positive semidefinite matrix (generalized norm);
- Norm ball: \(\{x | | x-x_c| ≤ r \}\), where \(| \cdot |\) is any vector norm.
- Norm cone: \(\{(x,t)~|~| x | ≤ t \}\), where \(| \cdot |\) is any vector norm.
Positive semidefinite cone: \(S_+^n = \{X \in S^n | X \succeq 0 \}\), where \(S^n\) is the space of n-by-n symmetric matrices.

Common convex functions:

Power family
- Power \( x^a \) on \( \mathbb{R}_+ \) if \( a \in (-\infty, 0] \cup [1, +\infty) \);
- Power \( x^a \) is concave on \( \mathbb{R}_+ \) for \( a \in [0, 1] \);
- Geometric mean \( \prod_i x_i^{\frac{1}{n}} \) is concave on \( \mathbb{R}_+^n \);
- Quadratic-over-linear \(x^2/y\) on \(\mathbb{R} \times \mathbb{R}_+\);
Exponential family
- Exponential \(e^x\);
- Logarithm \(\log(x)\) is concave on \( \mathbb{R}_+ \);
- Log-sum-exp \( \log(\sum_i e^x_i) \);
Entropy family
- Negative entropy \( x \log(x) \) on \( \mathbb{R}_+ \);
- Entropy \( \sum_i -P_i \log P_i \);
- Relative entropy (Kullback–Leibler divergence) \( \sum_i P_i \log(P_i/Q_i) \);
Norm family
- Any (scalar/vector/matrix/function/operator) norm \( | x | \);
- Maximum (infinity norm) \( \max_i x_i \);
Determinant \( \det(X) \) is concave on \(S_+^n\);
Convex conjugate of any function \( f^\star \);

Common non-trivial log-concave and log-convex functions:

Probability family
- Log-concave density functions:
  - Gaussian PDFs: \(\phi(K^{-\frac{1}{2}}(x-\mu))\);
  - Exponential PDFs: \(\prod_i λ_i e^{-λ_i x}\);
  - Uniform PDFs over a convex set: \(\frac{\mathbf{1}(C)}{m(C)}\);
  - Wishart distribution;
- CDFs of log-concave density functions are also log-concave;
- Moment generating functions \(M(z)\) are log-convex, which means cumulant generating functions \(\log M(z)\) are convex;
Determinant over trace \( \det(X) / \text{tr}(X) \) is log-concave on \(S_++^{n}\).
Integration family
- Gamma function \(\Gamma(x)\) is log-convex on \(x \ge 1\);
- Laplace transform \(\mathcal{L}p\) of a nonnegative function is log-convex;

Common matrix monotone functions: (Real-valued functions over the symmetric matrix space.)

Linear function \( \text{tr}(WX) \) is matrix nondecreasing if \( W \succeq 0 \), and is matrix increasing if \( W \succ 0 \).
Trace of inverse \( \text{tr}(X^{−1}) \) is matrix decreasing on the positive definite cone.
Determinant \( \det(X) \) is matrix increasing on the positive definite cone, and matrix nondecreasing on the positive semi-definite cone.

Common matrix convex/concave functions:

\( XX^\text{T} \), here \( f: \mathbb{R}^{n \times m} \to \mathbb{R}^{n \times n} \).
Power \( X^p \) is matrix convex if \(p \in [−1,0] \cup [1,2]\); it is matrix concave if \( p \in [0,1] \).

Propositions

Sets

Set relations: \( C \subseteq \text{cl}(C) \subseteq \text{conv}(C) \subseteq \text{cone}(C) \)

Operations that preserves set convexity: {Bertsekas2003, 1.2.1}

Unary operations
- Closure and relative interior;
- Affine transformation: scaling, translation, projection, etc;
- Pre-image (inverse image) of affine transformation;
- Projective transformation (linear-fractional): \(f(x) = \frac{Ax+b}{\langle c,x \rangle + d}\) where \(\langle c,x \rangle + d > 0\);
  - Perspective transformation: \(P(z,t) = z/t\) where \(t>0, z \in \mathbb{R}^n\);
Binary operations
- Minkowski sum;
- Cartesian product;
Arbitrary operations
- Intersection;

Properties of convex set: {Bertsekas2003, 1.4.1}

Any line segment between two vectors in the closure of a (nonempty) convex set is also in the relative interior of the set except the ends.
(Nonempty) convex set has (nonempty) relative interior.

If a set is closed with nonempty interior and has a supporting hyperplane at every point in its boundary, it is convex.

The dual cone of a proper cone is also proper.

Generated cone is convex.

Recession cone theorem {Bertsekas2003, 1.5.1}: For a (nonempty) closed convex set,

Its recession cone is closed;
A direction of recession of one vector is also a direction of recession of the set;
Its recession cone is nonzero iff it is unbounded;
Its recession cone is also its relative interior's recession cone;
For any collection of (nonempty) closed convex sets with nonempty intersection, recession cone and intersection are interchangeable.

Caratheodory's theorem {Bertsekas2003, 1.3.1}: (minimal representation of generated cone and convex hull)

Every vector in the cone generated by a set can be represented as a positive combination of a linearly independent subset of the set;
Every vector in the convex hull of a set can be represented as a convex combination of an affinely independent subset of the set;

Projection theorem {Bertsekas2003, 2.2.1}: For a (nonempty) closed convex set \(C\),

The projection of an arbitrary vector \(x\) on \(C\) exists and is unique, where the supporting hyperplane separates \(C\) and \(x\) and is orthogonal to the connecting vector to \(x\);
Projection on the set \(P_C\) is a continuous and non-expansive map (weak contractor);
Distance to \(C\) is a convex function;

Supporting hyperplane theorem: Any non-interior point of a (nonempty) convex set has some hyperplane that contains the set in one of its closed half-spaces. {Bertsekas2003, 2.4.1}

Separating hyperplane theorem: Any two disjoint (nonempty) convex sets can be separated by some hyperplane. {Bertsekas2003, 2.4.2}

Strict separation theorem {Bertsekas2003, 2.4.3}: Two disjoint (nonempty) convex sets \( C_1, C_2 \) can be strictly separated by some hyperplane if any of the following holds,

Their Minkowski difference \( C_1 - C_2 \) is closed;
One closed and one is compact;
Both are closed, and the intersection of their recession cones is just the intersection of their lineality spaces \( R_{C_1} \cap R_{C_2} = L_{C_1} \cap L_{C_2} \);
One is closed, the other is polyhedral, and the intersection of their recession cones is a subset of the former's lineality space \( R_{C_1} \cap R_{C_2} \subseteq L_{C_1} \);
Both are intersections of finitely many closed convex quadratic constraints: \( \{x | \langle Q_i x + a_i, x \rangle \le b_j, i \in 1, \dots, r \} \);

Polyhedral convexity

Polar cone is closed and convex.

Polar cone of a set is also polar cone of the cone generated by the set: \( C^* = \text{cone}(C)^* \). {Bertsekas2003, 3.1.1}

Polar cone theorem: Twice polar cone of a cone is the closure of its convex hull: \( C^{ ** } = \text{cl}(\text{conv}(C)) \). In particular, closed convex cones are self-dual to polarity: \( C^{ ** } = C\).

Cone generated by a finite set is closed. (Nontrivial) The polyhedral cone and the cone generated by the same finite set of vectors are polar to each other: for a finite set \(A\), \( A^* = \text{cone}(A)^* \), and (Farkas' Lemma) \( A^{**} = \text{cone}(A) \). {Bertsekas2003, 3.2.1}

Minkowski-Weyl Theorem: Any polyhedral cone can be represented as a finitely generated cone, and the converse is also true: \( A^* = \text{cone}(B) \).

Minkowski-Weyl representation: Given finite sets \( A, B \), a polyhedral set can be represented as the Minkowski sum of the convex hull of a finite set and a finitely generated cone; the converse is also true: \( P = \text{conv}(A) + \text{cone}(B) \). {Bertsekas2003, 3.2.2}

{Bertsekas2003, 3.3.1} For a (nonempty) convex set, extreme points on a supporting hyperplane are also extreme points of the set. A closed convex set has at least one extreme point iff it does not have two opposing recession direction. For a bounded convex set, all extreme points form the minimal set that reproduce the set by convex hull: \(C = \text{conv}(E)\), where \(E\) is the set of extreme points of \(C\).

For a polyhedral set with Minkowski-Weyl representation \( P = \text{conv}(A) + \text{cone}(B) \), its extreme points are in \(A\). {Bertsekas2003, 3.3.2}

{Bertsekas2003, 3.3.3}

Polyhedral proper separation theorem {Bertsekas2003, 3.5.1}

Generalized inequalities

Properties of generalized inequalities:

Additive: \(x \preceq_K y, u \preceq_K v \Rightarrow x+u \preceq_K y+v\);
Nonnegative scaling: \(x \preceq_K y, a \ge 0 \Rightarrow ax \preceq_K ay\);
Transitive: \(x \preceq_K y, y \preceq_K z \Rightarrow x \preceq_K z\);
Reflexive: \( x \preceq_K x \);
Antisymmetric: \(x \preceq_K y, y \preceq_K x \Rightarrow x = y\);
Limits: \( x_i \preceq_K y_i, i \in N; \lim_i x_i = x, \lim_i y_i = y \Rightarrow x \preceq_K y \);

For a generalized inequality, minimum and maximum elements of a set are unique if they ever exist, while a set can have multiple minimal and maximal elements.

Non-decreasing in each argument is equivalent to \(K\)-nondecreasing where \(K\) is the nonnegative cone.

First-order conditions for \(K\)-monotonicity: A differentiable function on a convex domain is \(K\)-nondecreasing iff its gradient is nonnegative in \(K\)-dual inequality: \(\nabla f(x) \succeq_{K^\star} 0\); It is \(K\)-increasing if its gradient is positive in \(K\)-dual inequality: \(\nabla f(x) \succ_{K^\star} 0\);

First-order conditions for \(K\)-convexity: A differentiable function on a convex domain is \(K\)-convex iff the function globally \(K\)-dominates its linear expansion: \( f(y) \succeq_K f(x) + Df(x)(y−x) \), here \( f: \mathbb{R}^n \to \mathbb{R}^m \) and \( Df(x) \in \mathbb{R}^{m \times n} \). This proposition also holds when both sides are strict.

Composition theorem: A \(K\)-nondecreasing convex function of a \(K\)-convex function is convex. Note the monotonicity restriction is on the extended-value extension of the outer function.

Convex functions

A convex function can have discontinuities only on its relative boundary.

A function is affine iff it is simultaneously convex and concave.

Operations that preserves (vector-valued) function convexity (or concavity): {Bertsekas2003, 1.2.4}

Unary
- Minimization over a (nonempty) convex set of a subset of variables: \( f(x,y) \to g(x) = \inf_{y \in C} f(x,y) \);
- Perspective, \( P: f(x) \to g(x,t) = t f(x/t), t>0 \); slice, \( g(x,1)=f(x) \);
Binary
- Cartesian product
- Affine transformation on the domain [inner composition]
- Nonnegative affine transformation [outer composition]
- Composition
  1. if the extended-value extension of the outer function is non-decreasing in each argument;
  2. or if the functions are twice differentiable over the containing (Euclidean) space and the outer function is non-decreasing in each argument.
  3. [The composition of an outer non-increasing convex function with an inner concave function is also convex, suitable for both previous criteria.]
Arbitrary
- Point-wise maximum and supremum

Characterization of differentiable convex functions:

First-order conditions: {Bertsekas2003, 1.2.5}
- A differentiable real function with a convex domain is convex iff the function globally dominates its linear expansion
- This proposition also holds when both sides are strict.
Second-order conditions: {Bertsekas2003, 1.2.6}
- A twice continuously differentiable real function with a convex domain is convex if its second derivative everywhere on the domain is positive semidefinite.
- This proposition also holds when both sides are strict.
- If the function is convex and its domain has nonempty interior, then its second derivative everywhere in the interior of the domain is positive semidefinite.

Quasi-convex functions

Operations that preserve quasi-convexity:

Nonnegative weighted maximum;
Outer composition with a nondecreasing function;
Linear-fractional transformation on the domain;
Minimization over a (nonempty) convex set of a subset of variables;

A quasilinear function has convex level sets; the converse does not hold.

A quasi-convex function \(f(x)\) can be represented by a family of convex functions \( \phi_t(x) \), where the \(t\)-sublevel set of \(f\) is the 0-sublevel set of \(\phi_t\). Such representation is not unique.

Log-concave functions

Relations:

A function is log-convex iff its reciprocal is log-concave.
A function is log-concave/convex iff it is the exponential of a concave/convex function.
Log-convex functions are convex.
Nonnegative concave functions are log-concave.

Properties:

For a twice differentiable function with a convex domain:
- It is log-convex iff \( f(x) \nabla^2 f(x) \succeq \nabla f(x) \nabla f(x)^{\text{T}} \);
- It is log-concave iff the above inequality changes direction;
Log-convexity and log-concavity are closed under multiplication and positive scaling.
Log-convexity is preserved under sums.
Log-concavity is preserved under integration over subspaces, if domain covers the full space.
- Log-concave probability densities have log-concave marginal densities.
- Log-concavity is closed under convolution.

Conjugation

Transformation properties of conjugation:

Affine transformation on the domain: \( (f(Ax + b))^\star = f^\star(A^{−\text{T}} y) − \langle b, A^{−\text{T}} y \rangle \);
Sum of independent functions: \( (f_1(u) + f_2(v))^\star = f_1^\star(w) + f_2^\star(z) \);

A function is closed if its epigraph is a closed set; or equivalently, if all its sub-level sets are closed. The conjugate function of any function is closed and convex. Almost every convex function can be expressed as the conjugate of some function.

An extended real-valued function is lower semi-continuous at a point if the function values in the point's neighborhood are either close to or greater than the function value at that point: \( \liminf_{x \to x_0} f(x) \ge f(x_0) \). The conjugate of a function is always lower semi-continuous.

The convex closure of a function is the function whose epigraph is the closure of the convex hull of the original function. The conjugate of the conjugate (twice conjugate) of a function is its convex closure: \( \text{epi}f^{\star\star} = \text{cl}(\text{conv}(\text{epi}f)) \), noticing that \( f^{\star\star} \le f \). Fenchel–Moreau theorem: If a function is convex and closed, its twice conjugate is itself: \( f^{\star\star} = f \).

🏷 Category=Analysis