Convexity

Definitions

Sets

The closure (闭包) of a set \(\text{cl}(C)\) is the collection of its limit points.

A set in a linear space is convex, if any line segment between two vectors in the set is also in the set. {Bertsekas2003, 1.2.1} An extreme point of a (nonempty) convex set is a vector not between any two other vectors on the set. An extreme direction of a (nonempty) convex set is the direction of one of its extreme points. Convex hull (凸包) \(\text{conv}(C)\) is the smallest (intersection of all) convex set containing a set. Convex combination is a nonnegative weighted average of vectors. In comparison, nonnegative combination is a linear combination with nonnegative coefficients. Convexity is closely related to expectation: If \( P(X \in C) = 1 \) and \( C \subseteq \mathbb{R}^n \) is convex, then \( \mathbf{E}X \in C \).

Affine set is a shifted subspace. Affine hull (仿射包) is the smallest (intersection of all) affine set containing a set. The affine dimension of a set is the dimension of its affine hull. A set of vectors are affinely independent if when picking one as the origin, the rest are linearly independent. The number of affinely independent vectors in a set cannot be more than one plus the affine dimension of these vectors, if finite. The relative interior of a set is the intersection of its interior and its affine hull.

Generalized inequality

A set is a cone, if it is scale invariant. A cone is pointed if it contains no line. A cone is solid if it has nonempty interior. A proper cone (真锥) is a pointed, closed solid, convex cone. Generated cone \(\text{cone}(C)\) is the cone generated by a set is the nonnegative span of it. A cone is finitely generated, if it is generated by a finite set of vectors. A vector is a direction of recession of a convex set, if arbitrary shifts of the set along this vector are within the set. The recession cone of a set is the cone generated by its recession directions.

A generalized inequality is the partial ordering on \(\mathbb{R}^n\) associated with a proper cone \(K\). Vector inequality is a special case of generalized inequality where the associated cone is the non-negative cone.

\[\begin{align} x \preceq_K y, y \succeq_K x &\Leftrightarrow y−x \in K \ x \prec_K y, y \succ_K x &\Leftrightarrow y−x \in \text{int}(K) \end{align}\]

An element of a multidimensional set is the minimum element (最小元) w.r.t. a generalized inequality \(\preceq_K\), if it is no greater than any other element in the set in such partial ordering: \(S \subseteq \{x\} + K\). Similarly, An element of a multidimensional set is the maximum element w.r.t. a generalized inequality \(\succeq_K\), if it is no less than any other element in the set in such partial ordering: \(S \subseteq \{x\} - K\).

An element of a multidimensional set is a minimal element (极小元) w.r.t. a generalized inequality \(\preceq_K\), if it is no less than any element in the set in such partial ordering: \((\{x\} - K) \cap S = \emptyset\). Similarly, An element of a multidimensional set is a maximal element w.r.t. a generalized inequality \(\succeq_K\), if it is no greater than any element in the set in such partial ordering: \((\{x\} + K) \cap S = \emptyset\).

The polar cone of a set is its shade, which is a cone. Dual cone is the opposite of polar cone. A cone is polyhedral if it is the polar cone of some finite set.

The dual generalized inequality associated with a proper cone is the generalized inequality associated with its dual cone. This concept can be used for a dual characterization of the minimum/minimal elements. An element of a multidimensional set is the minimum element w.r.t. a generalized inequality \(\preceq_K\), if for all elements \(\lambda\) in the interior of the dual cone \(K^\) it is the unique minimizer for \(\langle \lambda, \cdot \rangle\) in the set. An element of a multidimensional set is a *minimal element** w.r.t. a generalized inequality \(\succeq_K\), if it minimizes \(\langle \lambda, \cdot \rangle\) over the set for an element \(\lambda\) in the interior of the dual cone.

Functions

An extended real-valued function is a function whose range is the extended real line. The extended-value extension \(\tilde{f}: \mathbb{R}^n → \mathbb{R} ∪ \{+\infty\}\) of a convex function \(f : C \to \mathbb{R}\) is the function defined in its containing (Euclidean) space that takes value \(+\infty\) outside its original domain.

The graph of a function is the collection of point-value pairs: \( \{ (x, f(x)) | x \in \text{dom}f \} \). The epigraph (supergraph) of a function is the collection of points on and above its graph: \( \{ (x, y) | x \in \text{dom}f, y \ge f(x) \} \).

Consistent definitions of convex functions:

(Jensen's inequality) A real function with a convex domain is convex, if every line segment of the function is dominated by its linear interpolation: \(\forall x_1, x_2 \in \text{dom}f, \forall \lambda \in [0,1], f(\boldsymbol\lambda \cdot \mathbf{x}) \le \boldsymbol\lambda \cdot f(\mathbf{x})\), where \(\boldsymbol\lambda = (\lambda, 1 - \lambda)\), \(\mathbf{x} = (x_1, x_2)\). {Bertsekas2003, 1.2.2}
(General Jensen's inequality) Given a convex set \(C\), function \(f: C \to \mathbb{R}\) is convex if for any probability distribution on the domain, function value at the expectation is dominated by the expected function value: \( \forall X, \text{supp}(X) \subseteq C: f(\mathbf{E}X) \le \mathbf{E}f(X) \).
(Extension) A real function is convex, if every line segment of its extended-value extension is dominated by its linear interpolation.
(Epigraph) An extended real function with a convex domain is convex, if its epigraph is convex. {Bertsekas2003, 1.2.4}

A function is concave if its negative is a convex function. While concave set is simply any non-convex set, concave functions exactly mirror convex functions.

A level curve is the feasible set of an equality constraint. A level set is the feasible set of an inequality constraint. A sub-level set is the feasible set of a less-than-or-equal-to constraint. A function is quasi-convex if all its sub-level sets (and thus its domain) are convex. A function is quasi-concave if its negative is quasi-convex. A function is unimodal if it is quasi-convex or quasi-concave. A function is quasilinear if it is both quasi-convex and quasi-concave.

A positive function is log-concave (log-concave) if its logarithm is concave (convex).

Given a proper cone \(K\) in its domain space, a real-valued function is \(K\)-nondecreasing if it is dominated by values in its \(K\)-cone: \( \forall x, y, x \preceq_K y : f(x) \le f(y) \). The function is \(K\)-increasing if it is strictly dominated by values in its K-cone other than the vertex: \( \forall x \ne y, x \preceq_K y : f(x) < f(y) \). \(K\)-nonincreasing and \(K\)-decreasing are similarly defined. As a special case, a real-valued function on the space of symmetric matrices is matrix increasing (decreasing) if it is increasing (decreasing) with respect to the positive semidefinite cone.

Given a proper cone \(K\) in its range space, a vector-valued function is \(K\)-convex if every line segment of the function is \(K\)-dominated by its linear interpolation: \( \mathbf{f}(\boldsymbol\lambda \cdot \mathbf{x}) \preceq_K \boldsymbol\lambda \cdot \mathbf{f}(\mathbf{x}) \). The function is strictly \(K\)-convex if the inequality is strict except on the endpoints. As a special case, a symmetric matrix valued function is matrix convex if it is convex with respect to the positive semidefinite cone.

Given the dual pair of vector spaces \(X, Y\) w.r.t a bilinear form \( \langle \cdot, \cdot \rangle \), functions \( f: X \to \mathbb{R} \) and \( f^\star: Y \to \mathbb{R} \) are conjugate if their sum is supported from below by the bilinear form pointwise in the dual space: \( f(x) + f^\star(y) \ge \langle y,x \rangle \) (Young's inequality), and \( \forall y \in Y, \exists x \in X: f(x) + f^\star(y) = \langle y,x \rangle \). Equivalently, \( f^\star(y) = \sup_{x \in X} \{ \langle y,x \rangle - f(x) \} \), also known as Fenchel transformation {Fenchel1949}. If a function is smooth, strictly convex, and increases at infinity faster than a linear function, its conjugate is equivalent to its Legendre transformation {Legendre1789}: \( f^\star(y) = \langle x^\star, \nabla f(x^\star) \rangle − f(x^\star) \), where \( y = \nabla f(x^\star) \).

Examples

Common convex sets:

Linear family
- Hyperplane: \({x | \langle a,x \rangle = b}\);
- Half-space: \({x | \langle a,x \rangle \le b}\);
- Polyhedron (or polyhedral set; 多面体) is the intersection of finitely many closed half-spaces: \({x | A x \le b}\); polytope (多胞形), a bounded polyhedron; simplex (单纯形), \(\text{conv}(v_0, \dots, v_k), k \in \mathbb{N}\);
Norm family
- Ellipsoids: \(x_c + A B_1(0) \), where \(x_c\) is center of ellipsoid, \(A\) is a linear transformation, \(B_1(0)\) is the unit ball; an alternative definition is \( \{x+x_c | \langle x, P^{−1} x\rangle \le 1\} \) where \(P\) is a positive semidefinite matrix (generalized norm);
- Norm ball: \(\{x | | x-x_c| ≤ r \}\), where \(| \cdot |\) is any vector norm.
- Norm cone: \(\{(x,t)~|~| x | ≤ t \}\), where \(| \cdot |\) is any vector norm.
Positive semidefinite cone: \(S_+^n = \{X \in S^n | X \succeq 0 \}\), where \(S^n\) is the space of n-by-n symmetric matrices.

Common convex functions:

Power family
- Power \( x^a \) on \( \mathbb{R}_+ \) if \( a \in (-\infty, 0] \cup [1, +\infty) \);
- Power \( x^a \) is concave on \( \mathbb{R}_+ \) for \( a \in [0, 1] \);
- Geometric mean \( \prod_i x_i^{\frac{1}{n}} \) is concave on \( \mathbb{R}_+^n \);
- Quadratic-over-linear \(x^2/y\) on \(\mathbb{R} \times \mathbb{R}_+\);
Exponential family
- Exponential \(e^x\);
- Logarithm \(\log(x)\) is concave on \( \mathbb{R}_+ \);
- Log-sum-exp \( \log(\sum_i e^x_i) \);
Entropy family
- Negative entropy \( x \log(x) \) on \( \mathbb{R}_+ \);
- Entropy \( \sum_i -P_i \log P_i \);
- Relative entropy (Kullback–Leibler divergence) \( \sum_i P_i \log(P_i/Q_i) \);
Norm family
- Any (scalar/vector/matrix/function/operator) norm \( | x | \);
- Maximum (infinity norm) \( \max_i x_i \);
Determinant \( \det(X) \) is concave on \(S_+^n\);
Convex conjugate of any function \( f^\star \);

Common non-trivial log-concave and log-convex functions:

Probability family
- Log-concave density functions:
  - Gaussian PDFs: \(\phi(K^{-\frac{1}{2}}(x-\mu))\);
  - Exponential PDFs: \(\prod_i λ_i e^{-λ_i x}\);
  - Uniform PDFs over a convex set: \(\frac{\mathbf{1}(C)}{m(C)}\);
  - Wishart distribution;
- CDFs of log-concave density functions are also log-concave;
- Moment generating functions \(M(z)\) are log-convex, which means cumulant generating functions \(\log M(z)\) are convex;
Determinant over trace \( \det(X) / \text{tr}(X) \) is log-concave on \(S_++^{n}\).
Integration family
- Gamma function \(\Gamma(x)\) is log-convex on \(x \ge 1\);
- Laplace transform \(\mathcal{L}p\) of a nonnegative function is log-convex;

Common matrix monotone functions: (Real-valued functions over the symmetric matrix space.)

Linear function \( \text{tr}(WX) \) is matrix nondecreasing if \( W \succeq 0 \), and is matrix increasing if \( W \succ 0 \).
Trace of inverse \( \text{tr}(X^{−1}) \) is matrix decreasing on the positive definite cone.
Determinant \( \det(X) \) is matrix increasing on the positive definite cone, and matrix nondecreasing on the positive semi-definite cone.

Common matrix convex/concave functions:

\( XX^\text{T} \), here \( f: \mathbb{R}^{n \times m} \to \mathbb{R}^{n \times n} \).
Power \( X^p \) is matrix convex if \(p \in [−1,0] \cup [1,2]\); it is matrix concave if \( p \in [0,1] \).

Propositions

Sets

Set relations: \( C \subseteq \text{cl}(C) \subseteq \text{conv}(C) \subseteq \text{cone}(C) \)

Operations that preserves set convexity: {Bertsekas2003, 1.2.1}

Unary operations
- Closure and relative interior;
- Affine transformation: scaling, translation, projection, etc;
- Pre-image (inverse image) of affine transformation;
- Projective transformation (linear-fractional): \(f(x) = \frac{Ax+b}{\langle c,x \rangle + d}\) where \(\langle c,x \rangle + d > 0\);
  - Perspective transformation: \(P(z,t) = z/t\) where \(t>0, z \in \mathbb{R}^n\);
Binary operations
- Minkowski sum;
- Cartesian product;
Arbitrary operations
- Intersection;

Properties of convex set: {Bertsekas2003, 1.4.1}

Any line segment between two vectors in the closure of a (nonempty) convex set is also in the relative interior of the set except the ends.
(Nonempty) convex set has (nonempty) relative interior.

The dual cone of a proper cone is also proper.

Generated cone is convex.

Recession cone theorem {Bertsekas2003, 1.5.1}: For a (nonempty) closed convex set,

Its recession cone is closed;
A direction of recession of one vector is also a direction of recession of the set;
Its recession cone is nonzero iff it is unbounded;
Its recession cone is also its relative interior's recession cone;
For any collection of (nonempty) closed convex sets with nonempty intersection, recession cone and intersection are interchangeable.

Caratheodory's theorem {Bertsekas2003, 1.3.1}: (minimal representation of generated cone and convex hull)

Every vector in the cone generated by a set can be represented as a positive combination of a linearly independent subset of the set;
Every vector in the convex hull of a set can be represented as a convex combination of an affinely independent subset of the set;

Polyhedral convexity

Polar cone is closed and convex.

Polar cone of a set is also polar cone of cone generated by the set: \( C^* = \text{cone}(C)^* \). {Bertsekas2003, 3.1.1}

Polar cone theorem: Twice polar cone of a cone is the closure of its convex hull: \( C^{ ** } = \text{cl}(\text{conv}(C)) \). In particular, closed convex cones are self-dual to polarity: \( C^{ ** } = C\).

Cone generated by a finite set is closed. (Nontrivial) The polyhedral cone and the cone generated by the same finite set of vectors are polar to each other: for a finite set \(A\), \( A^* = \text{cone}(A)^* \), and (Farkas' Lemma) \( A^{**} = \text{cone}(A) \). {Bertsekas2003, 3.2.1}

Minkowski-Weyl Theorem: Any polyhedral cone can be represented as a finitely generated cone, and the converse is also true: \( A^* = \text{cone}(B) \).

Minkowski-Weyl representation: Given finite sets \( A, B \), a polyhedral set can be represented as the Minkowski sum of the convex hull of a finite set and a finitely generated cone; the converse is also true: \( P = \text{conv}(A) + \text{cone}(B) \). {Bertsekas2003, 3.2.2}

{Bertsekas2003, 3.3.1} For a (nonempty) convex set, extreme points on a supporting hyperplane are also extreme points of the set. A closed convex set has at least one extreme point iff it does not have two opposing recession direction. For a bounded convex set, all extreme points form the minimal set that reproduce the set by convex hull: \(C = \text{conv}(E)\), where \(E\) is the set of extreme points of \(C\).

For a polyhedron \( P = \text{conv}(A) + \text{cone}(B) \), its extreme points are in \(A\). {Bertsekas2003, 3.3.2}

{Bertsekas2003, 3.3.3}

Polyhedron proper separation theorem {Bertsekas2003, 3.5.1}

Generalized inequalities

Properties of generalized inequalities:

Additive: \(x \preceq_K y, u \preceq_K v \Rightarrow x+u \preceq_K y+v\);
Nonnegative scaling: \(x \preceq_K y, a \ge 0 \Rightarrow ax \preceq_K ay\);
Transitive: \(x \preceq_K y, y \preceq_K z \Rightarrow x \preceq_K z\);
Reflexive: \( x \preceq_K x \);
Antisymmetric: \(x \preceq_K y, y \preceq_K x \Rightarrow x = y\);
Limits: \( x_i \preceq_K y_i, i \in N; \lim_i x_i = x, \lim_i y_i = y \Rightarrow x \preceq_K y \);

For a generalized inequality, minimum and maximum elements of a set are unique if they ever exist, while a set can have multiple minimal and maximal elements.

Non-decreasing in each argument is equivalent to \(K\)-nondecreasing where \(K\) is the non-negative cone.

First-order conditions for \(K\)-monotonicity: A differentiable function on a convex domain is \(K\)-nondecreasing iff its gradient is non-negative in \(K\)-dual inequality: \(\nabla f(x) \succeq_{K^\star} 0\); It is \(K\)-increasing if its gradient is positive in \(K\)-dual inequality: \(\nabla f(x) \succ_{K^\star} 0\);

First-order conditions for \(K\)-convexity: A differentiable function on a convex domain is \(K\)-convex iff the function globally \(K\)-dominates its linear expansion: \( f(y) \succeq_K f(x) + Df(x)(y−x) \), here \( f: \mathbb{R}^n \to \mathbb{R}^m \) and \( Df(x) \in \mathbb{R}^{m \times n} \). This proposition also holds when both sides are strict.

Composition theorem: A \(K\)-nondecreasing convex function of a \(K\)-convex function is convex. Note the monotonicity restriction is on the extended-value extension of the outer function.

Convex functions

A convex function can have discontinuities only on its relative boundary.

A function is affine iff it is simultaneously convex and concave.

Operations that preserves (vector-valued) function convexity (or concavity): {Bertsekas2003, 1.2.4}

Unary
- Minimization over a (nonempty) convex set of a subset of variables: \( f(x,y) \to g(x) = \inf_{y \in C} f(x,y) \);
- Perspective, \( P: f(x) \to g(x,t) = t f(x/t), t>0 \); slice, \( g(x,1)=f(x) \);
Binary
- Cartesian product
- Affine transformation on the domain [inner composition]
- Non-negative affine transformation [outer composition]
- Composition
  1. if the extended-value extension of the outer function is non-decreasing in each argument;
  2. or if the functions are twice differentiable over the containing (Euclidean) space and the outer function is non-decreasing in each argument.
  3. [The composition of an outer non-increasing convex function with an inner concave function is also convex, suitable for both previous criteria.]
Arbitrary
- Point-wise maximum and supremum

Characterization of differentiable convex functions:

First-order conditions: {Bertsekas2003, 1.2.5}
- A differentiable real function with a convex domain is convex iff the function globally dominates its linear expansion
- This proposition also holds when both sides are strict.
Second-order conditions: {Bertsekas2003, 1.2.6}
- A twice continuously differentiable real function with a convex domain is convex if its second derivative everywhere on the domain is positive semidefinite.
- This proposition also holds when both sides are strict.
- If the function is convex and its domain has nonempty interior, then its second derivative everywhere in the interior of the domain is positive semidefinite.

Quasi-convex functions

Operations that preserve quasi-convexity:

Nonnegative weighted maximum;
Outer composition with a nondecreasing function;
Linear-fractional transformation on the domain;
Minimization over a (nonempty) convex set of a subset of variables;

A quasilinear function has convex level sets; the converse does not hold.

A quasi-convex function \(f(x)\) can be represented by a family of convex functions \( \phi_t(x) \), where the \(t\)-sublevel set of \(f\) is the 0-sublevel set of \(\phi_t\). Such representation is not unique.

Log-concave functions

Relations:

A function is log-convex iff its reciprocal is log-concave.
A function is log-concave/convex iff it is the exponential of a concave/convex function.
Log-convex functions are convex.
Non-negative concave functions are log-concave.

Properties:

For a twice differentiable function with a convex domain:
- It is log-convex iff \( f(x) \nabla^2 f(x) \succeq \nabla f(x) \nabla f(x)^{\text{T}} \);
- It is log-concave iff the above inequality changes direction;
Log-convexity and log-concavity are closed under multiplication and positive scaling.
Log-convexity is preserved under sums.
Log-concavity is preserved under integration over subspaces, if domain covers the full space.
- Log-concave probability densities have log-concave marginal densities.
- Log-concavity is closed under convolution.

Conjugation

Transformation properties of conjugation:

Affine transformation on the domain: \( (f(Ax + b))^\star = f^\star(A^{−\text{T}} y) − \langle b, A^{−\text{T}} y \rangle \);
Sum of independent functions: \( (f_1(u) + f_2(v))^\star = f_1^\star(w) + f_2^\star(z) \);

A function is closed if its epigraph is a closed set; or equivalently, if all its sub-level sets are closed. The conjugate function of any function is closed and convex. Almost every convex function can be expressed as the conjugate of some function.

An extended real-valued function is lower semi-continuous at a point if the function values in the point's neighborhood are either close to or greater than the function value at that point: \( \liminf_{x \to x_0} f(x) \ge f(x_0) \). The conjugate of a function is always lower semi-continuous.

The convex closure of a function is the function whose epigraph is the closure of the convex hull of the original function. The conjugate of the conjugate (twice conjugate) of a function is its convex closure: \( \text{epi}f^{\star\star} = \text{cl}(\text{conv}(\text{epi}f)) \), noticing that \( f^{\star\star} \le f \). Fenchel–Moreau theorem: If a function is convex and closed, its twice conjugate is itself: \( f^{\star\star} = f \).

🏷 Category=Analysis