Convexity

Concepts

Sets

Distance of a point to a subset of a metric space is the smallest distance between the point and any point in the subset: $d_C(x) = \min_{y \in C} d(x, y)$. Projection of a vector on a closed subset of a normed vector space is the set of vectors in the subset closest to the vector: $P_C(x) = \arg\min_{y \in C} \| y - x \|$.

Nonnegative combination of vectors is a linear combination where the scalars are nonnegative: $\sum_{i=1}^k a^i v_i$ where $a^i \ge 0$. Convex combination of vectors is a normalized nonnegative combination: $\sum_{i=1}^k a^i v_i$ where $a^i \ge 0$ and $\sum_{i=1}^k a^i = 1$. Line segment $\overline{pq}$ between two vectors is the set of all their convex combinations: $\overline{pq} = \{\lambda p + (1-\lambda) q: \lambda \in [0,1]\}$. Convex subset in a vector space is a subset that includes the line segment between any two of its elements [@Bertsekas2003, 1.2.1]: $A = \cup_{p, q \in A} \overline{pq}$. Convexity is closely related to expectation of random vector: a random vector supported within a convex subset has its expectation within that subset. A subset is closed and convex if and only if it is the intersection of all closed half-spaces containing the subset. Extreme point of a convex subset is a vector not between any two other vectors on the subset. Extreme direction of a convex set is the direction of one of its extreme points.

Convex hull (凸包) $\text{conv}(C)$ of a subset in a vector space is the intersection of all convex sets containing a subset. Affine hull (仿射包) $\text{aff}(X)$ is the intersection of all affine subspaces containing a set. Affine dimension of a subset is the dimension of its affine hull. A set of vectors are affinely independent if when picking one as the origin, the rest are linearly independent. The number of affinely independent vectors in a set cannot be more than one plus the affine dimension of these vectors, if finite. Relative interior $\text{ri}(C)$ of a subset is the intersection of its interior and its affine hull.

Generalized inequality

A set is a cone, if it is scale invariant. A cone is pointed if it contains no line. A cone is solid if it has nonempty interior. A proper cone (真锥) is a pointed, closed solid, convex cone. Generated cone $\text{cone}(C)$ is the cone generated by a set is the nonnegative span of it. A cone is finitely generated, if it is generated by a finite set of vectors. A vector is a direction of recession of a convex set, if arbitrary shifts of the set along this vector are within the set. The recession cone of a set $R_C$ is the cone generated by its recession directions. The lineality space of a set is the set of directions of recession whose oppose are also direction of recession: $L_C = R_C \cap -R_C$.

A generalized inequality is the partial ordering on $\mathbb{R}^n$ associated with a proper cone $K$. Vector inequality is a special case of generalized inequality where the associated cone is the nonnegative cone.

$$\begin{align} x \preceq_K y, y \succeq_K x &\Leftrightarrow y−x \in K \ x \prec_K y, y \succ_K x &\Leftrightarrow y−x \in \text{int}(K) \end{align}$$

The minimum element (最小元) of a multidimensional set w.r.t. a generalized inequality $\preceq_K$ is an element no greater than any other element in the set in such partial ordering: $S \subseteq \{x\} + K$. Similarly, the maximum element of a multidimensional set w.r.t. a generalized inequality $\succeq_K$ is an element no less than any other element in the set in such partial ordering: $S \subseteq \{x\} - K$.

A minimal element (极小元) of a multidimensional set w.r.t. a generalized inequality $\preceq_K$ is an element no less than any element in the set in such partial ordering: $(\{x\} - K) \cap S = \emptyset$. Similarly, a maximal element of a multidimensional set w.r.t. a generalized inequality $\succeq_K$ is an element no greater than any element in the set in such partial ordering: $(\{x\} + K) \cap S = \emptyset$.

The polar cone of a set is its shade, which is a cone: $C^∗ = \{ x | \langle x, y \rangle \le 0, \forall y \in C \}$. Dual cone of a set is the negative polar cone: $\hat{C} = -C^∗$. A set is polyhedral (多面的) if it is the intersection of finitely many closed half-spaces: ${x | A x \le b}$. A cone is polyhedral if it is the polar cone of some finite set.

The dual generalized inequality associated with a proper cone is the generalized inequality associated with its dual cone. This concept can be used for a dual characterization of the minimum/minimal elements. The minimum element of a multidimensional set w.r.t. a generalized inequality $\preceq_K$ is an element such that for all elements $\lambda$ in the interior of the dual cone $K^∗$, it is the unique minimizer for $\langle \lambda, \cdot \rangle$ in the set. A minimal element of a multidimensional set w.r.t. a generalized inequality $\succeq_K$ is an element that minimizes $\langle \lambda, \cdot \rangle$ over the set for an element $\lambda$ in the interior of the dual cone.

Functions

An extended real-valued function is a function whose range is the extended real line. The extended-value extension $\tilde{f}: \mathbb{R}^n → \mathbb{R} ∪ \{+\infty\}$ of a convex function $f : C \to \mathbb{R}$ is the function defined in its containing (Euclidean) space that takes value $+\infty$ outside its original domain.

The graph of a function is the collection of point-value pairs: $\{ (x, f(x)) | x \in \text{dom}f \}$. The epigraph (supergraph) of a function is the collection of points on and above its graph: $\text{epi}f = \{ (x, y) | x \in \text{dom}f, y \ge f(x) \}$.

Consistent definitions of convex functions:

(Jensen's inequality) A real-valued function with a convex domain is convex, if every line segment of the function is dominated by its linear interpolation: $\forall x_1, x_2 \in \text{dom}f, \forall \lambda \in [0,1], f(\boldsymbol\lambda \cdot \mathbf{x}) \le \boldsymbol\lambda \cdot f(\mathbf{x})$, where $\boldsymbol\lambda = (\lambda, 1 - \lambda)$, $\mathbf{x} = (x_1, x_2)$ [@Bertsekas2003, 1.2.2].
(General Jensen's inequality) Given a convex set $C$, function $f: C \to \mathbb{R}$ is convex if for any probability distribution on the domain, function value at the expectation is dominated by the expected function value: $\forall X, \text{supp}(X) \subseteq C: f(\mathbf{E}X) \le \mathbf{E}f(X)$.
(Extension) A real-valued function is convex, if every line segment of its extended-value extension is dominated by its linear interpolation.
(Epigraph) An extended real-valued function with a convex domain is convex, if its epigraph is convex. [@Bertsekas2003, 1.2.4]

A function is concave if its negative is a convex function. While concave set is simply any non-convex set, concave functions exactly mirror convex functions.

Level curve is the feasible set of an equality constraint: $\{x \mid h(x) = 0\}$. Level set or sub-level set is the feasible set of an inequality constraint: $\{x \mid f(x) \le 0\}$. Quasi-convex function is one whose sub-level sets (and thus its domain) are all convex. Quasi-concave function is one whose negative is quasi-convex. Unimodal function is one which is quasi-convex or quasi-concave. Quasilinear function is one whis is both quasi-convex and quasi-concave.

Log-concave/log-convex function is a positive function whose logarithm is concave/convex.

Given a proper cone $K$ in its domain space, a real-valued function is $K$-nondecreasing if it is dominated by values in its K-cone: $\forall x, y, x \preceq_K y : f(x) \le f(y)$. The function is $K$-increasing if it is strictly dominated by values in its K-cone other than the vertex: $\forall x \ne y, x \preceq_K y : f(x) < f(y)$. $K$-nonincreasing and $K$-decreasing are similarly defined. As a special case, a real-valued function on the space of symmetric matrices is matrix increasing (decreasing) if it is increasing (decreasing) with respect to the positive semidefinite cone.

Given a proper cone $K$ in its range space, a vector-valued function is $K$-convex if every line segment of the function is K-dominated by its linear interpolation: $\mathbf{f}(\boldsymbol\lambda \cdot \mathbf{x}) \preceq_K \boldsymbol\lambda \cdot \mathbf{f}(\mathbf{x})$. The function is strictly K-convex if the inequality is strict except on the endpoints. As a special case, a symmetric matrix valued function is matrix convex if it is convex with respect to the positive semidefinite cone.

Given the dual pair of vector spaces $X, Y$ w.r.t a bilinear form $\langle \cdot, \cdot \rangle$, functions $f: X \to \mathbb{R}$ and $f^\star: Y \to \mathbb{R}$ are conjugate if their sum is supported from below by the bilinear form pointwise in the dual space: $f(x) + f^\star(y) \ge \langle y,x \rangle$ (Young's inequality), and $\forall y \in Y, \exists x \in X: f(x) + f^\star(y) = \langle y,x \rangle$. Equivalently, $f^\star(y) = \sup_{x \in X} \{ \langle y,x \rangle - f(x) \}$, also known as Fenchel transformation [@Fenchel1949]. If a function is smooth, strictly convex, and increases at infinity faster than a linear function, its conjugate is equivalent to its Legendre transformation [@Legendre1789]: $f^\star(y) = \langle x^\star, \nabla f(x^\star) \rangle − f(x^\star)$, where $y = \nabla f(x^\star)$.

Examples

Common convex sets:

Linear family
- Hyperplane: ${x | \langle a,x \rangle = b}$;
- Half-space: ${x | \langle a,x \rangle \le b}$;
- Polyhedral set ${x | A x \le b}$: polytope (多胞形), a bounded polyhedral set (a polyhedron is a three-dimensional polytope); simplex (单纯形), $\text{conv}(v_0, \dots, v_k), k \in \mathbb{N}$;
Norm family
- Ellipsoids: $x_c + A B_1(0)$, where $x_c$ is center of ellipsoid, $A$ is a linear transformation, $B_1(0)$ is the unit ball; an alternative definition is $\{x+x_c | \langle x, P^{−1} x\rangle \le 1\}$ where $P$ is a positive semidefinite matrix (generalized norm);
- Norm ball: $\{x | \| x-x_c\| ≤ r \}$, where $\| \cdot \|$ is any vector norm.
- Norm cone: $\{(x,t)~|~\| x \| ≤ t \}$, where $\| \cdot \|$ is any vector norm.
Positive semidefinite cone: $S_+^n = \{X \in S^n | X \succeq 0 \}$, where $S^n$ is the space of $n \times n$ symmetric matrices.

Common convex functions:

Power family
- Power $x^a$ on $\mathbb{R}_+$ if $a \in (-\infty, 0] \cup [1, +\infty)$;
- Power $x^a$ is concave on $\mathbb{R}_+$ for $a \in [0, 1]$;
- Geometric mean $\prod_i x_i^{\frac{1}{n}}$ is concave on $\mathbb{R}_+^n$;
- Quadratic-over-linear $x^2/y$ on $\mathbb{R} \times \mathbb{R}_+$;
Exponential family
- Exponential $e^x$;
- Logarithm $\log(x)$ is concave on $\mathbb{R}_+$;
- Log-sum-exp $\log(\sum_i e^x_i)$;
Entropy family
- Negative entropy $x \log(x)$ on $\mathbb{R}_+$;
- Entropy $\sum_i -P_i \log P_i$;
- Relative entropy (Kullback–Leibler divergence) $\sum_i P_i \log(P_i/Q_i)$;
Norm family
- Any (scalar/vector/matrix/function/operator) norm $\| x \|$;
- Maximum (infinity norm) $\max_i |x_i|$;
Determinant $\det(X)$ is concave on $S_+^n$;
Convex conjugate of any function $f^\star$;

Common non-trivial log-concave and log-convex functions:

Probability family
- Log-concave density functions:
  - Gaussian PDFs: $\phi(K^{-\frac{1}{2}}(x-\mu))$;
  - Exponential PDFs: $\prod_i λ_i e^{-λ_i x}$;
  - Uniform PDFs over a convex set: $\frac{\mathbf{1}(C)}{m(C)}$;
  - Wishart distribution;
- CDFs of log-concave density functions are also log-concave;
- Moment generating functions $M(z)$ are log-convex, which means cumulant generating functions $\log M(z)$ are convex;
Determinant over trace $\det(X) / \text{tr}(X)$ is log-concave on $S_++^{n}$.
Integration family
- Gamma function $\Gamma(x)$ is log-convex on $x \ge 1$;
- Laplace transform $\mathcal{L}p$ of a nonnegative function is log-convex;

Common matrix monotone functions: (Real-valued functions over the symmetric matrix space.)

Linear function $\text{tr}(WX)$ is matrix nondecreasing if $W \succeq 0$, and is matrix increasing if $W \succ 0$.
Trace of inverse $\text{tr}(X^{−1})$ is matrix decreasing on the positive definite cone.
Determinant $\det(X)$ is matrix increasing on the positive definite cone, and matrix nondecreasing on the positive semi-definite cone.

Common matrix convex/concave functions:

$XX^\text{T}$, here $f: \mathbb{R}^{n \times m} \to \mathbb{R}^{n \times n}$.
Power $X^p$ is matrix convex if $p \in [−1,0] \cup [1,2]$; it is matrix concave if $p \in [0,1]$.

Propositions

Sets

Set relations: $C \subseteq \text{cl}(C) \subseteq \text{conv}(C) \subseteq \text{cone}(C)$

Operations that preserves set convexity: [@Bertsekas2003, 1.2.1]

Unary operations
- Closure and relative interior;
- Affine transformation: scaling, translation, projection, etc;
- Preimage of affine transformation;
- Projective transformation (linear-fractional): $f(x) = \frac{Ax+b}{\langle c,x \rangle + d}$ where $\langle c,x \rangle + d > 0$;
  - Perspective transformation: $P(z,t) = z/t$ where $t>0, z \in \mathbb{R}^n$;
Binary operations
- Minkowski sum;
- Cartesian product;
Arbitrary operations
- Intersection;

Properties of convex set: [@Bertsekas2003, 1.4.1]

Any line segment between two vectors in the closure of a (nonempty) convex set is also in the relative interior of the set except the ends.
(Nonempty) convex set has (nonempty) relative interior.

If a set is closed with nonempty interior and has a supporting hyperplane at every point in its boundary, it is convex.

The dual cone of a proper cone is also proper.

Generated cone is convex.

Recession cone theorem [@Bertsekas2003, 1.5.1]: For a (nonempty) closed convex set,

Its recession cone is closed;
A direction of recession of one vector is also a direction of recession of the set;
Its recession cone is nonzero iff it is unbounded;
Its recession cone is also its relative interior's recession cone;
For any collection of (nonempty) closed convex sets with nonempty intersection, recession cone and intersection are interchangeable.

Caratheodory's theorem [@Bertsekas2003, 1.3.1]: (minimal representation of generated cone and convex hull)

Every vector in the cone generated by a set can be represented as a positive combination of a linearly independent subset of the set;
Every vector in the convex hull of a set can be represented as a convex combination of an affinely independent subset of the set;

Projection theorem [@Bertsekas2003, 2.2.1]: For a (nonempty) closed convex set $C$,

The projection of any vector $x$ on $C$ exists and is unique, where the supporting hyperplane separates $C$ and $x$ and is orthogonal to the connecting vector to $x$;
Projection on the set $P_C$ is a continuous and non-expansive map (weak contractor);
Distance to $C$ is a convex function;

Supporting hyperplane theorem: Any non-interior point of a (nonempty) convex set has some hyperplane that contains the set in one of its closed half-spaces. [@Bertsekas2003, 2.4.1]

Separating hyperplane theorem: Any two disjoint (nonempty) convex sets can be separated by some hyperplane. [@Bertsekas2003, 2.4.2]

Strict separation theorem [@Bertsekas2003, 2.4.3]: Two disjoint (nonempty) convex sets $C_1, C_2$ can be strictly separated by some hyperplane if any of the following holds:

Their Minkowski difference $C_1 - C_2$ is closed;
One closed and one is compact;
Both are closed, and the intersection of their recession cones is just the intersection of their lineality spaces $R_{C_1} \cap R_{C_2} = L_{C_1} \cap L_{C_2}$;
One is closed, the other is polyhedral, and the intersection of their recession cones is a subset of the former's lineality space $R_{C_1} \cap R_{C_2} \subseteq L_{C_1}$;
Both are intersections of finitely many closed convex quadratic constraints: $\{x | \langle Q_i x + a_i, x \rangle \le b_j, i \in 1, \dots, r \}$;

Polyhedral convexity

Polar cone is closed and convex.

Polar cone of a set is also polar cone of the cone generated by the set: $C^∗ = \text{cone}(C)^∗$. [@Bertsekas2003, 3.1.1]

Polar cone theorem: Twice polar cone of a cone is the closure of its convex hull: $C^{ ** } = \text{cl}(\text{conv}(C))$. In particular, closed convex cones are self-dual to polarity: $C^{ ** } = C$.

Cone generated by a finite set is closed. (Nontrivial) The polyhedral cone and the cone generated by the same finite set of vectors are polar to each other: for a finite set $A$, $A^∗ = \text{cone}(A)^∗$, and (Farkas' Lemma) $A^{**} = \text{cone}(A)$. [@Bertsekas2003, 3.2.1]

Minkowski-Weyl Theorem: Any polyhedral cone can be represented as a finitely generated cone, and the converse is also true: $A^∗ = \text{cone}(B)$.

Minkowski-Weyl representation: Given finite sets $A, B$, a polyhedral set can be represented as the Minkowski sum of the convex hull of a finite set and a finitely generated cone; the converse is also true: $P = \text{conv}(A) + \text{cone}(B)$. [@Bertsekas2003, 3.2.2]

[@Bertsekas2003, 3.3.1] For a (nonempty) convex set, extreme points on a supporting hyperplane are also extreme points of the set. A closed convex set has at least one extreme point iff it does not have two opposing recession direction. For a bounded convex set, all extreme points form the minimal set that reproduce the set by convex hull: $C = \text{conv}(E)$, where $E$ is the set of extreme points of $C$.

For a polyhedral set with Minkowski-Weyl representation $P = \text{conv}(A) + \text{cone}(B)$, its extreme points are in $A$. [@Bertsekas2003, 3.3.2]

[@Bertsekas2003, 3.3.3]

Polyhedral proper separation theorem [@Bertsekas2003, 3.5.1]

Generalized inequalities

Properties of generalized inequalities:

Additive: $x \preceq_K y, u \preceq_K v \Rightarrow x+u \preceq_K y+v$;
Nonnegative scaling: $x \preceq_K y, a \ge 0 \Rightarrow ax \preceq_K ay$;
Transitive: $x \preceq_K y, y \preceq_K z \Rightarrow x \preceq_K z$;
Reflexive: $x \preceq_K x$;
Antisymmetric: $x \preceq_K y, y \preceq_K x \Rightarrow x = y$;
Limits: $x_i \preceq_K y_i, i \in N; \lim_i x_i = x, \lim_i y_i = y \Rightarrow x \preceq_K y$;

For a generalized inequality, minimum and maximum elements of a set are unique if they ever exist, while a set can have multiple minimal and maximal elements.

Non-decreasing in each argument is equivalent to K-nondecreasing where $K$ is the nonnegative cone.

First-order conditions for K-monotonicity: A differentiable function on a convex domain is K-nondecreasing iff its gradient is nonnegative in K-dual inequality: $\nabla f(x) \succeq_{K^\star} 0$; It is K-increasing if its gradient is positive in K-dual inequality: $\nabla f(x) \succ_{K^\star} 0$;

First-order conditions for K-convexity: A differentiable function on a convex domain is K-convex iff the function globally K-dominates its linear expansion: $f(y) \succeq_K f(x) + Df(x)(y−x)$, here $f: \mathbb{R}^n \to \mathbb{R}^m$ and $Df(x) \in \mathbb{R}^{m \times n}$. This proposition also holds when both sides are strict.

Composition theorem: A K-nondecreasing convex function of a K-convex function is convex. Note the monotonicity restriction is on the extended-value extension of the outer function.

Convex functions

A convex function can only be discontinuous on its relative boundary.

A function is affine iff it is simultaneously convex and concave.

Operations that preserves (vector-valued) function convexity (or concavity): [@Bertsekas2003, 1.2.4]

Unary
- Minimization over a (nonempty) convex set of a subset of variables: $f(x,y) \to g(x) = \inf_{y \in C} f(x,y)$;
- Perspective, $P: f(x) \to g(x,t) = t f(x/t), t>0$; slice, $g(x,1)=f(x)$;
Binary
- Cartesian product
- Affine transformation on the domain [inner composition]
- Nonnegative affine transformation [outer composition]
- Composition
  1. if the extended-value extension of the outer function is non-decreasing in each argument;
  2. or if the functions are twice differentiable over the containing (Euclidean) space and the outer function is non-decreasing in each argument.
  3. [The composition of an outer non-increasing convex function with an inner concave function is also convex, suitable for both previous criteria.]
Arbitrary
- Point-wise maximum and supremum

Characterization of differentiable convex functions:

First-order conditions: [@Bertsekas2003, 1.2.5]
- A differentiable real-valued function with a convex domain is (strictly) convex iff the function globally (strictly) dominates its linear expansion.
Second-order conditions: [@Bertsekas2003, 1.2.6]
- A twice continuously differentiable real-valued function with a convex domain is (strictly) convex if its second derivative everywhere on the domain is positive (semi)definite.
- If the function is convex and its domain has nonempty interior, then its second derivative everywhere in the interior of the domain is positive semidefinite.

Quasi-convex functions

Operations that preserve quasi-convexity:

Nonnegative weighted maximum;
Outer composition with a nondecreasing function;
Linear-fractional transformation on the domain;
Minimization over a (nonempty) convex set of a subset of variables;

A quasilinear function has convex level sets; the converse does not hold.

A quasi-convex function $f(x)$ can be represented by a family of convex functions $\phi_t(x)$, where the t-sublevel set of $f$ is the 0-sublevel set of $\phi_t$. Such representation is not unique.

Log-concave functions

Relations:

A function is log-convex iff its reciprocal is log-concave.
A function is log-concave/convex iff it is the exponential of a concave/convex function.
Log-convex functions are convex.
Nonnegative concave functions are log-concave.

Properties:

For a twice differentiable function with a convex domain:
- It is log-convex iff $f(x) \nabla^2 f(x) \succeq \nabla f(x) \nabla f(x)^{\text{T}}$;
- It is log-concave iff the above inequality changes direction;
Log-convexity and log-concavity are closed under multiplication and positive scaling.
Log-convexity is preserved under sums.
Log-concavity is preserved under integration over subspaces, if domain covers the full space.
- Log-concave probability densities have log-concave marginal densities.
- Log-concavity is closed under convolution.

Conjugation

Transformation properties of conjugation:

Affine transformation on the domain: $(f(Ax + b))^\star = f^\star(A^{−\text{T}} y) − \langle b, A^{−\text{T}} y \rangle$;
Sum of independent functions: $(f_1(u) + f_2(v))^\star = f_1^\star(w) + f_2^\star(z)$;

A function is closed if its epigraph is a closed set; or equivalently, if all its sub-level sets are closed. The conjugate function of any function is closed and convex. Almost every convex function can be expressed as the conjugate of some function.

An extended real-valued function is lower semi-continuous at a point if the function values in the point's neighborhood are either close to or greater than the function value at that point: $\liminf_{x \to x_0} f(x) \ge f(x_0)$. The conjugate of a function is always lower semi-continuous.

The convex closure of a function is the function whose epigraph is the closure of the convex hull of the original function. The conjugate of the conjugate (twice conjugate) of a function is its convex closure: $\text{epi}f^{\star\star} = \text{cl}(\text{conv}(\text{epi}f))$, note that $f^{\star\star} \le f$. Fenchel–Moreau theorem: If a function is convex and closed, its twice conjugate is itself: $f^{\star\star} = f$.

🏷 Category=Analysis