Convexity

Concepts

Sets

Distance of a point to a subset of a metric space is the smallest distance between the point and any point in the subset: $d_C(x) = \min_{y \in C} d(x, y)$. Projection of a vector on a closed subset of a normed vector space is the set of vectors in the subset closest to the vector: $P_C(x) = \arg\min_{y \in C} \| y - x \|$.

Nonnegative combination of vectors is a linear combination where the scalars are nonnegative: $\sum_{i=1}^k a^i v_i$ where $a^i \ge 0$. Convex combination of vectors is a normalized nonnegative combination: $\sum_{i=1}^k a^i v_i$ where $a^i \ge 0$ and $\sum_{i=1}^k a^i = 1$. Line segment $\overline{pq}$ between two vectors is the set of all their convex combinations: $\overline{pq} = \{\lambda p + (1-\lambda) q: \lambda \in [0,1]\}$. Convex subset in a vector space is a subset that includes the line segment between any two of its elements [@Bertsekas2003, 1.2.1]: $A = \cup_{p, q \in A} \overline{pq}$. Convexity is closely related to expectation of random vector: a random vector supported within a convex subset has its expectation within that subset. A subset is closed and convex if and only if it is the intersection of all closed half-spaces containing the subset. Extreme point of a convex subset is a vector not between any two other vectors on the subset. Extreme direction of a convex set is the direction of one of its extreme points.

Convex hull (凸包) $\text{conv}(C)$ of a subset in a vector space is the intersection of all convex sets containing a subset. Affine hull (仿射包) $\text{aff}(X)$ is the intersection of all affine subspaces containing a set. Affine dimension of a subset is the dimension of its affine hull. A set of vectors are affinely independent if when picking one as the origin, the rest are linearly independent. The number of affinely independent vectors in a set cannot be more than one plus the affine dimension of these vectors, if finite. Relative interior $\text{ri}(C)$ of a subset is the intersection of its interior and its affine hull.

Generalized inequality

A set is a cone, if it is scale invariant. A cone is pointed if it contains no line. A cone is solid if it has nonempty interior. A proper cone (真锥) is a pointed, closed solid, convex cone. Generated cone $\text{cone}(C)$ is the cone generated by a set is the nonnegative span of it. A cone is finitely generated, if it is generated by a finite set of vectors. A vector is a direction of recession of a convex set, if arbitrary shifts of the set along this vector are within the set. The recession cone of a set $R_C$ is the cone generated by its recession directions. The lineality space of a set is the set of directions of recession whose oppose are also direction of recession: $L_C = R_C \cap -R_C$.

Generalized inequality $\preceq_K$ determined by a proper cone $K$ in a Cartesian power of the real numbers is the partial ordering such that one vector is less than or equal to another if and only if their difference lies in the cone: $x \preceq_K y$, $y \succeq_K x$ iff $y − x \in K$; $x \prec_K y$, $y \succ_K x$ iff $y − x \in \text{int}(K)$. Vector inequality is a special case of generalized inequality where the associated cone is the nonnegative cone.

The minimum element (最小元) of a multidimensional set w.r.t. a generalized inequality $\preceq_K$ is an element no greater than any other element in the set in such partial ordering: $S \subseteq \{x\} + K$. Similarly, the maximum element of a multidimensional set w.r.t. a generalized inequality $\succeq_K$ is an element no less than any other element in the set in such partial ordering: $S \subseteq \{x\} - K$.

A minimal element (极小元) of a multidimensional set w.r.t. a generalized inequality $\preceq_K$ is an element no less than any element in the set in such partial ordering: $(\{x\} - K) \cap S = \emptyset$. Similarly, a maximal element of a multidimensional set w.r.t. a generalized inequality $\succeq_K$ is an element no greater than any element in the set in such partial ordering: $(\{x\} + K) \cap S = \emptyset$.

The polar cone of a set is its shade, which is a cone: $C^∗ = \{ x | \langle x, y \rangle \le 0, \forall y \in C \}$. Dual cone of a set is the negative polar cone: $\hat{C} = -C^∗$. A set is polyhedral (多面的) if it is the intersection of finitely many closed half-spaces: ${x | A x \le b}$. A cone is polyhedral if it is the polar cone of some finite set.

The dual generalized inequality associated with a proper cone is the generalized inequality associated with its dual cone. This concept can be used for a dual characterization of the minimum/minimal elements. The minimum element of a multidimensional set w.r.t. a generalized inequality $\preceq_K$ is an element such that for all elements $\lambda$ in the interior of the dual cone $K^∗$, it is the unique minimizer for $\langle \lambda, \cdot \rangle$ in the set. A minimal element of a multidimensional set w.r.t. a generalized inequality $\succeq_K$ is an element that minimizes $\langle \lambda, \cdot \rangle$ over the set for an element $\lambda$ in the interior of the dual cone.

Functions

An extended real-valued function is a function whose range is the extended real line. The extended-value extension $\tilde{f}: \mathbb{R}^n → \mathbb{R} ∪ \{+\infty\}$ of a convex function $f : C \to \mathbb{R}$ is the function defined in its containing (Euclidean) space that takes value $+\infty$ outside its original domain.

The graph of a function is the collection of point-value pairs: $\{ (x, f(x)) | x \in \text{dom}f \}$. The epigraph (supergraph) of a function is the collection of points on and above its graph: $\text{epi}f = \{ (x, y) | x \in \text{dom}f, y \ge f(x) \}$.

Consistent definitions of convex functions:

(Jensen's inequality) A real-valued function with a convex domain is convex, if every line segment of the function is dominated by its linear interpolation: $\forall x_1, x_2 \in \text{dom}f, \forall \lambda \in [0,1], f(\boldsymbol\lambda \cdot \mathbf{x}) \le \boldsymbol\lambda \cdot f(\mathbf{x})$, where $\boldsymbol\lambda = (\lambda, 1 - \lambda)$, $\mathbf{x} = (x_1, x_2)$ [@Bertsekas2003, 1.2.2].
(General Jensen's inequality) Given a convex set $C$, function $f: C \to \mathbb{R}$ is convex if for any probability distribution on the domain, function value at the expectation is dominated by the expected function value: $\forall X, \text{supp}(X) \subseteq C: f(\mathbf{E}X) \le \mathbf{E}f(X)$.
(Extension) A real-valued function is convex, if every line segment of its extended-value extension is dominated by its linear interpolation.
(Epigraph) An extended real-valued function with a convex domain is convex, if its epigraph is convex. [@Bertsekas2003, 1.2.4]

A function is concave if its negative is a convex function. While concave set is simply any non-convex set, concave functions exactly mirror convex functions.

Level curve is the feasible set of an equality constraint: $\{x \mid h(x) = 0\}$. Level set or sub-level set is the feasible set of an inequality constraint: $\{x \mid f(x) \le 0\}$. Quasi-convex function is one whose sub-level sets (and thus its domain) are all convex. Quasi-concave function is one whose negative is quasi-convex. Unimodal function is one which is quasi-convex or quasi-concave. Quasilinear function is one whis is both quasi-convex and quasi-concave.

Log-concave/log-convex function is a positive function whose logarithm is concave/convex.

Strongly convex function is a twice-differentiable real-valued function on a closed convex domain of a Euclidean space, whose Hessian is lower bounded by a positive scalar matrix: $\exists m > 0$, $\forall x \in \mathcal{F}$, $\nabla^2 f(x) \succeq m I$. Every m-strongly convex function dominates the quadratic approximations using the m-scalar matrix as the second differential: $\forall x, y \in \mathcal{F}$, $f(y) \ge f(x) + \nabla f(x)^T (y-x) + \frac{m}{2} |y - x|^2$. Every strongly convex function is convex. Gradient dominated function of degree $p$, $p \in [1, 2]$, is a differentiable function on a closed convex domain of a Euclidean space, such that it has a global minimum and its value relative to the minimum is dominated by the norm of its gradient to the power of $p$ [@Polyak1963; @Nesterov2005]: $\exists \tau > 0$, $\forall x \in \mathcal{F}$, $f(x) - f_0 \le \tau |\nabla f(x)|^p$, where $f_0 = \min_{x \in \mathcal{F}} f(x)$. Gradient dominated functions can be nonconvex. Every $λ$-strongly convex function is $\frac{1}{2λ}$-gradient dominated of degree 2.

Given a proper cone $K$ in its domain space, a real-valued function is K-nondecreasing if it is dominated by values in its K-cone: $\forall x, y, x \preceq_K y : f(x) \le f(y)$. The function is K-increasing if it is strictly dominated by values in its K-cone other than the vertex: $\forall x \ne y, x \preceq_K y : f(x) < f(y)$. K-nonincreasing and K-decreasing are similarly defined. As a special case, a real-valued function on the space of symmetric matrices is matrix increasing (decreasing) if it is increasing (decreasing) with respect to the positive semidefinite cone.

Given a proper cone $K$ in its range space, a vector-valued function is K-convex if every line segment of the function is K-dominated by its linear interpolation: $\mathbf{f}(\boldsymbol\lambda \cdot \mathbf{x}) \preceq_K \boldsymbol\lambda \cdot \mathbf{f}(\mathbf{x})$. The function is strictly K-convex if the inequality is strict except on the endpoints. As a special case, a symmetric matrix valued function is matrix convex if it is convex with respect to the positive semidefinite cone.

Given the dual pair of vector spaces $X, Y$ w.r.t a bilinear form $\langle \cdot, \cdot \rangle$, functions $f: X \to \mathbb{R}$ and $f^\star: Y \to \mathbb{R}$ are conjugate if their sum is supported from below by the bilinear form pointwise in the dual space: $f(x) + f^\star(y) \ge \langle y,x \rangle$ (Young's inequality), and $\forall y \in Y, \exists x \in X: f(x) + f^\star(y) = \langle y,x \rangle$. Equivalently, $f^\star(y) = \sup_{x \in X} \{ \langle y,x \rangle - f(x) \}$, also known as Fenchel transformation [@Fenchel1949]. If a function is smooth, strictly convex, and increases at infinity faster than a linear function, its conjugate is equivalent to its Legendre transformation [@Legendre1789]: $f^\star(y) = \langle x^\star, \nabla f(x^\star) \rangle − f(x^\star)$, where $y = \nabla f(x^\star)$.

Examples

Common convex sets:

Linear family
- Hyperplane: ${x | \langle a,x \rangle = b}$;
- Half-space: ${x | \langle a,x \rangle \le b}$;
- Polyhedral set ${x | A x \le b}$: polytope (多胞形), a bounded polyhedral set (a polyhedron is a three-dimensional polytope); simplex (单纯形), $\text{conv}(v_0, \dots, v_k), k \in \mathbb{N}$;
Norm family
- Ellipsoids: $x_c + A B_1(0)$, where $x_c$ is center of ellipsoid, $A$ is a linear transformation, $B_1(0)$ is the unit ball; an alternative definition is $\{x+x_c | \langle x, P^{−1} x\rangle \le 1\}$ where $P$ is a positive semidefinite matrix (generalized norm);
- Norm ball: $\{x | \| x-x_c\| ≤ r \}$, where $\| \cdot \|$ is any vector norm.
- Norm cone: $\{(x,t)~|~\| x \| ≤ t \}$, where $\| \cdot \|$ is any vector norm.
Positive semidefinite cone: $S_+^n = \{X \in S^n | X \succeq 0 \}$, where $S^n$ is the space of $n \times n$ symmetric matrices.

Common convex functions:

Power family
- Power $x^a$ on $\mathbb{R}_+$ if $a \in (-\infty, 0] \cup [1, +\infty)$;
- Power $x^a$ is concave on $\mathbb{R}_+$ for $a \in [0, 1]$;
- Geometric mean $\prod_i x_i^{\frac{1}{n}}$ is concave on $\mathbb{R}_+^n$;
- Quadratic-over-linear $x^2/y$ on $\mathbb{R} \times \mathbb{R}_+$;
Exponential family
- Exponential $e^x$;
- Logarithm $\log(x)$ is concave on $\mathbb{R}_+$;
- Log-sum-exp $\log(\sum_i e^x_i)$;
Entropy family
- Negative entropy $x \log(x)$ on $\mathbb{R}_+$;
- Entropy $\sum_i -P_i \log P_i$;
- Relative entropy (Kullback–Leibler divergence) $\sum_i P_i \log(P_i/Q_i)$;
Norm family
- Any (scalar/vector/matrix/function/operator) norm $\| x \|$;
- Maximum (infinity norm) $\max_i |x_i|$;
Determinant $\det(X)$ is concave on $S_+^n$;
Convex conjugate of any function $f^\star$;

Common non-trivial log-concave and log-convex functions:

Probability family
- Log-concave density functions:
  - Gaussian PDFs: $\phi(K^{-\frac{1}{2}}(x-\mu))$;
  - Exponential PDFs: $\prod_i λ_i e^{-λ_i x}$;
  - Uniform PDFs over a convex set: $\frac{1_C}{m(C)}$;
  - Wishart distribution;
- CDFs of log-concave density functions are also log-concave;
- Moment generating functions $M(z)$ are log-convex, which means cumulant generating functions $\log M(z)$ are convex;
Determinant over trace $\det(X) / \text{tr}(X)$ is log-concave on $S_++^{n}$.
Integration family
- Gamma function $\Gamma(x)$ is log-convex on $x \ge 1$;
- Laplace transform $\mathcal{L}p$ of a nonnegative function is log-convex;

Common matrix monotone functions: (Real-valued functions over the symmetric matrix space.)

Linear function $\text{tr}(WX)$ is matrix nondecreasing if $W \succeq 0$, and is matrix increasing if $W \succ 0$.
Trace of inverse $\text{tr}(X^{−1})$ is matrix decreasing on the positive definite cone.
Determinant $\det(X)$ is matrix increasing on the positive definite cone, and matrix nondecreasing on the positive semi-definite cone.

Common matrix convex/concave functions:

$XX^\text{T}$, here $f: \mathbb{R}^{n \times m} \to \mathbb{R}^{n \times n}$.
Power $X^p$ is matrix convex if $p \in [−1,0] \cup [1,2]$; it is matrix concave if $p \in [0,1]$.

Propositions

Sets

Set relations: $C \subseteq \text{cl}(C) \subseteq \text{conv}(C) \subseteq \text{cone}(C)$

Operations that preserves set convexity: [@Bertsekas2003, 1.2.1]

Unary operations
- Closure and relative interior;
- Affine transformation: scaling, translation, projection, etc;
- Preimage of affine transformation;
- Projective transformation (linear-fractional): $f(x) = \frac{Ax+b}{\langle c,x \rangle + d}$ where $\langle c,x \rangle + d > 0$;
  - Perspective transformation: $P(z,t) = z/t$ where $t>0, z \in \mathbb{R}^n$;
Binary operations
- Minkowski sum;
- Cartesian product;
Arbitrary operations
- Intersection;

Properties of convex set: [@Bertsekas2003, 1.4.1]

Any line segment between two vectors in the closure of a (nonempty) convex set is also in the relative interior of the set except the ends.
(Nonempty) convex set has (nonempty) relative interior.

If a set is closed with nonempty interior and has a supporting hyperplane at every point in its boundary, it is convex.

The dual cone of a proper cone is also proper.

Generated cone is convex.

Recession cone theorem [@Bertsekas2003, 1.5.1]: For a (nonempty) closed convex set,

Its recession cone is closed;
A direction of recession of one vector is also a direction of recession of the set;
Its recession cone is nonzero iff it is unbounded;
Its recession cone is also its relative interior's recession cone;
For any collection of (nonempty) closed convex sets with nonempty intersection, recession cone and intersection are interchangeable.

Caratheodory's theorem [@Bertsekas2003, 1.3.1]: (minimal representation of generated cone and convex hull)

Every vector in the cone generated by a set can be represented as a positive combination of a linearly independent subset of the set;
Every vector in the convex hull of a set can be represented as a convex combination of an affinely independent subset of the set;

Projection theorem [@Bertsekas2003, 2.2.1]: For a (nonempty) closed convex set $C$,

The projection of any vector $x$ on $C$ exists and is unique, where the supporting hyperplane separates $C$ and $x$ and is orthogonal to the connecting vector to $x$;
Projection on the set $P_C$ is a continuous and non-expansive map (weak contractor);
Distance to $C$ is a convex function;

Supporting hyperplane theorem: Any non-interior point of a (nonempty) convex set has some hyperplane that contains the set in one of its closed half-spaces. [@Bertsekas2003, 2.4.1]

Separating hyperplane theorem: Any two disjoint (nonempty) convex sets can be separated by some hyperplane. [@Bertsekas2003, 2.4.2]

Strict separation theorem [@Bertsekas2003, 2.4.3]: Two disjoint (nonempty) convex sets $C_1, C_2$ can be strictly separated by some hyperplane if any of the following holds:

Their Minkowski difference $C_1 - C_2$ is closed;
One closed and one is compact;
Both are closed, and the intersection of their recession cones is just the intersection of their lineality spaces $R_{C_1} \cap R_{C_2} = L_{C_1} \cap L_{C_2}$;
One is closed, the other is polyhedral, and the intersection of their recession cones is a subset of the former's lineality space $R_{C_1} \cap R_{C_2} \subseteq L_{C_1}$;
Both are intersections of finitely many closed convex quadratic constraints: $\{x | \langle Q_i x + a_i, x \rangle \le b_j, i \in 1, \dots, r \}$;

Polyhedral convexity

Polar cone is closed and convex.

Polar cone of a set is also polar cone of the cone generated by the set: $C^∗ = \text{cone}(C)^∗$. [@Bertsekas2003, 3.1.1]

Polar cone theorem: Twice polar cone of a cone is the closure of its convex hull: $C^{ ** } = \text{cl}(\text{conv}(C))$. In particular, closed convex cones are self-dual to polarity: $C^{ ** } = C$.

Cone generated by a finite set is closed. (Nontrivial) The polyhedral cone and the cone generated by the same finite set of vectors are polar to each other: for a finite set $A$, $A^∗ = \text{cone}(A)^∗$, and (Farkas' Lemma) $A^{**} = \text{cone}(A)$. [@Bertsekas2003, 3.2.1]

Minkowski-Weyl Theorem: Any polyhedral cone can be represented as a finitely generated cone, and the converse is also true: $A^∗ = \text{cone}(B)$.

Minkowski-Weyl representation: Given finite sets $A, B$, a polyhedral set can be represented as the Minkowski sum of the convex hull of a finite set and a finitely generated cone; the converse is also true: $P = \text{conv}(A) + \text{cone}(B)$. [@Bertsekas2003, 3.2.2]

[@Bertsekas2003, 3.3.1] For a (nonempty) convex set, extreme points on a supporting hyperplane are also extreme points of the set. A closed convex set has at least one extreme point iff it does not have two opposing recession direction. For a bounded convex set, all extreme points form the minimal set that reproduce the set by convex hull: $C = \text{conv}(E)$, where $E$ is the set of extreme points of $C$.

For a polyhedral set with Minkowski-Weyl representation $P = \text{conv}(A) + \text{cone}(B)$, its extreme points are in $A$. [@Bertsekas2003, 3.3.2]

[@Bertsekas2003, 3.3.3]

Polyhedral proper separation theorem [@Bertsekas2003, 3.5.1]

Generalized inequalities

Properties of generalized inequalities:

Additive: $x \preceq_K y, u \preceq_K v \Rightarrow x+u \preceq_K y+v$;
Nonnegative scaling: $x \preceq_K y, a \ge 0 \Rightarrow ax \preceq_K ay$;
Transitive: $x \preceq_K y, y \preceq_K z \Rightarrow x \preceq_K z$;
Reflexive: $x \preceq_K x$;
Antisymmetric: $x \preceq_K y, y \preceq_K x \Rightarrow x = y$;
Limits: $x_i \preceq_K y_i, i \in N; \lim_i x_i = x, \lim_i y_i = y \Rightarrow x \preceq_K y$;

For a generalized inequality, minimum and maximum elements of a set are unique if they ever exist, while a set can have multiple minimal and maximal elements.

Non-decreasing in each argument is equivalent to K-nondecreasing where $K$ is the nonnegative cone.

First-order conditions for K-monotonicity: A differentiable function on a convex domain is K-nondecreasing iff its gradient is nonnegative in K-dual inequality: $\nabla f(x) \succeq_{K^\star} 0$; It is K-increasing if its gradient is positive in K-dual inequality: $\nabla f(x) \succ_{K^\star} 0$;

First-order conditions for K-convexity: A differentiable function on a convex domain is K-convex iff the function globally K-dominates its linear expansion: $f(y) \succeq_K f(x) + Df(x)(y−x)$, here $f: \mathbb{R}^n \to \mathbb{R}^m$ and $Df(x) \in \mathbb{R}^{m \times n}$. This proposition also holds when both sides are strict.

Composition theorem: A K-nondecreasing convex function of a K-convex function is convex. Note the monotonicity restriction is on the extended-value extension of the outer function.

Convex functions

A convex function can only be discontinuous on its relative boundary.

A function is affine iff it is simultaneously convex and concave.

Operations that preserves (vector-valued) function convexity (or concavity): [@Bertsekas2003, 1.2.4]

Unary
- Minimization over a (nonempty) convex set of a subset of variables: $f(x,y) \to g(x) = \inf_{y \in C} f(x,y)$;
- Perspective, $P: f(x) \to g(x,t) = t f(x/t), t>0$; slice, $g(x,1)=f(x)$;
Binary
- Cartesian product
- Affine transformation on the domain [inner composition]
- Nonnegative affine transformation [outer composition]
- Composition
  1. if the extended-value extension of the outer function is non-decreasing in each argument;
  2. or if the functions are twice differentiable over the containing (Euclidean) space and the outer function is non-decreasing in each argument.
  3. [The composition of an outer non-increasing convex function with an inner concave function is also convex, suitable for both previous criteria.]
Arbitrary
- Point-wise maximum and supremum

Characterization of differentiable convex functions:

First-order conditions: [@Bertsekas2003, 1.2.5]
- A differentiable real-valued function with a convex domain is (strictly) convex iff the function globally (strictly) dominates its linear expansion.
Second-order conditions: [@Bertsekas2003, 1.2.6]
- A twice continuously differentiable real-valued function with a convex domain is (strictly) convex if its second derivative everywhere on the domain is positive (semi)definite.
- If the function is convex and its domain has nonempty interior, then its second derivative everywhere in the interior of the domain is positive semidefinite.

Quasi-convex functions

Operations that preserve quasi-convexity:

Nonnegative weighted maximum;
Outer composition with a nondecreasing function;
Linear-fractional transformation on the domain;
Minimization over a (nonempty) convex set of a subset of variables;

A quasilinear function has convex level sets; the converse does not hold.

A quasi-convex function $f(x)$ can be represented by a family of convex functions $\phi_t(x)$, where the t-sublevel set of $f$ is the 0-sublevel set of $\phi_t$. Such representation is not unique.

Log-concave functions

Relations:

A function is log-convex iff its reciprocal is log-concave.
A function is log-concave/convex iff it is the exponential of a concave/convex function.
Log-convex functions are convex.
Nonnegative concave functions are log-concave.

Properties:

For a twice differentiable function with a convex domain:
- It is log-convex iff $f(x) \nabla^2 f(x) \succeq \nabla f(x) \nabla f(x)^{\text{T}}$;
- It is log-concave iff the above inequality changes direction;
Log-convexity and log-concavity are closed under multiplication and positive scaling.
Log-convexity is preserved under sums.
Log-concavity is preserved under integration over subspaces, if domain covers the full space.
- Log-concave probability densities have log-concave marginal densities.
- Log-concavity is closed under convolution.

Conjugation

Transformation properties of conjugation:

Affine transformation on the domain: $(f(Ax + b))^\star = f^\star(A^{−\text{T}} y) − \langle b, A^{−\text{T}} y \rangle$;
Sum of independent functions: $(f_1(u) + f_2(v))^\star = f_1^\star(w) + f_2^\star(z)$;

A function is closed if its epigraph is a closed set; or equivalently, if all its sub-level sets are closed. The conjugate function of any function is closed and convex. Almost every convex function can be expressed as the conjugate of some function.

An extended real-valued function is lower semi-continuous at a point if the function values in the point's neighborhood are either close to or greater than the function value at that point: $\liminf_{x \to x_0} f(x) \ge f(x_0)$. The conjugate of a function is always lower semi-continuous.

The convex closure of a function is the function whose epigraph is the closure of the convex hull of the original function. The conjugate of the conjugate (twice conjugate) of a function is its convex closure: $\text{epi}f^{\star\star} = \text{cl}(\text{conv}(\text{epi}f))$, note that $f^{\star\star} \le f$. Fenchel–Moreau theorem: If a function is convex and closed, its twice conjugate is itself: $f^{\star\star} = f$.

🏷 Category=Analysis