Game Theory

Play by rules and beyond the rules.

Game theory is the study of mathematical models of conflict and cooperation among rational decision-makers. [@Myerson1991] It is an advanced form of decision theory: while decision theory formulate optimization problems as if only one agent makes the decision, game theory analyzes decentralized/independent decision makers with interacting interests, who only care about their own objectives. That's how game theory get its other names, such as strategic decision making, interactive decision theory, and the logical side of decision science.

In essence, an optimization problem (or decision making problem) presents a real-valued function and asks to find the optimum set in a typically compact domain; while a game is a vector-valued function $ G: \prod_{i=i}^n S_i \to \mathbb{R}^n $, with domain the Cartesian product of $n$ sets and range the $n$-dimensional Euclidean space, and the problem asks to find the equilibrium set in the domain, where an equilibrium by a certain definition has components optimizing their corresponding functions. Although multi-attribute decision making problems also take vector values, it imposes its own definitions of optimality. As the most commonly used equilibrium concept in game theory, pure strategy Nash equilibrium can be stated as: $ G_i( s^* ) = \max_{s_i} G_i(s_i, s_{-i}^* ), \forall i \in N $. The distinction in solution concepts also shows up in solution algorithms: gradient methods in optimization, and best-response dynamics in games (which sequentially move along player strategy sets).

The mathematical theory of games can be understood in two distinct ways: as rules and as models. As rules of artificial setups, games exactly specify the strategy sets and payoffs of each player, as is the case for chess and auction. As models of real-world cooperation and conflicts, games need meticulous care to truthfully represent the possible actions and retribution of participating parties, as exemplified in Tenancy Game.

Charles Waldegrave in 1713 provided a minimax mixed strategy solution to a two-person version of the card game le Her. Antoine Augustin Cournot in 1838 considered a duopoly competition problem and provided a stable equilibrium solution (sink of a discrete dynamical system). John von Neumann in 1928 proved the existence of mixed strategy equilibria (mutually consistent solutions) in two-person zero-sum games with perfect information, known as minimax theorem, using Brouwer fixed-point theorem. von Neumann and Morgenstern in 1944 published a collection of work establishing the field of game theory, providing methods for finding equilibria, extending minimax theorem for games with imperfect information and cooperative games of more than two players; its second edition in 1947 appended an axiomatic theory of utility, as an independent discipline. {von-Neumann1947} declared that economic theory needed to use functional analytic methods, especially convex sets and topological fixed-point theorem, rather than the traditional differential calculus, because the maximum-operator did not preserve differentiable functions. Merrill Flood and Melvin Dresher in 1950 at RAND Corporation developed a game theoretical model of cooperation and conflict known as the Prisoner's Dilemma, for possible applications to global nuclear strategy. Also in 1950, John Nash developed a criterion for mutual consistency of players' strategies known as Nash equilibrium, and proved its existence in n-player non-cooperative games using Kakutani fixed-point theorem.

Game Models

Game $G = \{N, S, \succeq\} $ is a collection of players, strategy space, and preferences:

Player $i \in N = \{1, \dots, n\}$: a participant in a game who has a nontrivial action space and choose strategies based on payoffs;
Strategy $s_i \in S_i$: a complete set of actions a player takes at all possible positions of a game;
- Action (or allocation, move) $a_i \in A_i$: one of the possible alternatives a player can choose;
- Strategy profile $s \in S = \prod_i S_i$ is a collection of all players' strategies;
Preference $\succeq_i$ or payoff $u_i$: the reward for a player given an outcome at the end of a game;

In static games of complete information, strategy and action are equivalent. A strategy is pure if only one action is chosen at each position of the game. A strategy is mixed if the player chooses probabilistically from the action space of at least one position. When a player's strategy space $ S_i $ is finite, the mixed strategy space of the player is a simplex, which is compact and convex, with vertices being all the pure strategies: $ \Delta_i = \Delta^{|S_i| - 1} $. The notation for mixed strategy space $ \Delta = \prod_i \Delta_i $ of a game is different from the pure strategy space $S$.

As vector-valued functions, a no-conflict game is $ G_i(s) = P_i(s_i) $, which is equivalent to independent optimization; a dummy game is $ G_i(s) = Q_i(s_{-i}) $; a strategically separable game is $ G_i(s) = P_i(s_i) + Q_i(s_{-i}) $, the linear combination of a no-conflict and a dummy game; an identical interest game (or perfect coordination) is $ G_i(s) = P(s) $; a coordination-dummy separable game is $ G_i(s) = P(s) + Q_i(s_{-i}) $ [@Slade1994]. Condition $ \forall i \in N $ is implicit in all previous propositions.

Number of players

The simplest games are defined as two-player games. An n-player game is a game consistently defined for two or more players. A game is atomic if it has a finite number of players; it is non-atomic (large, oceanic) if it has a continuum of players.

Two-player games are mathematically simpler, and have been extensively studied. Many results for two-player games are readily extensible to n-player situations, just like extending pairs of random variables to random vectors.

Many intuitive ideas can be examined in two-player case and then generalized, like fixed-point theorem and its generalization under super-modularity. But a few important results only hold for games with at least three players, just like chaos in 3D dynamic systems, although harder to find.

Preference and utility

Every player in a game has a preference ordering over the strategy space, which is modeled mathematically as a totally ordered set or a real-valued function. In practice, two types of utility functions are used: ordinal utility and cardinal utility. Utility is a mapping of preferences to real numbers. Comparing probability space $ (U,\Sigma,P) $ and utility $ (N,S,u) $, both are extrinsically assigned rather than inherent attributes.

An ordinal utility function represents a preference ordering over the pure strategy space, without imposing a metric: $ \forall a, b \in S, a \preceq b \iff u(a) \leq u(b) $. Ordinal utility suffice for computing pure strategy Nash equilibrium. Debreu theorems claim the existence of continuous ordinal utility function for any preference relation that is complete, transitive and continuous [@Debreu1954]; and the existence of additive ordinal utility function for any preference relation with preferentially independent subsets [@Debreu1960]. For a given preference ordering, the ordinal utility function is unique up to strictly increasing transformation.

A cardinal utility function represents a preference ordering over the generated cone of the strategy space, associating a metric: $$\forall a, b \in \mathbb{R}_{\ge 0}^{|S|}, a \cdot S \preceq b \cdot S \iff u(a) \leq u(b)$$. Cardinal utility must be employed for mixed strategy calculations. The von Neumann–Morgenstern (VNM) utility theorem [@Neumann1947] constructively claims the existence of cardinal utility function for a player satisfying the four axioms of VNM-rationality: completeness, transitivity, continuity, and independence. For a given preference ordering, the cardinal utility function is unique up to a strictly increasing linear transformation. In other words, the set of cardinal utility functions representing a decision maker's preference form an open half-plane, with positive scale and arbitrary shift.

A set of games with the same strategy space are fully equivalent in mixed strategy (strategically equivalent?) if their (cardinal) payoff functions for each player represent the same preference. [@Myerson1991] In other words, given a strategy space $ S $ and player preferences $ \succeq $, the set of fully equivalent games (in mixed strategy) $ \Gamma_{S, \succeq} $ form a product space of $n$ open half-planes (or the product space of a positive cone and an Euclidean space, both $n$-dimensional): $ \forall G \in \Gamma_{S, \succeq} \subset \{ G: S \to \mathbb{R}^n \} $, $ \forall a \in \mathbb{R}^n_+, b \in \mathbb{R}^n, \text{diag}(a)~G + b \in \Gamma $.

While homogeneous preferences are naturally comparable, heterogeneous (multi-dimensional) preferences may only be partially ordered, just like positive semi-definite matrices. And sometimes we may aggregate heterogeneous axes into one axis and hence completely ordering the strategy space. But any such aggregate function over heterogeneous measures is subjective/arbitrary and shall not expect universal acceptance. In economics, the convention has been using monetary value as the universal measure of use values across goods and individuals.

Perturbation by small numbers don't change real-world phenomena significantly; it only does in the realm of theoretical models. Real data have noises and human mind do round-ups; sharp categorical transitions are caused by model simplification. Moreover, some solution concepts such as mixed strategy Nash equilibrium are susceptible to extreme values of utility. In such cases, the choice of solution concepts should be reconsidered.

As a side note, the meanings of "ordinal" and "cardinal" in utility theory are very different from those in set theory. An ordinal is a set strictly well-ordered with respect to set membership $ \in $, and every element of it is also a subset of it. {von-Neumann1923} Finite ordinals can be denoted by a natural number; the smallest infinite ordinal $ \omega $ is the order type of natural numbers. The cardinal number of a well-ordered set is the smallest ordinal number equinumerous (等势) to the set. Finite cardinals can be denoted by a natural number; the smallest infinite cardinal $ \aleph_0 $ is the cardinality of natural numbers.

Information set

(First order) mutual knowledge among a set of players is a piece of information that all players know. A piece of information is $m$-th order mutual knowledge among a set of players if the $(m-1)$-th order mutual knowledge about this information is also mutual knowledge. Common knowledge (共有知识) among a set of players is a mutual knowledge of infinite order of recursion. {Binmore and Brandenburger, 1988}

Table: Information space of two players with common knowledge ($I$ denotes the common knowledge; Each lowercase letter represents the player of the acquired information; each sequence represents a piece of compound information.)

Order of recursion	0	1	2	...
Player A	aI	abI	abaI	...
Player B	bI	baI	babI	...

That all players are rational is typically assumed to be a common knowledge, known as the common knowledge of rationality.

A game is of complete information if its definition is common knowledge to all players. Otherwise, the game is of incomplete information, which is first studied in [@Harsanyi1967] where the game's payoffs depend upon states not all players know.

Cooperative vs. Non-cooperative

A game is cooperative if players may negotiation contracts enforceable through an outside party before the game. The default model of a game is non-cooperative, where entities not explicitly included cannot enforce contracts for the players. The distinction between cooperative and non-cooperative games is unrelated to the mathematical description of a game, but rather the possibility of pre-play communication, coalition, and side-payment (post-game exchange). [@Nash1951]

A cooperative game is given by stating a value for every subset of players, called coalition: $v: 2^N \to \mathbb{R} $. The value of a coalition is then shared among its members. A payoff vector is efficient if it exactly splits the total value $v(N)$. A payoff vector is individually rational if no player receives less than what he could get on his own, i.e. $v(\{i\})$. An efficient and individually rational payoff vector is called an imputation.

The main assumption in cooperative game theory is that the grand coalition $N$ will form. If players split off and form smaller coalitions, we can apply solution concepts to the subgames defined by whatever coalitions actually form. The challenge of solving cooperative games is then to find stable imputations.

A coalition is α-effective for an outcome if its members have strategies to force that outcome of the game, regardless of its complement's strategy. A coalition is β-effective for an outcome if for any strategies of its complement, its members can respond with strategies that ensure that outcome.

Solution Concepts

In game theory, a solution of a game is a strategy profile that may be adopted by players. A solution concept specifies for any game a subset of its strategy space that are considered solutions: $ \forall G \in \Gamma, F(G) \subseteq S_G $.

The value of a game is the payoff vector given some solution concept, if determinable.

Minimax

Brouwer fixed-point theorem [@Brouwer1911]: Any continuous function from a compact convex set into itself has a fixed point.

Minimax solution of two-player zero-sum games (backward induction) {von-Neumann1928}.

A two-person zero-sum game can be considered solved on several levels: [@Allis1994]

Ultra-weak: for the initial position, the game-theoretic value (win/lose, draw) has been determined;
Weak: for the initial position, a strategy has been determined to obtain at least the game-theoretic value for both players, under reasonable resources;
Strong: for all legal positions, such a strategy has been determined;

Nash equilibrium

A Nash equilibrium (NE) is a strategy profile that no player has incentive to deviate unilaterally [@Nash1950]. A pure strategy Nash equilibrium (PSNE) is a Nash equilibrium in the pure strategy space, while a mixed strategy Nash equilibrium (MSNE) is a Nash equilibrium in the mixed strategy simplex that is not a vertex. Symbolically, a pure strategy Nash equilibrium of a game is a strategy profile $ s^* \in S $ such that $ u_i( s_i^* , s_{-i}^* ) \ge u_i( s_i, s_{-i}^* ), \forall i \in N, s_i \in S_i $; Mixed strategy Nash equilibrium can be similarly formalized, replacing $ S $ with $ \Delta $. A (pure strategy) Nash equilibrium is strict if the inequalities are strict.

Nash equilibrium has nothing to do with saddle points: saddle point is a local property, but Nash equilibrium is a component-wise argument-wise global property.

Kakutani fixed-point theorem [@Kakutani1941]: Any correspondence on a compact convex set with convex images and a closed graph has a fixed point.

A correspondence between two sets is any subset of their Cartesian product: $ R \subseteq X \times Y $. The image of an element of the former set with respect to a correspondence is the subset of the latter set whose elements are paired with the element in the correspondent: $ \text{Im}_R(x) = R \cap (\{x\} \times Y) = \{ y \in Y | (x, y) \in R \} $ The best-response correspondence of a player is a correspondence from the opponents' strategy space into the player's strategy space, such that the image of any opponent strategy with respect to the correspondence is the set of the player's optimal strategies in response: $B_i \subseteq S_{-i} \times S_i = S $, $ \text{Im}_{B_i} (s_{-i}) = \arg \max_{s_i} u_i (s_i, s_{-i}) $. The best-response correspondence of a game is a correspondence from its strategy space into itself, such that the image of any strategy profile with respect to the correspondence is the Cartesian product of each player's best-response set given the opponents' strategy: $B \subseteq S \times S $, $ \text{Im}_B (s) = \prod_i \text{Im}_{B_i}(s_{-i})$.

Games with the same strategy space are best-response equivalent if they have the same best-response correspondence.

Finite games have mixed strategy Nash equilibrium [@Nash1950]: the mixed strategy space of a finite game is the Cartesian product of simplices (probability distribution over finite set), thus compact and convex; mixed strategy payoffs are polylinear forms, linear in each argument, so every mixed strategy profile has convex best-responses and thus convex images with respect to best-response correspondence; since mixed strategy payoffs are polylinear thus continuous, best-response correspondence has a closed graph; by Kakutani fixed-point theorem, the previous conditions implies existence of fixed point, thus mixed strategy Nash equilibrium.

A continuous game is a game with compact player strategy spaces and continuous payoff functions. Continuous games have mixed strategy Nash equilibrium, by Kakutani-Glicksberg-Fan theorem {Glicksberg1952, Fan1952}. A convex game is a game where each player has a convex strategy space and a concave payoff function $ u_i(s_i; s_{-i}) $ for all opponent strategies. A convex game has pure strategy Nash equilibrium, if it has a compact strategy space and continuous payoff functions [@Nikaido1955]; the reasoning directly follows that of [@Nash1950]. Strictly convex/concave games are convex games with diagonally strictly concave payoffs: $\sum_i (s'_i - s_i) \cdot (\nabla_i u_i(\mathbf{s}) - \nabla_i u_i(\mathbf{s'})) > 0$. Strictly convex/concave games have unique pure strategy Nash equilibrium, which is also globally asymptotically stable in a system of differential equations [@Rosen1965].

Refinements

Equilibrium refinement (narrowing) shrinks the space of equilibria as much as possible, ideally to a unique equilibrium.

A focal point is a solution with features prominent to all players, which all players also believe to be a 3rd order mutual knowledge [@Schelling1960]. Thus focal points are most likely to be chosen by the players in the absence of communication.

subgame perfect equilibrium [@Selten1965] trembling hand perfect equilibrium [@Selten1975] proper equilibrium [@Myerson1978]

Bayesian equilibrium [@Harsanyi1967] perfect Bayesian equilibrium

A Nash equilibrium is strong if no coalition can cooperatively deviate to benefit all its members, given opponents' strategies [@Aumann1959]. Strong Nash equilibrium are Pareto-efficient. A Nash equilibrium is coalition-proof if no backward coalition can cooperatively deviate to benefit all its members, fixing opponent strategies one by one [@Bernheim1987].

Stable equilibrium is a refinement of Nash equilibrium so that every game has a unique set solution that is connected, not dominated, containing backwards induction equilibrium, invariant in strategically equivalent games, and containing a common solution of games eliminating any dominated strategy [@Mertens1986].

Generalizations

A correlated equilibrium is a joint probability distribution over the strategy space where no player has incentive to deviate unilaterally [@Aumann1974]. Correlated equilibrium is more general than mixed strategy Nash equilibrium, because the latter can only be independent distributions over each player's strategy space.

For finite games, a regular quantal response function is a continuously differentiable function that normalizes pure strategy payoffs into completely mixed strategies, so that strategies with higher payoffs have higher probability, and payoff increase in each individual strategy increases the corresponding probability. A regular quantal response equilibrium (R-QRE) of a normal-form game is a fixed point of the regular quantal response function. quantal response equilibrium [@McKelvey1995] boundedly-rational noisy decisions the discrete choice process where it is common knowledge that players maximize estimated expected payoffs. QRE selects a unique Nash equilibrium as logit error diminishes. For any regular quantal response function there is a regular QRE.

The rationalizable set of actions in a normal-form game is the set of actions remained after progressive elimination of actions which are never a best-response to the opponents' any rational action {Bernheim1984, Pearce1984}. An outcome is Pareto efficient (or Pareto optimal) if it is not dominated under any player's preference. In comparison, an outcome is Hicks optimal if it has the largest sum of payoffs. For one player, a strategy is strictly dominated by another if the latter is strictly better than the former under any strategy of the opponents: $ u_i(s_i, s_{-i}) > u_i(s_i', s_{-i}), \forall s_{-i} \in S_{-i} $. If the inequality is only strict for some strategy of the opponents, the strategy is called weakly dominated by the other. For two-player games, the set of all rationalizable strategies can be found by progressive/iterated elimination of strictly dominated strategies (ISEDS).

Game Dynamics: Evolution and Learning in Games

The interpretation of game theoretic solution concepts without game dynamics is tricky. Depending on the actual dynamics, real-world decisions may never converge to any of these equilibria.

Despite various solution concepts, the essential question for the conflict and cooperation for self-interests is what kind of knowledge you would like to know about it. The positive view asks mass action: What would be the average behavior if players learn the structure of a game while playing it repeatedly? The normative view asks rational prediction: What should we expect to observe in a game played by rational players who are fully knowledgeable about the structure of the game and each others' rationality? The positive view suggests massive simulation of a game and see the successful strategies selected by the rules. The normative view have made game theory dauntingly complicated, while still struggles to predict real-world problems. It is totally acceptable for distributed decisions in real life to lack a unique strategy profile with special properties of any sense. In such cases, players can only resort to negotiation/bargaining and power play.

Without stability, game-theoretic equilibrium is meaningless. Equilibrium stability is defined by a dynamics on the strategy space.

Dynamic models roughly fall into three classes: evolutionary dynamics, heuristic dynamics, and Bayesian learning dynamics; each more demanding on rationality [@Hart2005].

Population game and evolutionary dynamics

A population game is a symmetric game with finite player strategies and a large number of players. Since player payoff in symmetric games only depends on self strategy and the set of opponent strategies, which is equivalent to the distribution of strategies give a finite player strategy set, a population game can be fully specified by the payoff to each strategy given the strategy distribution. A strategy distribution is monomorphic if it is pure; otherwise it is polymorphic. The strategy distribution space approximates the simplex over the player strategy space as the number of players increases, so a population game can be formally expressed as a continuous vector-valued function on a simplex (of the lower dimension): $ F: \Delta^{m-1} \to \mathbb{R}^m $, where $m = |S|$ is the number of player strategies.

Evolutionary game theory was originated by {Maynard-Smith1973, 1982}, who analyzed the evolution of population strategy in a repeated matching symmetric matrix game, where every player is hard-wired with a pure strategy and reproduces proportionally to its payoff. These games are linear population games: $ F(x) = A x $, where $A$ is the payoff matrix. Congestion games is another example of population games.

Since each player has a negligible influence on the strategy distribution, a Nash equilibrium of population games is a strategy distribution under which all chosen strategies have the same payoff no worse than the other strategies: $ F_i( x^* ) \ge F_j( x^* ) $, $ \forall i,j \in S, x_i^* > 0 $; a strict Nash equilibrium is a pure state that has a better payoff than the other strategies: $ F_i( e_i ) \ge F_j( e_i ) $, $ \forall j \in S, j \ne i $. A restricted Nash equilibrium is a Nash equilibrium of the game where only strategies in the support of the point can be played. A strategy distribution is a Nash equilibrium if and only if the payoff vector at that point is in the polar cone at that point: $ x \in \text{NE}(F) \Leftrightarrow F(x) \in \text{cone}_x (X)^* $, where $ X = \Delta^{m-1} $ and $ \text{cone}_x (X)^* = \{ z \mid (y - x) \cdot z \le 0, \forall y \in X \} $. Population games have Nash equilibrium.

A strategy distribution of a population game is an evolutionarily stable state (ESS) if the following three equivalent conditions hold:

It is a Nash equilibrium that is a better-response to all nearby best-responses {Maynard-Smith1973}: $ \forall y \in X, x \cdot F(x) \ge y \cdot F(x) $, and $ \exists \varepsilon > 0: \forall y \in \check{U}_\varepsilon (x) $, $ y \cdot F(x) = x \cdot F(x) \Rightarrow y \cdot F(y) < x \cdot F(y) $;
It has a uniform invasion barrier [@Taylor1978]: $ \exists c > 0, \forall \varepsilon \in (0,c), \forall y \ne x $, $ x' = (x + \varepsilon y) / (1 + \varepsilon): y \cdot F(x') < x \cdot F(x') $;
It is an (infinitesimal) local invader [@Hofbauer1979]: $ \exists \varepsilon > 0: \forall y \in \check{U}_\varepsilon (x), y \cdot F(y) < x \cdot F(y) $;

The equivalence of these three definitions means that for population games: An ESS is a Nash equilibrium; A strict Nash equilibrium is an ESS. ESS may not exist for some population games.

A state is a regular ESS [@Taylor1978] if it is a quasi-strict Nash equilibrium: $ F_i(x) > F_j(x), \forall i,j \in S, x_i > 0, x_j = 0 $; additionally, $ z^\text{T} \nabla F(x) z < 0, \forall z \ne 0, \sum_{x_i > 0} z_i = 0, \sum_{x_j = 0} z_j^2 = 0 $.

state stochastically stable in the small noise limit [@Foster1990] stochastically stable in the large population limit [@Binmore1995]

The projection dynamics [@Nagurney1997]: $ \dot{x} = M_1 F(x) $, where $ M_1 = I - \mathbf{1} \mathbf{1}^\text{T} / n $ is the projection onto the tangent space of the simplex $X$.

A population game is weakly contractive if the distances between states are non-expansive under the projection dynamics: $ \frac{\text{d}}{\text{d} t} \Vert y - x \Vert^2 = 2(y-x) \cdot (F(y) - F(x)) \le 0 $. As special cases, a population game is strictly contractive if the the inequality is strict for all $ x \ne y $. A population game is conservative if the equality always holds.

A strategy distribution of a population game is a globally neutrally stable state (GNSS) if all trajectories under projection dynamics do not move away from it: $ (x - y) \cdot F(y) \ge 0, \forall y \in X $; it is a globally evolutionarily stable state (GESS) if all trajectories under projection dynamics move closer it: $ (x - y) \cdot F(y) < 0, \forall y \in X, y \ne x $. GNSS is a convex set, because it is the intersection of half spaces. GNSS is a subset of Nash equilibrium. For contractive games, Nash equilibrium is the same as GNSS, thus convex.

Revision protocol and evolutionary dynamics

A revision protocol assigns each population game a function from population states to a conditional switch probability matrix: $ \rho: \mathcal{F} \times X \to P_m $, where $ P_m $ is the set of m-dimensional (right) stochastic matrices. A revision protocol is reactive if it depends on the game only through the current payoff: $ \rho(F, x) = \rho(F(x), x), \forall x \in X, F \in \mathcal{F} $. Reactive revision protocols can be simplified as: $ \rho: \mathbb{R}^m \times X \to P_m $. A revision protocol is prospective if it is not reactive.

A deterministic evolutionary dynamics assigns each population game a Lipschitz continuous function: $ V: \mathcal{F} \to \mathcal{D} $. System states thus evolves according to $ \dot{x} = V^F(x) $, with $ V^F(x) \in TX(x), \forall x \in X $. A deterministic evolutionary dynamics on a population game can be seen as the mean dynamics of the stochastic evolutionary process induced by a revision protocol: $ V^F(x) = \rho(F, x)^\text{T} x - x \circ \rho(F, x) \mathbf{1} $ In this sense, revision protocols serve as the microfoundation of deterministic evolutionary dynamics.

Two incentive restrictions may be imposed on evolutionary dynamics: positive correlation (PC) requires that growth rates are in acute angle with payoffs, unless at critical points (aka rest points) of the dynamics: $ V^F(x) \cdot F(x) > 0, \forall x \in X, V^F(x) \ne 0 $; strong positive correlation in a subset of the strategy distribution space requires that the angle between growth rates and payoffs are bounded above by an acute one: $ \exists c > 0: V^F(x) \cdot F(x) \ge c \Vert V^F(x) \Vert \Vert F(x) \Vert, \forall x \in Y \subseteq X, V^F(x) \ne 0 $; Nash stationarity (NS) requires the Nash equilibrium of the game $F$ and the critical points of the dynamics $V^F$ coincide: $ \text{NE}(F) = \{ x \mid V^F(x) = 0 \} $.

Table: Families of deterministic evolutionary dynamics

Family	Revision Protocol $\rho(\pi, x)$	Mean Dynamics $V^F(x)$
Imitative	$ \left(x_j r_{ij}(\pi, x)\right)_{ij}$	$ x \circ (r^\text{T} - r) x $
Excess payoff	$ \mathbf{1} \tau(\hat{\pi})^\text{T} $	$ \tau(\hat{F}(x)) - \mathbf{1} \cdot \tau(\hat{F}(x)) x $
(Perturbed) Best-response	$ \mathbf{1} \tilde{M}(\pi)^\text{T} $	$ \tilde{M}(F(x)) - x $

An imitative protocol is a revision protocol that is proportionate to the adoption and payoff of the opponent strategies, such that conditional imitation rates $r_{ij}(\pi, x)$ is net monotone: $ \pi_j \ge \pi_i \Leftrightarrow \Delta r_{kj} \ge \Delta r_{ki} $, where $ \Delta r_{ki} = r_{ki} - r_{ik} $ is the net rate of imitation. Some special forms of conditional imitation rates $ r_{ij} $ include: imitation via pairwise comparisons, $ \phi(\pi_j - \pi_i) $ with $ \text{sgn}(\phi(x)) = \text{sgn}(_+) $, such as pairwise proportional imitation [@Helbing1992] $ \phi(x) = _+ $ in replicator dynamics [@Taylor1978]; imitation driven by dissatisfaction, $ a(\pi_i) $ with $a$ decreasing; imitation of success, $ c(\pi_j) $ with $c$ increasing; imitation of success with repeated sampling, $ c(\pi_j) / \bar{c} $ where $ \bar{c} = x \cdot c(\pi) $, such as $ c(\pi_j) = \pi_j $ in Maynard Smith replicator dynamics {Maynard-Smith1982} and $ c(\pi_j) = \exp(\eta^{-1} pi_j) $ in imitative logit dynamics [@Weibull1995].

Comparative protocols only depend on comparative payoffs of opponent strategies. Excess payoff is the difference between strategy payoff and population average payoff: $ \hat{\pi} = \pi - x \cdot \pi $. An excess payoff protocol is a Lipschitz continuous function of and in acute angle with the excess payoff: $ \tau(\hat{\pi}) \cdot \hat{\pi} > 0 $, such as $ \tau(\hat{\pi}) = [\hat{\pi}]_+ $ in the Brown-von Neumann-Nash dynamics (BNN) [@Brown1950]. A pairwise comparison protocol is a Lipschitz continuous protocol with sign preservation: $ \text{sgn}(\rho_{ij}) = \text{sgn}([\pi_j - \pi_i]_+) $, such as $ \rho_{ij} = [\pi_j - \pi_i]_+ $ in the Smith dynamics {Maynard-Smith1982}. While excess payoff dynamics require population average payoff to be mutual knowledge, pairwise comparison dynamics need much less information.

The best-response protocol is a target protocol where all strategies switch consistently with the maximizer correspondence, which is the set of optimal mixed strategies given a payoff vector: $ M(\pi) = \arg\max_{x \in X} x \cdot \pi $. The best-response dynamics can be seen as continuous-time fictitious play (CTFP). A perturbed best response protocol [@Fudenberg1998] is a target protocol with a perturbed best response function that smoothly approximates the maximizer correspondence: $ \tilde{M}^\nu (\pi) = \arg\max_{x \in X} x \cdot \pi - \nu(x) $, where $ \nu(x) $ is a smooth deterministic perturbation on payoff that is positive definite at interior points and steep near the boundary: $ z^\text{T} \nabla^2\nu(x) z > 0, \forall x \in \text{ri}(X), \forall z \in TX $, and $ \lim_{\varepsilon \to 0+} \inf_{U_\varepsilon(\text{rb}(X))} \Vert \nabla \nu(x) \Vert = +\infty $; such as the negated entropy function $ \nu(x) = \eta x \cdot \log x $ in the logit choice function $ \tilde{M}^\nu (\pi) = \exp(\eta^{-1} \pi) / Z(\pi) $ with noise level $ \eta > 0 $, where the normalizing constant $ Z(\pi) = \exp(\eta^{-1} \pi) \cdot \mathbf{1} $.

Table: Basic deterministic evolutionary dynamics

Name	Family	Revision Protocol $\rho(\pi, x)$	Mean Dynamics $V^F(x)$
CTFP	best-response	$ \mathbf{1} M(\pi)^\text{T} $	$ M(F(x)) - x $
logit	perturbed BR	$\mathbf{1} \exp(\eta^{-1} \pi)^\text{T} / Z(\pi)$	$ \exp(\eta^{-1}F(x)) / Z(F(x)) - x $
BNN	excess payoff	$\mathbf{1} [\hat{\pi}]_+^\text{T}$	$[\hat{F}(x)]_+ - \mathbf{1} \cdot [\hat{F}(x)]_+ x$
Smith	pairwise comparison	$ \left( [ \pi_j - \pi_i ]_+ \right)_{ij} $	$[F(x) \mathbf{1}^\text{T} - \mathbf{1} F(x)^\text{T}]_+ x + \\ [F(x) \mathbf{1}^\text{T} - \mathbf{1} F(x)^\text{T}]_- \mathbf{1} \circ x$
replicator	imitative	$ \left(x_j [ \pi_j - \pi_i ]_+ \right)_{ij} $	$ \hat{F}(x) \circ x$

† Notation $ _+ = \max\{x, 0\} $.

Continuity: only the best-response dynamics is not continuous.

Incentive properties: All have positive correlation and Nash stationarity, except that the replicator dynamics (and imitative dynamics in general) does not have NS, and perturbed best-response dynamics have PC and NS for virtual payoffs.

Adjustment process and heuristic learning

Criteria of learning rules: simplicity, performance, robustness.

Performance of learning rules:

Universally consistent (Hannan consistent; the worst-case analysis): the time average of utility generated by the learning rules asymptotically attains as much utility as if the frequencies are known in advance, uniformly over all possible probability laws over observations.
- Universally consistent learning rules exist [@Hannan1957].
- FP is not universally consistent [@Fudenberg1993], but only fails when the opponent plays so as to keep the player near indifferent [@Fudenberg1995].
- No deterministic learning rule can be universally consistent in all games against all possible opponents' rules.
- SFP is universally consistent [@Fudenberg1995].
Universally calibrated (asymptotically calibrated): the time-average regret (loss in periods playing a strategy rather than the actual best response to the frequency over those periods) for each action goes to zero.
- Universally calibrated learning procedures exists [@Foster1998], and if all players follow such rules, the time average of the play must converge to the correlated equilibria [@Foster1997].
- Regret matching is universally calibrated [@Hart2000].
Testing

Convergence to Nash equilibrium: Uncoupled and stationary deterministic continuous-time adjustment systems cannot be guaranteed to converge to equilibrium in a game [@Hart2003].

A learning rule is completely uncoupled if it does not directly depend on opponents' actions or payoffs. Recency: recent observations might get more weight than older observations. Asymptotic performance: Hannan consistency (doing well in stationary environments); calibration.

Heuristics ("rules of thumb") are behavior rules that are simple and myopic. A heuristic is adaptive if it makes better responses to the situation [@Hart2005].

Fictitious play (FP) [@Brown1951] is any learning rule where each player believes that opponent strategies are randomly sampled from an unknown multinomial distribution, and plays in best-response to the parameter distribution started from a Dirichlet prior and updated by Bayesian inference. [@Brown1951] originally proposed fictitious play to justify mixed strategy Nash equilibrium, where players simulate the game to learn their optimal strategies.

Paths generated by fictitious play terminate at strict Nash equilibrium.
Paths generated by fictitious play converge at pure strategy Nash equilibrium.
Paths generated by fictitious play converge in time average at mixed strategy Nash equilibrium; this is true for two-player zero-sum games, two-player two-strategy games, games solvable by iterated strict dominance, and potential games.

Stochastic fictitious play (SFP; smooth fictitious play) [@Fudenberg1993] differs from FP by using a stochastic (smooth) best-response function, as if payoffs are perturbed by random shocks. In some classes of games, SFP leads to convergence to an approximate Nash equilibrium, approximately justifying the assumption. In other games, SFP leads to stable cycles. SFP is universally consistent (Hannan consistent) in the sense that its time-average payoff is at least as good as maximizing against the time average of opponents' play.

Stochastic approximation relates discrete-time stochastic processes such as fictitious play to deterministic differential equations.

A variety of algorithms that extrapolate, look for patterns, or engage in systematic experimentation can lead to Nash equilibrium. Calibrated algorithms assure global convergence at least to the set of correlated equilibrium. regret matching [@Hart2000]

If players are bounded in rationality, memory or computational ability, then even greater departures from Nash equilibrium can occur. Reinforcement learning [@Roth1995]: mixed strategy weights are determined by the aggregated past performances of a player's strategies. players' mixed strategies evolve according to the replicator dynamic. Reinforcement learning is closely related to imitative learning processes: some motivated by psychological considerations, others variants on rules (such as SFP) with desirable asymptotic properties. Reinforcement learning globally converge in 2×2 games.

Imitative learning is important in practice and has consequences for both the type of equilibrium that is eventually reached and the dynamic stability of equilibria.

Learning in strategic-form games: agents' strategies are completely observed at the end of each round, and agents are randomly matched with a series of anonymous opponents. Since players cannot identify their opponents' strategies by only observing actions but not types, wrong beliefs can persist in a Bayesian game with simultaneous actions.

Learning in extensive-form games: players observe at most the sequence of actions. In the absence of off-path experimentation in extensive-form games, learning may only lead to self-confirming equilibrium [@Fudenberg1993], which only requires that players' beliefs about others' strategies are consistent with the outcome.

Identify and estimate the learning rules used by subjects in experiments: Experimental data have little power in discriminating between alternative learning models {Salmon2001. An evaluation of econometric models of adaptive learning}. The assumption of a representative agent can drive some of the conclusions of this literature {Wilcox2006. Theories of learning in games and heterogeneity bias.}.

An adjustment process takes expectation of strategy profile as input and can take place in discrete or continuous time:

Time scale
- discrete time: $ \mathbf{s}_{t+1} = \mathbf{g}(\mathbf{s}_t) $
- continuous time: $ \dot{\mathbf{s}} = \mathbf{g}(\mathbf{s}) $
Concurrence
- simultaneous: $ \mathbf{g}(\mathbf{s}) = (g_i(\mathbf{s}_{-i}))_i $
- sequential: $ \mathbf{g}(\mathbf{s}) = \mathbf{g}_n \circ \cdots \circ \mathbf{g}_1 (\mathbf{s}) $, with $ \mathbf{g}_i (\mathbf{s}) = (s_1, \dots, g_i (\mathbf{s}_{-i}), \dots, s_n) $
Expectation
- Cournot: $ \mathbf{f}_{i, t+1} = \mathbf{s}_{-i, t} $
- adaptive: $ \mathbf{f}_{i, t+1} = \mathbf{f}_{i, t} + m_i (\mathbf{s}_{-i, t} - \mathbf{f}_{i, t}) $, where $ m_i $ is the speed of adjustments;
- extrapolative: $ \mathbf{f}_{i, t+1} = \sum_{k = 0}^l a_k \mathbf{s}_{-i, t-k} $, where $ \sum_{k = 0}^l a_k = 1 $;

Table: Comparison of Adjustment Processes

Timing	Best-response	Better-response
Discrete-time sequential	sequential best-response	sequential improvement
Discrete-time simultaneous	discrete-time best-response	-
Continuous-time	continuous-time best-response	gradient

In discrete-time sequential adjustment processes, a path is a sequence of strategy profiles $ \{ s^k \}, k \in \mathbb{N} $ with only one player changes strategy at each step; a cycle is a path with identical start and end.

Under sequential improvement dynamics, an improvement path is a path where at each step the deviator has improved payoff: $ \forall k \in \mathbb{N}, \exists i \in N: s^{k+1}_{-i} = s^k_{-i}, u_i(s^{k+1}) > u_i(s^k) $. A non-deteriorating path is a path where at each step the deviator has non-deteriorating payoff. A weak improvement cycle is a non-deteriorating cycle with some improvement steps. Improvement path terminates at a Nash equilibrium.

Under sequential best-response dynamics, a path is best-response compatible if at each step the deviator moves to a best-response. A path is strict best-response compatible if at each step the deviator moves to a best-response if not already. A (strict) best-response cycle is a (strict) best-response compatible cycle with some improvement steps. Strict best-response compatible path terminates at a Nash equilibrium.

Simultaneous best-response adjustment processes includes discrete-time best-response dynamics: $ \mathbf{s}_{t+1} = B(\mathbf{s}_t) $; and continuous-time best-response dynamics: $ \dot{\mathbf{s}} = M (B(\mathbf{s}) - \mathbf{s}) $.

Gradient process [@Arrow1960]: $ \dot{\mathbf{s}} = S \nabla \mathbf{u}(\mathbf{s}) $, where $S$ is a positive diagonal matrix of adjustment speeds, and pseudo-gradient $ \nabla \mathbf{u}(\mathbf{s}) = ( \nabla_i \mathbf{u}_i(\mathbf{s}) )_{i=1}^n $.

Bayesian learning dynamics

Each player starts with a prior belief on the "state of the world", which usually includes the game being played and the other players' types and strategies. In each period, players play optimally given their beliefs, and update their beliefs by Bayes' rule after observing others' actions. {Young2004, Ch7}

If the priors contain "a grain of truth", Bayesian learning leads to Nash equilibrium [@Kalai1993].

time discounting, random play, amount of information

Stability and Convergence

Concepts of stability and convergence are consistent with those in Qualitative Theory of Differential Equations. Currently, most results on the convergence of game theoretic equilibrium are local; while global convergence is studied for population games, games with linear best-responses, or specific cases. Note that global asymptotic stability implies uniqueness of equilibrium.

A game can have different stability under different adjustment processes. For example, Cournot game with linear marginal rent has a unique pure strategy Nash equilibrium. As an exact potential game {Monderer1996b}, it is globally asymptotically stable under sequential better-response dynamics (with Cournot expectation). But under discrete-time simultaneous best-response dynamics (with Cournot expectation), it is asymptotically stable (thus a sink) only when it has two players; with three players, it asymptotically alternates between two points centered at the equilibrium along the eigen-direction of total production; with over three players, it is unstable in the aforementioned eigen-direction (thus a saddle point) [@Theocharis1960]. With strategic substitutes, contraction stability is comparably restrictive due to simultaneous adjustments [@Hefti2016]. Note that with adaptive expectation, the game can be asymptotically stable under discrete-time simultaneous best-response dynamics as long as the speed of adjustments is small enough, which also means the convergence will be slow [@Okuguchi1999]. The stability properties in this example is global, since it is a linear dynamical system.

There appears to be no universal ordering among stability relations in general regular games, except for some special cases:

Under discrete-time best-response dynamics, local contraction stability (Jacobian of best-response has spectral radius less than one) implies local asymptotic stability.
sequential best-response
Contraction stability implies local stability under continuous-time best-response dynamics with a homogeneous diagonal cofactor [@Hefti2016]: $ M(\mathbf{s}) = (m(s_i))_i $.
For one-dimensional games with weak strategic complements at an equilibrium, continuous-time best-response and gradient stabilities are equivalent [@Hefti2016].

Potential games have the most general convergence results: the potential function is a strict (global) Lyapunov function for all PC dynamics. For Lipschitz continuous PC dynamics on potential games, all ω-limit points are critical points. For Lipschitz continuous PC and NS dynamics on potential games, all ω-limit points are Nash equilibria, and (locally) asymptotically stable set coincide with isolated Nash equilibria that are local maxima of the potential function.

Contractive games are globally asymptotically stable under excess payoff dynamics if the target protocol is integrable. For strictly contractive games, Nash equilibrium is a GESS and unique. Strictly contractive games have global Lyapunov functions for the five basic dynamics:

replicator: $ H_{x^* }(x) = \sum_{\text{supp}(x^* )} x_i \log (x_i^* / x_i) $; [@Hofbauer1979]
Smith: $ \Psi(x) = \sum_i \sum_j x_i [F_j(x) - F_i(x)]_+^2 $; [@Smith1984]
BNN: $ \Gamma(x) = \Vert [\hat{F}(x)]_+ \Vert_2 $;
best-response: $ G(x) = \Vert \hat{F}(x) \Vert_\infty $;
Logit: $ \tilde{G}(x) = \nu(x) + \max_{y \in \text{int}(X)} y \cdot \hat{F}(x) - \nu(y) $;

An interior regular ESS is locally contractive. A boundary quasi-strict regular ESS is locally contractive [@Sandholm2010].

For evolutionary dynamics with strong PC in a neighborhood of strict equilibrium, such as imitative, excess payoff, and pairwise comparison dynamics, payoff deficit $ L(x) = (e_i - x) \cdot F(e_i) $ is a strict local Lyapunov function for strict equilibrium $ e_i $ [@Sandholm2014]; thus isolated rest vertices are (locally) asymptotically stable under such dynamics.

In replicator dynamics, stationary state means all individuals of the population have the same reproductive fitness; asymptotically stable state means no perturbation could establish itself in the population. For population games, a Nash equilibrium is a stationary state; an asymptotically stable state is a Nash equilibrium; ESS is locally asymptotically stable under the replicator dynamics. Regular ESS is locally exponentially stable under the replicator dynamics [@Taylor1978]. Critical points of imitative dynamics coincide with restricted Nash equilibrium. A hyperbolic critical point (no pure imaginary eigenvalues of the Jacobian matrix $ \nabla V^F(x^∗) $) have the same stability properties under all imitative dynamics [@Cressman1997]. Thus regular ESS is locally exponentially stable under all imitative dynamics.

Non-convergence

Bad RPS (Rock-Paper-Scissors where loss is higher than win) has attractive limit cycles under basic evolutionary dynamics. The hypnodisk game has an attractive limit cycle under any PC NS dynamics [@Hofbauer2011].

Population games with four or more strategies can have chaotic attractors under certain dynamics. Multi-population games with three strategies per player can also have chaotic evolutionary dynamics.

For three-population games with a unique completely mixed Nash equilibrium, if each population's revision protocol does not condition on the other populations' payoffs, hyperbolic critical points of the resulting evolutionary dynamic are unstable [@Hart2003].

Strictly dominated strategies may survive if game dynamics do not converge to Nash equilibrium. Although the best response dynamics and all imitative dynamics eliminate dominated strategies at least for most initial conditions, continuous PC NS dynamics with innovation (unused optimal strategies have positive growth rates away from Nash equilibrium) admit games with strictly dominated strategies that survive in perpetuity [@Hofbauer2011]. For example, in bad RPS with a feeble twin, the strictly dominated Twin strategy is always played by fractions of players under Smith dynamics.

Classes of Games

A game is finite if its strategy space $S$ is finite, which means it has a finite set of (non-trivial) players each with a finite strategy space. (Finite games are not very applicable.) A generic game (game with generic payoff) is one in which a small change to any one of the payoffs does not introduce new Nash equilibria or remove existing ones. Generic games have an odd number of Nash equilibria.

A game is zero-sum (aka strictly competitive) if the total payoff of all players under any outcome is a constant. Zero-sum games only have $n-1$ independent payoff functions.

A game is symmetric if the payoff function is permutation invariant: $ \forall \mathbf{s} \in S, P \in \mathcal{S}_n: \mathbf{u}(P\mathbf{s}) = P\mathbf{u}(\mathbf{s}) $. This implies that player strategy spaces are identical, only one component payoff function is independent, and that function is also independent from the order of opponent strategies. For two-person matrix games, symmetry means the two payoff matrices are transpose of each other. While a strategy profile is symmetric if all players use the same strategy, symmetric equilibrium is of special interest to symmetric games. Finite symmetric games have symmetric mixed strategy Nash equilibrium [@Nash1951].

A game has identical interests (aka perfect coordination) if it is best-response equivalent in mixed strategies to a game with identical payoff functions: $ \{ N, S, u \mathbf{1} \} $. {Monderer1996a} Identical-interest games apparently have only one independent payoff function.

A game is regular if its pseudo-gradient on the boundary points into the strategy space, and zeros of the pseudo-gradient have non-singular Jacobian. Regular games have a finite number of Nash equilibria, which are the zeros of the pseudo-gradient (critical points); uniqueness of equilibrium is equivalent to that on the set of critical points $ \det(-\nabla (\nabla \mathbf{u}(\mathbf{s}))) > 0 $; uniqueness of equilibrium is implied if every critical point has some type of stability (contraction, continuous-time best response, gradient) [@Hefti2016].

Simultaneous game

A game is simultaneous (sometimes static) if all players choose their strategies without knowing the others' strategies. In simultaneous games, outcome is equivalent to strategy profile.

A game in strategic form (or normal form) is a finite static game: $u: S \to \mathbb{R}^n$. Two-player strategic-form games are also called matrix games.

Sequential game

A game is sequential (sometimes dynamic, though other interpretations exist) if players take actions in a predefined order, where at least some players can observe other players' previous moves. In sequential games, a strategy profile determines the outcome, but is more than that.

A game in extensive form is a finite sequential game represented as a game tree, where each internal node including the root is a possible position of the game where a player chooses an edge, and the game ends when a leaf node is reached which corresponds to an outcome. [@Kuhn1953]

A subgame of a sequential game is the game corresponding to a subtree. A Nash equilibrium is subgame perfect (SPNE; perfect equilibrium) if it yields a NE in every subgame of the original game. SPNE can be found by backward induction, iterative reasoning from eventual outcomes backwards to the present choice problem.

A sequential game is of perfect information if players can observe others' moves in turn and can recall the entire history of play; in other words, every information set contains exactly one node. Otherwise, the game is of imperfect information.

Repeated game

A repeated game is a stage game played multiple times. Depending on the times repeated, the stage game could be finitely repeated or infinitely repeated.

"Folk theorems" (general feasibility theorem) refers to a class of theorems about possible Nash equilibrium payoff profiles in repeated games: any payoff vector that is individually rational for the players is a subgame perfect equilibrium, provided that the players are sufficiently patient {Friedman1971, Fudenberg1986, Aumann1992}.

Classic Examples

The most minimal class of games are two-person two-strategy games, which is fully defined by eight numbers, or equivalently two 2-by-2 matrices. von Neumann started with a smaller set: zero-sum games, which can be defined by four numbers, or a 2-by-2 matrix. These games have finite strategy space, and can be called matrix games.

The strategy space becomes convex if each player's preference admits metric and mixed strategies are allowed. "games on the square"

Hawk-Dove (Prisoner's Dilemma)

Table: Normal form of Hawk-Dove (symmetric game, showing player 1's payoff)

Player 1\2	Hawk	Dove
Hawk	$(v-c)/2$	$v$
Dove	0	${v}/{2}$

When $v > c$, Hawk-Dove is strategically equivalent to prisoner's dilemma (PD), which has one pure strategy NE: (Hawk, Hawk), aka (Defect, Defect).

When $v < c$, Hawk-Dove two pure strategy Nash equilibria (Hawk, Dove) and (Dove, Hawk), and a mixed strategy symmetric equilibrium.

Strategy for the iterated prisoners' dilemma:

报复：成功的策略必须不是一个盲目乐观者。要始终报复。一个非报复策略的例子是始终合作。这是一个非常糟糕的选择，因为“下流”策略将残酷地剥削这样的傻瓜。
宽恕：成功策略的另一个品质是必须要宽恕。如果对手不继续背叛，它们会一再退却到合作。这停止了报复和反报复的长期进行，最大化了得分点数。
不嫉妒：最后一个品质是不嫉妒，就是说不去争取得到高于对手的分数（对于“友善”的策略来说这也是不可能的，也就是说“友善”的策略永远无法得到高于对手的分数）。【但也不一定就比对手低。】

Zero-sum game

Matching pennies  Rock–paper–scissors
1,0 0,1           1,1 0,2 2,0
0,1 1,0           2,0 1,1 0,2
                  0,2 2,0 1,1

Neither have pure strategy NE, but have one mixed strategy NE: random.

Coordination game

A coordination game is a game where players' payoffs are maximized by them taking the same action. In a Pareto coordination game such as Hi-Lo, one coordination outcome dominates other coordination outcomes. In a pure coordination game, multiple coordination outcomes are rationalizable.

Hi-Lo    Pure     Stag hunt  Battle of the sexes
2,2 0,0  1,1 0,0  2,2 0,1    2,1 0,0
0,0 1,1  0,0 1,1  1,0 1,1    0,0 1,2

Aggregative game

Cournot game is the n-player game based on Cournot competition: $ \{ N, S, u \} $ where $ S = \mathbb{R}_+^n $ and $ u_i(s)=s_i P(\sum_j s_j) - C_i(s_i) $. Here, $P(q)$ is the price function of total production, aka the inverse demand function; $C_i(q_i)$ is each player's total production cost. Cournot equilibrium @Cournot1838 is the stable focus of best-response dynamics, thus stronger than Nash equilibrium. Cournot games have pure strategy Nash equilibrium in without assuming quasi-concavity of payoff functions [@Novshek1985] and convexity of strategy spaces [@Kukushkin1994]. As a special form, quasi-Cournot game have an affine price function. Constant marginal costs can be used to simplify analysis. Homogeneous firms have identical cost functions, leading to a symmetric game, with symmetric $n$-firm Cournot equilibrium. Symmetric quasi-Cournot game is a strictly convex game, thus with a unique stable pure strategy Nash equilibrium.

A game with one-dimensional player strategy spaces is aggregative if every player's payoff is a function of the player's own strategy and the sum of all players' strategies [@Selten1970]: $ u_i = u_i(s_i, \sum_j s_j) $. The game is fully aggregative if every player's payoff and marginal payoff are functions of the player's own strategy and an aggregate of all players' strategies [@Cornes2012]: $ u_i(s_i, t(\mathbf{s})) $, $ \frac{\partial u_i}{\partial s_i}(s_i, t(\mathbf{s})) $.

A game with finite dimensional player strategy spaces is quasi-aggregative, if each player has an interaction function $\sigma_i: S_{-i} \to \mathbb{R}$ and a shift function $ F_i: S_i \times \mathbb{R} \to \mathbb{R} $, both continuous, such that payoffs can be written as $ u_i = u_i(s_i, \sigma_i(s_{-i})) $ and the game admits an aggregator $ g: S \to \mathbb{R} $, $ g(s) = F_i(s_i, \sigma_i(s_{-i})) $. The game is generalized quasi-aggregative if instead $ g(s) = F_i(s_i, \sigma_i(s_{-i})) + v_i(s_{-i}) $, where $ v_i $ are arbitrary functions. [@Jensen2010]

Characteristics of fully aggregative games: [@Cornes2012] If exists continuously differentiable real functions $ h_0 $ and $ h_i, i \in N $, $h_0$ strictly increasing and $ t(\mathbf{s}) = h_0(\sum_i h_i(s_i)) $, then the game is fully aggregative. The converse is true if $ t(\mathbf{s}) $ is twice continuously differentiable and has positive first partial derivatives with form $ \frac{\partial t}{\partial s_i}(s_i, t) $, and the game has at least three players.

Properties of aggregative games:

Aggregative games have strong comparative statics properties of its Nash equilibria {Corchon1994, Acemoglu2013};
Quasi-aggregative games are pseudo-potential games, and are best-response potential games if best-response correspondences are monotone [@Jensen2010];
In fully aggregative games, every player has a backward reply correspondence (or replacement correspondence [@Cornes2012]);

replacement correspondence... $ R_i(\tau) = \arg\max $

Supermodular games

A game is supermodular (submodular) [@Topkis1979] if the player strategy spaces are complete lattice (partially ordered set with a top and a bottom) subsets of Euclidean spaces, and the payoffs are upper semicontinuous in self strategy, continuous in opponent strategy, and supermodular (submodular): $ f(x) + f(y) \le f(x \land y) + f(x \lor y) $. If a function is twice continuously differentiable, supermodularity (submodularity) is equivalent to nonnegative (nonpositive) mixed partial derivatives: $ \frac{\partial^2 f(x, y)}{\partial x \partial y} \ge 0, \forall x, y$.

Conventional substitutes and complements are defined by whether more aggressive strategies of a firm (lower price, increased advertising, greater quantity, etc.) lowers or raises another firm's total profits: $ \frac{\partial u_i}{\partial s_j} < 0 $ for substitutes; $ \frac{\partial u_i}{\partial s_j} > 0 $ for complements. In comparison, [@Bulow1985] proposed strategic substitutes and complements in their study of multi-market oligopoly: the decisions of several players are strategic substitutes (strategic complements) if they mutually offset (reinforce) one another; that is, marginal utility of a player's strategy drops (raises) with increases in the other players' strategies: $ \frac{\partial^2 u_i}{\partial s_j \partial s_i} < 0 $ for strategic substitutes (submodular); $ \frac{\partial^2 u_i}{\partial s_j \partial s_i} > 0 $ for strategic complements (supermodular).

Methods used to study supermodular games are non-topological but algebraic, specifically lattice theory which exploits order properties:

Tarski's fixed point theorem [@Tarski1955]: Any non-decreasing function from a complete lattice into itself has fixed points which also form a complete lattice.
Topkis' theorem [@Topkis1979]: For a real-valued function $f(x,t)$ from the Cartesian product of a compact subset of real numbers $X \in \mathbb{R}$ and a partially ordered set $T$, if it is supermodular and is upper semicontinuous in $x$ for all $t$, best-response $ B_x(t) = \arg \max_{x \in X} f(x, t) $ has a top $\overline{x}(t)$ and a bottom $\underline{x}(t)$, both non-decreasing functions.

By Topkis' theorem, supermodular games have non-decreasing tops and bottoms of players' best-responses. By Tarski's fixed point theorem, supermodular games have pure strategy Nash Equilibria.

For supermodular games, pure strategy Nash equilibria, rationalizable strategies (iterated elimination of strictly dominated strategies), and correlated equilibria have identical tops and bottoms; strategy profile converges within these bounds under dynamic adaptive choice behavior (best-response dynamics and Bayesian learning); and these bounds vary monotonically with certain exogenous parameters [@Milgrom1990].

Potential game

A selfish routing game $ (G, K, c) $ consists of a directed graph $G = (V, E)$, traffic demand $ K = \{ (s_i, t_i, r_i) | s_i, t_i \in V, r_i > 0, i = 1, \dots, k \} $, and edge cost functions $ c_e: \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}, e \in E $. Cost of a path under a flow over the network is the sum of edge costs incurred along the path: $ C(P, f) = \sum_{e \in P} c_e (f_e) $.

The nonatomic instance of a selfish routing game has a large population of players, each contributing a negligible amount to the demand of a single O/D pair in $K$. For nonatomic instances, the O/D pairs in $K$ are distinct. A (feasible) flow for an O/D pair is a distribution of its demand over the possible paths: $ f_i \in r_i \Delta^{| \mathcal{P}_i | - 1} $, where $ \Delta^n $ denotes the $n$-dimensional standard simplex and $ \mathcal{P}_i $ is the set of all possible paths in graph $G$ with O/D pair $ (s_i, t_i) $. Nonatomic selfish routing was inspired by the toll road example of social cost in [@Pigou1920] which was first stated as traffic equilibrium in [@Wardrop1952] and first formalized in [@Beckmann1956].

For an atomic instance of a selfish routing game, $K$ is the set of players, where the O/D pairs may have duplicates; $ \mathcal{P} = \prod_{i=1}^k \mathcal{P}_i $ is the strategy space, where each strategy profile represents a flow of the players. An atomic instance is unweighted if every player controls the same amount of traffic: $ r_i = r, i = 1, \dots, k $. Atomic selfish routing was first formalized in [@Roughgarden2002].

Congestion game [@Rosenthal1973] is a special case of atomic selfish routing games. A congestion game is a game where each player chooses a combination from a finite pool of factors, with an additive payoff of factor costs each only depends on the number of players choosing that factor. Symbolically, given factors $ T = \{ 1, \dots, t \} $ and player-specific strategy spaces $ S_i \subseteq 2^T, i \in N $, each player has payoff $ u_i = \sum_{k=1}^t s_{ik} c_k(s_k) $. Congestion games always have pure-strategy Nash equilibria. [@Rosenthal1973] Rosenthal's proof constructed an optimization problem with objective function (sometimes called Rosenthal's potential) $ \Phi(a) = \sum_k \sum_{q=0}^{s_k} c_k(q) $ and a finite feasible set, whose optimal points are pure-strategy Nash equilibria of the congestion game.

For the nonatomic instance, an equilibrium flow is a flow such that for every O/D pair, the paths in use all have the minimum cost among the possible paths: $ \{P | f_{iP}^* > 0 \} \subseteq \arg\min_{P \in \mathcal{P}_i} C( P, f^* ) $, $ \forall i \in \{1, \dots, k \} $. For an atomic instance, an equilibrium flow is a pure strategy Nash equilibrium: $ C( P_i^* , f^* ) \le C( P_i, f^* ) $, $ \forall i \in \{1, \dots, k \}, P_i \in \mathcal{P}_i $.

For the nonatomic instance, equilibrium flows exist and induce the same set of edge costs. For unweighted atomic instances, equilibrium flows exist. If edge cost functions are affine, equilibrium flows also exist for any atomic instance. [@AGT2007]

Potential games are generalizations of congestion games and nonatomic selfish routing games (as in [@Beckmann1956]). A potential game is a game where the payoffs depend on a real-valued function, called the potential function. {Monderer1996b} first defined four types of potential games: exact potential game, weighted potential game, ordinal potential game, generalized ordinal potential game. With potential function $ \Phi: S \rightarrow \mathbb{R} $, $ \forall {s_{-i} \in s_{-i}}, \forall {s'_i, s''_i \in S_i} $:

\[\begin{aligned} \Phi(s'_i, s_{-i}) - \Phi(s''_i, s_{-i}) &= u_i(s'_i, s_{-i}) - u_i(s''_i, s_{-i}) &&\text{(Exact)} \\ \Phi(s'_i, s_{-i}) - \Phi(s''_i,s_{-i}) &= (u_i(s'_i, s_{-i}) - u_i(s''_i, s_{-i})) w_i &&\text{(Weighted)} \\ u_i(s'_i, s_{-i}) > u_i(s''_i, s_{-i}) &\Leftrightarrow \Phi(s'_i, s_{-i}) > \Phi(s''_i, s_{-i})&&\text{(Ordinal)} \\ u_i(s'_i, s_{-i}) > u_i(s''_i, s_{-i}) &\Rightarrow \Phi(s'_i, s_{-i}) > \Phi(s''_i, s_{-i}) &&\text{(Generalized Ordinal)} \end{aligned}\]

[@Voorneveld2000] and [@Schipper2004] generalized ordinal potential games to best-response potential game and pseudo-potential game, respectively. The potential function satisfies $ \forall i \in N, \forall s_{-i} \in S_{-i} $:

\[\begin{aligned} B_i(s_{-i}) &= \arg\max_{s_i \in S_i} \Phi(s_i,s_{-i}) &&\text{(Best-response)} \\ B_i(s_{-i}) &\supset \arg\max_{s_i \in S_i} \Phi(s_i,s_{-i}) &&\text{(Pseudo)} \end{aligned}\]

q-potential games [@Monderer2007] and nested potential games [@Uno2007].

It can be shown that congestion games are isomorphic to finite exact potential games {Monderer1996b}, and the other types of potential games are progressively more general, except that generalized ordinal potential games are not necessarily best-response potential games. [@Lã2016]

Characterizations of potential games:

Exact potential games:
- with real intervals as player strategy spaces and twice continuously differentiable payoffs are equivalent to games with identical mixed partial derivatives of payoffs: $ \frac{\partial^2 u_i}{\partial s_i \partial s_j} = \frac{\partial^2 u_j}{\partial s_i \partial s_j} $ {Monderer1996b};
- equivalent to coordination-dummy separable games [@Facchini1997];
- include strategically separable, bilateral symmetric interaction [@Ui2000], and general symmetric observations games [@La2016];
- preserved when shrinking the strategy space;
- have unique potential functions determined up to a constant;
- that differ only in dummy payoffs have the same potential function;
Weighted potential games are preserved by linear combinations of payoff functions {Monderer1996b}.
Ordinal potential games: [@Voorneveld1997]
- equivalent to games with no weak improvement cycle and strategy space properly ordered by non-deteriorating paths;
- the order condition is automatically satisfied if the strategy space is countable;
Best-response potential games: [@Voorneveld2000]
- equivalent to games with no best-response cycle and strategy space properly ordered by best-response compatible paths;
- the order condition is automatically satisfied if the strategy space is countable;
Pseudo-potential games: [@Schipper2004]
- include games with no strict best-response cycle and strategy space properly ordered by strict best-response compatible paths;
- their identical interest version must satisfy these properties;

Nash equilibrium of potential games:

The identical-interest game $ \{ N, S, \Phi \mathbf{1} \} $ resulted from replacing payoffs of a potential game by its potential function has a pure strategy Nash equilibrium being the potential function's global maxima (if exist), and is unique if the potential function is strictly concave;
A best-response potential game has the same Nash equilibrium as that of its identical-interest version {Monderer1996b, Voorneveld2000};
A pseudo-potential game includes the Nash equilibrium of its identical-interest version in its own [@Dubey2006];

Convergence (stability) of Nash equilibrium of potential games:

(Generalized ordinal) potential games converge to a stable Nash equilibrium in better-response dynamics, because every improvement path gradually increases the potential function;
- For finite ordinal potential games, every improvement path is finite {Monderer1996b};
- For continuous ordinal potential games with bounded payoffs, every $\epsilon$-improvement path is finite, so every improvement path compatible with $\epsilon$-improvement converges to an $\epsilon$-Nash equilibrium {Monderer1996b};
Best-response potential games converge to a stable Nash equilibria in best-response dynamics, because every strict best-response compatible path gradually increases the potential function;
For finite pseudo-potential games, every best-response compatible path converges to a Nash equilibrium in a finite number of steps [@Dubey2006];

Miscellaneous

Topics:

Extensive form games: bargaining, contracts;
Games with incomplete information: reputation, bargaining, global games, signaling, cheap talk, auctions;
Miscellaneous: axiomatic social choice theory;

forward induction; machina triangle; Berge's theorem of maximum; Zermelo's theorem; Kuhn's theorem; belief system; signaling games (perfect Bayesian equilibrium); unknown player payoffs; unknown other player payoffs; stochastic games: Markov perfect equilibrium;

Computation of equilibrium

Regret-matching algorithms; Lemke-Howson algorithm; Scarf's algorithm;

Any two-person game with a finite number of positions can be solved by a minimax algorithm that would exhaustively traverse the game tree.

Mechanism Design

Algorithmic game theory

Algorithmic game theory is the intersection of algorithms and game theory. Major advances are in algorithmic mechanism design [@Nisan1999] and quantifying the inefficiency of equilibria [@Roughgarden2002]. Inefficiency of equilibria (aka "price of anarchy") refers to the reduced social value in non-cooperative equilibria compared with the social optimum in cooperative settings.

Resources:

Review Of Game Theory in the Stanford Encyclopedia of Philosophy

🏷 Category=Economics

Family	Revision Protocol \(\rho(\pi, x)\)	Mean Dynamics \(V^F(x)\)
Imitative	\( \left(x_j r_{ij}(\pi, x)\right)_{ij}\)	\( x \circ (r^\text{T} - r) x \)
Excess payoff	\( \mathbf{1} \tau(\hat{\pi})^\text{T} \)	\( \tau(\hat{F}(x)) - \mathbf{1} \cdot \tau(\hat{F}(x)) x \)
(Perturbed) Best-response	\( \mathbf{1} \tilde{M}(\pi)^\text{T} \)	\( \tilde{M}(F(x)) - x \)

Name	Family	Revision Protocol \(\rho(\pi, x)\)	Mean Dynamics \(V^F(x)\)
CTFP	best-response	\( \mathbf{1} M(\pi)^\text{T} \)	\( M(F(x)) - x \)
logit	perturbed BR	\(\mathbf{1} \exp(\eta^{-1} \pi)^\text{T} / Z(\pi)\)	\( \exp(\eta^{-1}F(x)) / Z(F(x)) - x \)
BNN	excess payoff	\(\mathbf{1} [\hat{\pi}]_+^\text{T}\)	\([\hat{F}(x)]_+ - \mathbf{1} \cdot [\hat{F}(x)]_+ x\)
Smith	pairwise comparison	\( \left( [ \pi_j - \pi_i ]_+ \right)_{ij} \)	\([F(x) \mathbf{1}^\text{T} - \mathbf{1} F(x)^\text{T}]_+ x + \\ [F(x) \mathbf{1}^\text{T} - \mathbf{1} F(x)^\text{T}]_- \mathbf{1} \circ x\)
replicator	imitative	\( \left(x_j [ \pi_j - \pi_i ]_+ \right)_{ij} \)	\( \hat{F}(x) \circ x\)