The **PageRank algorithm** was developed by Page and Sergey Brin in 1996,
ranking webpages on the Internet. [@Brin1998]

## Heuristics

A webpage is more important if
there are more webpages link to it and other important pages link to it.

PageRank is designed as a score associated with a webpage/URL that indicates its importance.
A webpage earns score from webpages linking to it,
and equally distributes its score to each page it links to.

## Implementation

The heuristics implies a system of equations: $r = Mr$,
where $r$ is the vector of all webpages on the Internet,
entries of stochastic adjacency matrix $M_{ij} = 1/d_j$,
and $d_j$ is the out degree of webpage $r_j$.

But PageRank algorithm adjusted the matrix a little to make the iteration always converge
(the interpretation also deviates a little): $r = \beta Mr + (1-\beta) 1/N$,
where $N$ is the dimension of r, and common vales of β are in between 0.8 and 0.9.

The algorithm starts with $r=1/N$, multiplies with coarse matrix M,
adjusts for leaked score, and iterates.
Score vector $r$ is guaranteed to converge.

## Mathematical Properties

Theory of Markov chains (Perron–Frobenius applied to stochastic matrices):
For any start vector, the power method applied to a Markov transition matrix $P$
will converge to a unique positive stationary vector as long as P is stochastic,
irreducible and aperiodic.

The adjusted stochastic adjacency matrix satisfies all the three requirements:
$A = \beta M + (1-\beta) \mathbf{1} \mathbf{1}^T /N$.

🏷 Category=Computation Category=Data Mining