The PageRank algorithm was developed by Page and Sergey Brin in 1996, ranking webpages on the Internet. {Brin1998}

Heuristics

A webpage is more important if there are more webpages link to it and if other important pages link to it.

PageRank is designed as a score associated with a webpage/URL that indicates its importance. A webpage earns score from webpages linking to it, and equally distributes its score to each page it links to.

Implementation

The heuristics implies a system of equations: \[ r = Mr \] where r is the vector of all webpages on the Internet, entries of stochastic adjacency matrix \(M_{ij} = 1/d_j\), and \(d_j\) is the out degree of webpage \(r_j\).

But PageRank algorithm adjusted the matrix a little to make the iteration always converge (the interpretation also deviates a little): \[ r = \beta Mr + (1-\beta) 1/N \] where N is the dimension of r, and common vales of β are in between 0.8 and 0.9.

The algorithm starts with r=1/N, multiplies with coarse matrix M, adjusts for leaked score, and iterates. Score vector r is guaranteed to converge.

Mathematical Properties

Theory of Markov chains (Perron–Frobenius applied to stochastic matrices): For any start vector, the power method applied to a Markov transition matrix P will converge to a unique positive stationary vector as long as P is stochastic, irreducible and aperiodic.

The adjusted stochastic adjacency matrix satisfies all the three requirements: \[ A = \beta M + (1-\beta) \mathbf{1} \mathbf{1}^T /N \]


🏷 Category=Computation Category=Data Mining