Principal Components Analysis

Assumptions: linearity, orthogonality.

Directions (subspaces) of maximal sample variance -> top (dominant) eigenvectors of the sample covariance matrix, with the associated eigenvalues giving the variances along those directions.

Principal Components: (unit) eigenvectors of the sample covariance matrix
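
A minimal NumPy sketch of this definition (function and variable names are illustrative, not from the note): center the data, form the sample covariance, and keep the top-k eigenpairs.

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal components (unit eigenvectors of the sample covariance)
    and their eigenvalues (directional variances) for an (n, d) matrix X."""
    Xc = X - X.mean(axis=0)                 # center each feature
    S = Xc.T @ Xc / (X.shape[0] - 1)        # (d, d) sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)    # ascending eigenvalues, orthonormal columns
    order = np.argsort(eigvals)[::-1][:k]   # keep the k largest
    return eigvecs[:, order], eigvals[order]

X = np.random.randn(1000, 5)
V, lam = pca_eig(X, k=2)
scores = (X - X.mean(axis=0)) @ V           # (n, k) projections onto the components
```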

PCA at scale:

  1. big n (observations), small d (feature set)
    • complexities:
      • local storage O(d^2) [outer product],
      • local computation O(d^3) [eigen-decomposition],
      • communication O(dk) [top eigenvectors]
    • algorithm: similar to closed-form linear regression; see the first sketch after this list
  2. big n, big d
    • complexities:
      • local storage O(dk+n) [iterated top eigenvectors + intermediate vector],
      • local computation O(dk+n) [vector multiplication + vector concatenation],
      • communication O(dk+n) [iterated top eigenvectors + intermediate vector]
    • algorithm: iterative algorithms (Krylov subspace methods, random projections); see the power-iteration sketch below
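
A minimal sketch of case 1 (big n, small d), assuming rows are partitioned across workers (emulated here by a list of NumPy blocks): each worker sends its O(d^2) sum of outer products and O(d) feature sum, and the driver builds the covariance and eigen-decomposes it locally in O(d^3).

```python
import numpy as np

def pca_distributed_closed_form(blocks, k):
    n = sum(B.shape[0] for B in blocks)
    sum_x = sum(B.sum(axis=0) for B in blocks)              # O(d) per worker
    sum_outer = sum(B.T @ B for B in blocks)                 # O(d^2) per worker
    mean = sum_x / n
    S = (sum_outer - n * np.outer(mean, mean)) / (n - 1)     # sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)                     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order], eigvals[order]

blocks = [np.random.randn(500, 10) for _ in range(4)]        # 4 simulated workers
V, lam = pca_distributed_closed_form(blocks, k=3)
```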

[Distributed machine learning first distributes the n observations, preserving the closed-form algorithm (if one exists) for performance; when the feature dimension d also scales, the algorithm's complexity must be kept linear in both n and d, and iterative algorithms such as stochastic gradient descent (SGD) are typically used instead.]
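
For case 2 (big n, big d), one such iterative scheme is block power iteration (orthogonal iteration) over row-partitioned data, sketched below under illustrative assumptions (iteration count, helper names): each pass needs only matrix-vector products with the centered blocks, the d x k iterate, and an n_i x k intermediate per block, and the d x d covariance is never formed.

```python
import numpy as np

def cov_times(blocks, mean, V):
    # Accumulates X_c^T (X_c V) over row blocks without forming X_c^T X_c.
    return sum((B - mean).T @ ((B - mean) @ V) for B in blocks)

def pca_power_iteration(blocks, k, iters=50):
    d = blocks[0].shape[1]
    n = sum(B.shape[0] for B in blocks)
    mean = sum(B.sum(axis=0) for B in blocks) / n
    V = np.linalg.qr(np.random.randn(d, k))[0]           # random orthonormal start
    for _ in range(iters):
        V, _ = np.linalg.qr(cov_times(blocks, mean, V))  # power step + re-orthonormalize
    lam = np.diag(V.T @ cov_times(blocks, mean, V)) / (n - 1)  # Rayleigh quotients
    return V, lam

blocks = [np.random.randn(2000, 50) for _ in range(4)]   # 4 simulated workers
V, lam = pca_power_iteration(blocks, k=5)
```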

PCA vs LM LS (least-squares estimate of a linear model): both minimize a sum of squared residuals, but PCA measures residuals as orthogonal distances in the full feature space, while LM LS measures residuals along the label direction only. Simple example: with Galton's symmetric joint distribution of inter-generational heights, PCA gives the diagonal and the counter-diagonal, while LM LS always gives a line flatter than the diagonal, crossing it at the center of mass.
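
A small numerical check of this contrast, using simulated parent/child heights with equal variances and correlation rho as a stand-in for Galton's data (the parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, var = 0.5, 6.25
cov = [[var, rho * var], [rho * var, var]]
parent, child = rng.multivariate_normal([68, 68], cov, size=100_000).T

# PCA: first principal component of the symmetric joint distribution -> diagonal
X = np.column_stack([parent, child])
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
pc1 = eigvecs[:, np.argmax(eigvals)]
pca_slope = pc1[1] / pc1[0]                                   # ~ 1.0

# LM LS: regress child on parent; slope = rho * sd_child / sd_parent < 1 (flatter)
ls_slope = np.cov(parent, child)[0, 1] / np.var(parent, ddof=1)

print(round(pca_slope, 3), round(ls_slope, 3))                # ~ 1.0 vs ~ 0.5
```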

Singular Value Decomposition
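
PCA is commonly computed through the SVD of the centered data matrix rather than by forming the covariance; a minimal sketch of that connection (variable names are illustrative): the right singular vectors are the principal components, and the squared singular values divided by n - 1 are the directional variances.

```python
import numpy as np

X = np.random.randn(1000, 5)
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt.T                  # columns: principal components (unit vectors)
variances = s**2 / (len(X) - 1)    # eigenvalues of the sample covariance
```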


🏷 Category=Computation Category=Machine Learning