Assumptions: linearity, orthogonality.
Directions (subspaces) of maximal sample variance -> top (dominant) eigenvectors of the sample covariance matrix; the associated eigenvalues are the sample variances along those directions.
Principal Components: (unit) eigenvectors of the sample covariance matrix
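A minimal NumPy sketch of the definition above, assuming the data is an n x d array with one observation per row. The function name `pca_eig`, the parameter `k`, and the synthetic data are illustrative choices, not part of the notes.

```python
import numpy as np

def pca_eig(X, k):
    """Top-k unit eigenvectors and eigenvalues of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                # center each feature
    n = X.shape[0]
    C = Xc.T @ Xc / (n - 1)                # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)   # symmetric solver, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return eigvecs[:, order], eigvals[order]

# Synthetic data (assumption): three features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.1])

components, variances = pca_eig(X, k=2)
scores = (X - X.mean(axis=0)) @ components  # projections onto the components
```

The columns of `components` are orthonormal, and the sample variance of each column of `scores` equals the corresponding entry of `variances`, which is the "eigenvalue = directional variance" statement in concrete form.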
PCA at scale:
[Distributed machine learning first distributes the n observations, keeping the closed-form algorithm (if one exists) for performance; when the number of features d also scales, algorithm complexity must be restricted to linear in both n and d, and iterative algorithms such as stochastic gradient descent (SGD) are typically used.]
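One way to picture an iterative method whose cost per pass is linear in both n and d is power iteration on the sample covariance, applied shard by shard without ever forming the d x d matrix. This is a sketch under assumptions: power iteration is swapped in here as a simple iterative scheme (the notes name SGD as the typical example), the list of in-memory arrays stands in for distributed partitions, and `top_component`, `n_iter`, and `seed` are hypothetical names.

```python
import numpy as np

def top_component(partitions, n_iter=200, seed=0):
    """Leading eigenvector/eigenvalue of the sample covariance by power iteration.

    `partitions` is a list of (n_i, d) arrays standing in for data shards.
    """
    n = sum(p.shape[0] for p in partitions)
    mean = sum(p.sum(axis=0) for p in partitions) / n  # global mean, one O(n d) pass
    v = np.random.default_rng(seed).normal(size=mean.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Each shard contributes Xc_i^T (Xc_i v): O(n_i d) work, only d numbers returned.
        w = sum((p - mean).T @ ((p - mean) @ v) for p in partitions) / (n - 1)
        lam = v @ w                # Rayleigh quotient: variance along v
        v = w / np.linalg.norm(w)  # renormalize for the next iteration
    return v, lam

# Synthetic data (assumption), split into three shards along the observation axis.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
shards = np.array_split(X, 3)

v, lam = top_component(shards)
```

Each iteration needs only a sum of d-dimensional vectors across shards, so communication stays at O(d) per shard per pass. Convergence speed depends on the gap between the two largest eigenvalues, so the fixed `n_iter` is a simplification.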