PCA and SVD work on two different interpretations of tabular data, but are computationally equivalent once the columns of the data matrix are centered.
How PCA and SVD are related:
Interpret SVD the PCA way: take an m*n matrix with m > n and rank n, i.e. m objects in n-dimensional space with no perfect collinearity (columns centered, as PCA requires). Rotate the basis of the observation space so that the n coordinate vectors in the sample space are mutually orthogonal and have the largest possible norms, in order. Each vector in the sample space holds the projections of all observations onto the corresponding basis vector, so the first right singular vector has the largest sum of squared projections, and each later one recursively optimizes the same criterion on the residuals. Conclusion: the right singular vectors are the PCA basis, and the left singular vectors multiplied by the singular values (U*Sigma) have as rows the object coordinates in the rotated basis.
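The conclusion above can be checked numerically. The following is a minimal NumPy sketch on synthetic data (the matrix, its size and the random seed are assumptions made for illustration): it centers the columns, takes the thin SVD, and confirms that the rows of U*Sigma are the projections of the objects onto the right singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 3                                   # m objects in n-dimensional space (illustrative sizes)
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))  # synthetic data with correlated columns
Xc = X - X.mean(axis=0)                         # PCA requires centered columns

# Thin SVD: Xc = U @ diag(s) @ Vt, singular values in decreasing order
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores_svd = U * s          # left singular vectors scaled by singular values
scores_proj = Xc @ Vt.T     # objects projected onto the right singular vectors

# Both give the object coordinates in the rotated basis
assert np.allclose(scores_svd, scores_proj)

# The sum of squared projections on each basis vector is the squared singular value
sum_sq = (scores_proj ** 2).sum(axis=0)
assert np.allclose(sum_sq, s ** 2)

# The n vectors in sample space (columns of the scores) are mutually orthogonal
gram = scores_svd.T @ scores_svd
assert np.allclose(gram, np.diag(s ** 2))
```

Because `np.linalg.svd` returns singular values in decreasing order, the first column of the scores carries the largest sum of squared projections, matching the recursive argument in the note.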
Assumptions: linearity, orthogonality.
Principal components are the (unit) eigenvectors of the sample covariance matrix.
Principal Components Analysis (PCA): Directions (subspaces) of maximal sample variance -> Top (dominant) eigenvectors of the sample covariance matrix, with directional variances being the associated eigenvalues.
PCA can be done by eigenvalue decomposition of the data variance-covariance matrix or by singular value decomposition of the centered data matrix. Singular values of the centered data matrix are the principal square roots of the eigenvalues of X^T X, so with n observations the eigenvalues of the variance-covariance matrix are lambda_i = sigma_i^2 / (n - 1). "Spectral decomposition": for n observations of p features, the sample space (R^n) can be seen as the dual of the observation space (R^p), and the principal components are the longest vectors in the sample space over all basis rotations in the observation space. Non-principal components can be truncated without much loss in the observation space, and the (principal) information in the sample space can be stored to reconstruct the data approximately, with less storage.
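A short NumPy sketch of both routes and of truncation, on synthetic data (the column scales, sizes and choice of k are assumptions for illustration). It compares the eigendecomposition of the sample covariance matrix with the SVD of the centered data matrix, then reconstructs the data from the top k components.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, p = 500, 4
# Synthetic data with well-separated column variances (illustrative choice)
X = rng.normal(size=(n_obs, p)) * np.array([5.0, 2.0, 0.5, 0.1])
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the sample covariance matrix (divisor n_obs - 1)
S = np.cov(Xc, rowvar=False)
evals, evecs = np.linalg.eigh(S)            # eigh returns ascending eigenvalues
evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending

# Route 2: SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of the covariance matrix are squared singular values over (n_obs - 1)
assert np.allclose(evals, s ** 2 / (n_obs - 1))
# Eigenvectors agree with right singular vectors up to sign
assert np.allclose(np.abs(evecs.T @ Vt.T), np.eye(p), atol=1e-6)

# Truncation: keep the top k components and reconstruct
k = 2
scores_k = U[:, :k] * s[:k]      # n_obs x k coordinates to store
X_k = scores_k @ Vt[:k]          # rank-k approximation of Xc

# Squared reconstruction error equals the sum of the discarded squared singular values
err = np.linalg.norm(Xc - X_k) ** 2
assert np.isclose(err, (s[k:] ** 2).sum())

stored = scores_k.size + Vt[:k].size   # numbers kept: n_obs*k + k*p
full = Xc.size                         # numbers in the original: n_obs*p
```

Storing the k score columns plus the k basis vectors takes n_obs*k + k*p numbers instead of n_obs*p, which is where the storage saving in the note comes from.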
PCA at scale:
[Distributed machine learning first distributes the n observations, preserving the closed-form algorithm (if one exists) for performance; when the feature dimension d also scales, algorithm complexity must be restricted to be linear in both n and d, and iterative algorithms such as stochastic gradient descent (SGD) are typically used.]
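One way to read the first case (large n, moderate d) is that each worker reduces its rows to a count, a column sum and a d*d Gram matrix, and the closed-form eigendecomposition then runs on the combined d*d covariance. The sketch below simulates this on a single machine; the helper names `partial_stats` and `combine_covariance` are hypothetical, the partitions come from `np.array_split` instead of a real cluster, and it assumes a d*d matrix fits in memory on one node.

```python
import numpy as np

def partial_stats(chunk):
    """Per-partition summary: row count, column sums, and Gram matrix chunk^T chunk."""
    return chunk.shape[0], chunk.sum(axis=0), chunk.T @ chunk

def combine_covariance(stats):
    """Merge per-partition summaries into the sample covariance matrix."""
    n = sum(count for count, _, _ in stats)
    total = sum(col_sum for _, col_sum, _ in stats)
    gram = sum(g for _, _, g in stats)
    mean = total / n
    # sum_i (x_i - mean)(x_i - mean)^T = X^T X - n * mean mean^T
    return (gram - n * np.outer(mean, mean)) / (n - 1)

rng = np.random.default_rng(3)
X = rng.normal(size=(10_000, 5)) @ rng.normal(size=(5, 5))  # synthetic data

# Simulate 8 workers, each holding a slice of the rows
partitions = np.array_split(X, 8, axis=0)
stats = [partial_stats(part) for part in partitions]

S = combine_covariance(stats)
evals, evecs = np.linalg.eigh(S)            # closed-form step on the small d x d matrix
evals, evecs = evals[::-1], evecs[:, ::-1]  # descending order: columns are the PCA basis

assert np.allclose(S, np.cov(X, rowvar=False))
```

The per-partition work is linear in the number of rows but quadratic in d, which is why this route stops being attractive when d also scales. Subtracting n * mean mean^T from the raw Gram matrix can also lose precision when the means are large relative to the spread, so this is a sketch, not a numerically hardened implementation.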
PCA vs LM LS (least-squares estimate of a linear model): Both minimize a sum of squared residuals, but PCA measures distance in the full space (orthogonal to the fitted direction), while LM LS measures distance in the label direction only. Simple example: with Galton's symmetric joint distribution of inter-generation heights, PCA gives the diagonal and the counter-diagonal, while LM LS always gives a line flatter than the diagonal, crossing it at the center of mass.
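The contrast can be illustrated with a small simulation. This sketch does not use Galton's actual data: it assumes a standardized bivariate normal with equal variances and a correlation of 0.5 (both illustrative choices) as a stand-in for a symmetric parent/child height distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.5                                   # assumed parent/child correlation (illustrative)
cov = np.array([[1.0, rho], [rho, 1.0]])    # symmetric joint distribution, equal variances
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20_000)
xyc = xy - xy.mean(axis=0)                  # both fitted lines pass through the center of mass
x, y = xyc[:, 0], xyc[:, 1]

# PCA: first right singular vector minimizes squared orthogonal distance
_, _, Vt = np.linalg.svd(xyc, full_matrices=False)
pca_slope = Vt[0, 1] / Vt[0, 0]             # close to 1: the diagonal
pca_slope_2 = Vt[1, 1] / Vt[1, 0]           # close to -1: the counter-diagonal

# Least squares of y on x: minimizes squared distance in the y direction only
ols_slope = (x @ y) / (x @ x)               # close to rho, flatter than the diagonal

print(f"PCA slope: {pca_slope:.3f}, second PC slope: {pca_slope_2:.3f}, OLS slope: {ols_slope:.3f}")
```

With equal variances the least-squares slope estimates the correlation itself, so it is flatter than the diagonal whenever the correlation is below 1, which is the regression-to-the-mean effect the note alludes to.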