Least squares is a learning problem whose loss function is an $L^2$ norm, $\min_\theta \|f - \hat{f}_\theta\|_{2,\mu}$. Given a finite sample, this is equivalent to minimizing the sum of squared 2-norms of the sample residuals, $\min_\theta \sum_{i=1}^m \|f(x_i) - \hat{f}_\theta(x_i)\|_2^2$. The topic is studied in approximation theory in mathematics, as least squares estimation in regression analysis in statistics, and in numerical optimization.
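To see the equivalence, take $\mu$ to be the empirical measure of the sample, $\mu_m = \frac{1}{m} \sum_{i=1}^m \delta_{x_i}$; the squared norm then reduces to the residual sum, and the constant factor $1/m$ does not change the minimizer:

$$\|f - \hat{f}_\theta\|_{2,\mu_m}^2 = \int \|f(x) - \hat{f}_\theta(x)\|_2^2 \, d\mu_m(x) = \frac{1}{m} \sum_{i=1}^m \|f(x_i) - \hat{f}_\theta(x_i)\|_2^2.$$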
Unqualified, least squares usually means linear least squares, where the truth is approximated by a linear model, $\hat{f}_\theta(x) = \theta \cdot x$. Nonlinear least squares refers to least squares problems where a nonlinear model is used, $\hat{f}_\theta(x) = \hat{f}(x; \theta)$.
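As a concrete illustration, here is a minimal sketch in Python, assuming NumPy and SciPy are available; the synthetic data and the exponential model are illustrative choices, not part of the definitions above. The linear problem has a closed-form solution (here computed via `numpy.linalg.lstsq`), while the nonlinear problem must be solved iteratively from an initial guess (here via `scipy.optimize.least_squares`).

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Linear least squares: minimize sum_i ||y_i - theta . x_i||^2.
# Synthetic sample from an assumed linear truth with additive noise.
m = 100
x = rng.uniform(-1.0, 1.0, size=(m, 2))
theta_true = np.array([2.0, -3.0])
y = x @ theta_true + 0.1 * rng.standard_normal(m)

# lstsq solves the linear problem in closed form (via the SVD).
theta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
print("linear fit:", theta_hat)

# Nonlinear least squares: model f_hat(t; theta) = theta_0 * exp(theta_1 * t)
# is nonlinear in theta, so there is no closed-form minimizer.
t = rng.uniform(0.0, 1.0, size=m)
z = 1.5 * np.exp(-0.8 * t) + 0.05 * rng.standard_normal(m)

def residuals(theta):
    # Sample residuals z_i - f_hat(t_i; theta), whose squared sum is minimized.
    return z - theta[0] * np.exp(theta[1] * t)

fit = least_squares(residuals, x0=[1.0, 0.0])
print("nonlinear fit:", fit.x)
```

The contrast is the point of the sketch: the linear fit is a single linear-algebra solve, whereas the nonlinear fit iterates a local method (SciPy's default is a trust-region scheme) and can depend on the starting point `x0`.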
Orthogonal polynomials