Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. {Arthur Samuel, ~1959}

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. {Tom Mitchell, 1997. Machine Learning.}

Machine learning focuses on methodology and algorithms.

Theory

See the main article about Learning Theory.

Extracting features:

  • Feature extraction is representing observations with (numerical) attributes, aka features, using domain knowledge.
  • For a prediction (supervised learning) problem, the label is kept separate from the features: think of the predictor variables x and the response (outcome) variable y in statistics.
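A minimal sketch of feature extraction and label separation, assuming a toy spam-filtering task. The function name `extract_features` and the three features (word count, digit count, presence of the word "free") are hypothetical choices made only for illustration; they stand in for whatever domain knowledge suggests for a real problem.

```python
def extract_features(message):
    # Hypothetical feature extractor: maps a raw text message to numbers.
    words = message.lower().split()
    return [
        len(words),                             # message length in words
        sum(ch.isdigit() for ch in message),    # number of digit characters
        1.0 if "free" in words else 0.0,        # domain-knowledge cue for spam
    ]

# Raw observations as (message, label) pairs; label 1 = spam, 0 = not spam.
raw = [
    ("Win a FREE prize call 555 0199 now", 1),
    ("Lunch at noon tomorrow?", 0),
]

X = [extract_features(msg) for msg, _ in raw]  # features: the predictors x
y = [label for _, label in raw]                # labels: the response y
print(X, y)
```

The label never passes through `extract_features`; it is split off up front so the model cannot see it as an input.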

Partitioning observations:

  1. Training data: the observations used in building a statistical model.
    1. Validation data: the subset of training data reserved for a grid search (or other optimization method) over the hyperparameters in model selection.
  2. Test data: the observations reserved for evaluating the final model's generalization performance; they are not used for training or model selection.
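One way to produce this partition is a sketch like the following, assuming the observations can be shuffled freely (no time ordering or grouping to respect). The helper name `partition` and the default fractions (20% test, then 20% of the remainder for validation) are illustrative assumptions, not a standard.

```python
import random

def partition(observations, test_frac=0.2, val_frac=0.2, seed=0):
    # Hypothetical helper: shuffle, hold out test data first, then reserve
    # validation data from what remains, mirroring the nesting above.
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(observations)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    test, train_full = shuffled[:n_test], shuffled[n_test:]

    n_val = int(len(train_full) * val_frac)
    validation, train = train_full[:n_val], train_full[n_val:]
    return train, validation, test

train, validation, test = partition(range(100))
print(len(train), len(validation), len(test))  # 64 16 20
```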

Learning workflow:

  1. Model training: apply a learning algorithm to training data to obtain a statistical model.
  2. Model selection, aka hyperparameter optimization: choose the learning algorithm's hyperparameters λ (e.g. a penalty on model complexity) that give the best value of the evaluation metric, as estimated by:
    • a held-out validation set from a training-validation split; or
    • cross-validation on the training set.
  3. Model evaluation: evaluate the generalization performance of the final model on the test data with some metric.
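The three steps can be sketched end to end with one-dimensional ridge regression, where λ is the complexity penalty. Everything here is an assumption for illustration: the synthetic data (y = 2x plus noise), the λ grid, 5-fold cross-validation with interleaved folds, and mean squared error as the metric. Library tools such as scikit-learn's `GridSearchCV` do this more generally.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    # Model training: closed-form ridge fit of y ≈ w * x (no intercept term).
    # Minimizing sum((y - w*x)**2) + lam * w**2 gives w = sum(x*y) / (sum(x*x) + lam).
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    # Evaluation metric: mean squared error (lower is better).
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_score(xs, ys, lam, k=5):
    # Average validation MSE of one candidate lam over k folds of the training set.
    n = len(xs)
    total = 0.0
    for fold in range(k):
        val = list(range(fold, n, k))  # every k-th point is held out in this fold
        held_out = set(val)
        tr = [i for i in range(n) if i not in held_out]
        w = fit_ridge_1d([xs[i] for i in tr], [ys[i] for i in tr], lam)
        total += mse(w, [xs[i] for i in val], [ys[i] for i in val])
    return total / k

# Synthetic data (assumed for illustration): y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 0.3) for x in xs]
x_train, y_train = xs[:150], ys[:150]
x_test, y_test = xs[150:], ys[150:]

# Model selection: grid search over lam, scored by 5-fold cross-validation.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: cv_score(x_train, y_train, lam))

# Retrain on the whole training set with the chosen lam; use the test set once.
w = fit_ridge_1d(x_train, y_train, best_lam)
test_mse = mse(w, x_test, y_test)
print(best_lam, round(w, 3), round(test_mse, 3))
```

The test set plays no part in choosing λ, so `test_mse` remains an honest estimate of generalization error.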

Algorithms

Note: Topics in bold are covered in the Coursera Machine Learning lectures. A few covered topics are not marked: regularized regression, neural networks, and anomaly detection.

  • Feature extraction and transformation
  • Basic statistics: summary statistics, correlations, hypothesis testing
  • Anomaly detection: k-NN (k-Nearest Neighbors)
  • Neural networks: perceptron, convolutional neural network
  • Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS; BFGS stands for Broyden–Fletcher–Goldfarb–Shanno)
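As an illustration of the optimization entry, here is a minimal sketch of stochastic gradient descent fitting y ≈ a·x + b under squared-error loss. The constant learning rate, epoch count, per-epoch reshuffling, and synthetic data (y = 3x − 1 plus small noise) are assumptions chosen so the example converges; practical SGD usually adds mini-batches and a decaying step size.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=50, seed=0):
    # Each update follows the gradient of ONE example's loss (err**2) / 2,
    # where err = (a*x + b) - y, visiting examples in a new random order each epoch.
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = (a * xs[i] + b) - ys[i]  # prediction minus target
            a -= lr * err * xs[i]          # d/da of err**2 / 2
            b -= lr * err                  # d/db of err**2 / 2
    return a, b

# Synthetic data (assumed for illustration): y = 3x - 1 plus small Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(-2, 2) for _ in range(500)]
ys = [3.0 * x - 1.0 + rng.gauss(0, 0.1) for x in xs]
a, b = sgd_linear_regression(xs, ys)
print(round(a, 2), round(b, 2))  # expected to land near 3 and -1
```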

Figure: scikit-learn machine learning algorithm map. dlib has an alternative map.

Supervised Learning

Supervised learning is fitting a model to labeled data, i.e. observations with a known response y: it is classification if the label is categorical and regression if the label is quantitative. In contrast, unsupervised learning is finding structure in unlabeled data.

Classification and regression:

  • Decision Tree and Ensemble Learning: random forests, gradient-boosted trees.
  • naive Bayes
  • linear models: support vector machines, logistic regression, linear regression
  • alternating least squares (ALS): collaborative filtering
  • isotonic regression
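Isotonic regression fits the closest (least-squares) non-decreasing sequence to the responses. A minimal sketch uses the pool adjacent violators algorithm, assuming the responses `ys` are already sorted by their predictor x and that all points have equal weight unless weights are given. The function name is illustrative.

```python
def isotonic_regression(ys, weights=None):
    # Pool Adjacent Violators. Assumes ys is ordered by the predictor x.
    # Each block holds [weighted mean, total weight, number of points].
    if weights is None:
        weights = [1.0] * len(ys)
    blocks = []
    for y, w in zip(ys, weights):
        blocks.append([float(y), float(w), 1])
        # While the previous block's mean exceeds the newest one, pool them.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    # Expand each block's mean back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

print(isotonic_regression([1, 3, 2, 4, 3, 5]))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Each violating pair (3, 2) and (4, 3) is replaced by its mean, which is the least-squares compromise that restores the ordering.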

Structured prediction: graphical models (Bayesian network)

Clustering

See the main article about Clustering.

  • k-means
  • Gaussian mixture
  • power iteration clustering (PIC)
  • latent Dirichlet allocation (LDA)
  • streaming k-means
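The first entry, k-means, is commonly computed with Lloyd's algorithm: alternately assign each point to its nearest centroid and move each centroid to the mean of its points. The sketch below assumes 2-D points as tuples, Euclidean distance, and initialization from k random data points; production implementations typically use smarter seeding (e.g. k-means++) and several restarts.

```python
import math
import random

def assign(points, centroids):
    # Assignment step: index of the nearest centroid (Euclidean distance) for each point.
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    # Lloyd's algorithm, initialized with k distinct data points chosen at random.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Two well-separated synthetic blobs (assumed for illustration).
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
centroids, labels = kmeans(blob_a + blob_b, k=2)
print(sorted(centroids))
```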

Dimensionality Reduction

See the main article about Dimensionality Reduction.

  • singular value decomposition (SVD)
  • principal component analysis (PCA): find orthogonal directions (and the subspace they span) in a Euclidean space that explain the most sample variance; equivalently, the subspace that minimizes the sum of squared residuals of the projected data.
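A minimal sketch of finding the first principal component of 2-D data by power iteration on the sample covariance matrix. It assumes the leading eigenvalue is strictly larger than the second and that the start vector (1, 0) is not orthogonal to the leading direction. Library implementations usually compute the same directions as the right singular vectors of the centered data matrix, which links PCA to the SVD entry above. The synthetic data is stretched along (1, 1)/√2, so that should be the recovered direction.

```python
import math
import random

def first_principal_component(points, iters=200):
    # Center the data, build the 2x2 sample covariance matrix, then power-iterate.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    v = (1.0, 0.0)  # start vector (assumed not orthogonal to the answer)
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    # Rayleigh quotient v^T S v: the sample variance explained along v.
    variance = sxx * v[0] ** 2 + 2 * sxy * v[0] * v[1] + syy * v[1] ** 2
    return v, variance

# Synthetic data (assumed): large spread along (1, 1)/sqrt(2), small spread across it.
rng = random.Random(0)
pts = []
for _ in range(200):
    t = rng.gauss(0, 2.0)  # coordinate along the main direction
    s = rng.gauss(0, 0.2)  # coordinate along the orthogonal direction
    pts.append(((t - s) / math.sqrt(2), (t + s) / math.sqrt(2)))
v, var = first_principal_component(pts)
print(v, var)
```

Projecting onto `v` keeps the most variance, so it also leaves the smallest squared residuals, which is the equivalence stated above.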

Figure: typical ML pipeline.

Tools

Machine Learning:

Deep Learning:

  • Python: Pylearn2, Theano
  • Java: Deeplearning4j
  • C++/CUDA: Caffe, cuda-convnet2
  • Python, C++: TensorFlow