Machine Learning

Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. {Arthur Samuel, ~1959}

A computer program is said to learn from experience with respect to some task and some performance measure if its performance on that task, as measured by the performance measure, improves with experience. {Tom Mitchell, 1997. Machine Learning.}

Machine learning focuses on methodology and algorithms.

Theory

See the main article about Learning Theory.

Extracting features:

Feature extraction is representing observations with (numerical) attributes, aka features, by incorporating domain knowledge.
For a prediction (supervised learning) problem, a label is kept separate from the features: think of x (predictor variables) and y (response/outcome variable) in statistics.
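
As a minimal sketch of the idea above, the following turns raw observations into a numerical feature matrix and a separate label vector. The housing records, their field names, and the `extract_features` helper are hypothetical and only serve as illustration; the "area per room" feature stands in for domain knowledge.

```python
import numpy as np

# Hypothetical raw observations: each record mixes attributes and the outcome.
records = [
    {"area_m2": 50.0, "rooms": 2, "has_garden": False, "price": 150.0},
    {"area_m2": 80.0, "rooms": 3, "has_garden": True, "price": 260.0},
    {"area_m2": 120.0, "rooms": 5, "has_garden": True, "price": 400.0},
]


def extract_features(record):
    """Map one observation to a list of numerical features."""
    return [
        record["area_m2"],
        float(record["rooms"]),
        1.0 if record["has_garden"] else 0.0,  # encode a boolean as 0/1
        record["area_m2"] / record["rooms"],   # domain knowledge: area per room
    ]


# x: predictor variables, one row per observation.
X = np.array([extract_features(r) for r in records])
# y: response variable, kept separate from the features.
y = np.array([r["price"] for r in records])

print(X.shape, y.shape)
```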

Partitioning observations:

Training data: the observations used in building a statistical model.
Validation data: the subset of training data reserved for a grid search (or other optimization method) over the hyperparameters in model selection.
Test data: the observations held out from training and reserved for evaluating the final statistical model.
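
One way to produce the three partitions is two successive random splits. The sketch below uses scikit-learn's `train_test_split` on synthetic data; the 60/20/20 proportions and the variable names are assumptions for illustration, not a recommendation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic observations, used only to demonstrate the split.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# First split: hold out 20% of all observations as test data.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Second split: reserve 25% of the remaining training data for validation,
# which is 20% of the original observations.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```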

Learning workflow:

Model training: apply a learning algorithm to training data to obtain a statistical model.
Model selection, aka hyperparameter optimization: choose the learning algorithm's hyperparameters λ, e.g. the penalty on model complexity, that give the best value of the model evaluation metric, as estimated by:
- a held-out validation set from a training-validation split; or
- cross-validation on the training set.
Model evaluation: evaluate the generalization performance of a model with some metric.
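
The three steps above can be sketched with scikit-learn. This example assumes a logistic regression whose hyperparameter `C` (the inverse of the penalty strength) plays the role of λ, a small illustrative grid, 5-fold cross-validation on the training set, and accuracy as the metric; all of these are choices made for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic labeled data; the test set is held out before any tuning.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model selection: grid search over C, scored by cross-validation
# on the training set only.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy"
)

# Model training: fit() also refits the best setting on all training data.
search.fit(X_train, y_train)
best_C = search.best_params_["C"]

# Model evaluation: estimate generalization performance on the test data.
test_accuracy = search.score(X_test, y_test)
print(best_C, test_accuracy)
```

Because the test data plays no part in choosing `C`, the final score is not biased by the search itself.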

Algorithms

Note: Contents in bold are included in Coursera Machine Learning lectures. A few topics are not identified: regularized regression, neural networks, and anomaly detection.

Feature extraction and transformation
Basic statistics: summary statistics, correlations, hypothesis testing
Anomaly detection: k-NN (k-Nearest Neighbors)
Neural networks: perceptron, convolutional neural network
Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS, Broyden–Fletcher–Goldfarb–Shanno)

Figure: scikit-learn machine learning algorithm map. dlib has an alternative map.

Supervised Learning

Supervised learning is fitting a model to labeled data (x, y): classification if the label y is categorical, regression if it is quantitative. In comparison, unsupervised learning is finding structure in unlabeled data.
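
To make the distinction concrete, the sketch below fits a regression model to a quantitative label and a classification model to a categorical label built from the same features. The synthetic data and the choice of linear and logistic regression are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Quantitative label: a noisy linear function of the features -> regression.
y_quant = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y_quant)

# Categorical label: which side of a line each point falls on -> classification.
y_cat = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_cat)

print(reg.coef_)             # estimated coefficients, close to the true (3, -2)
print(clf.score(X, y_cat))   # accuracy on the training data
```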

Classification and regression:

decision trees and ensemble learning: random forests, gradient-boosted trees
naive Bayes
linear models: support vector machines, logistic regression, linear regression
alternating least squares (ALS): collaborative filtering
isotonic regression

Structured prediction: graphical models (Bayesian network)

Clustering

See the main article about Clustering.

k-means
Gaussian mixture
power iteration clustering (PIC)
latent Dirichlet allocation (LDA)
streaming k-means

Dimensionality Reduction

See the main article about Dimensionality Reduction.

singular value decomposition (SVD)
principal component analysis (PCA): find dimensions (and the associated subspace) in a Euclidean space that explain the most sample variance (minimize the residuals).
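
The link between the two entries above can be sketched in NumPy: PCA can be computed by taking the SVD of the centered data matrix. The synthetic data and variable names are assumptions for illustration. The last lines check the equivalence stated above: keeping the direction of most variance leaves a squared residual equal to the discarded squared singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data with much more spread along the first axis.
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])

# Center the data, then take the SVD of the centered matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt                               # rows are principal directions
explained_variance = S**2 / (len(X) - 1)      # sample variance per direction
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first principal component and reconstruct.
Z = X_centered @ components[:1].T             # shape (500, 1)
X_approx = Z @ components[:1]
residual = np.sum((X_centered - X_approx) ** 2)

print(explained_ratio)
print(np.isclose(residual, S[1] ** 2))
```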

Figure: typical ML pipeline.

Tools

Machine Learning:

R: glmnet, randomForest, gbm, e1071 (interface to libsvm), caret, and more.
Python: scikit-learn (imported as sklearn)
H2O
xgboost
Vowpal Wabbit
Spark: MLlib

Deep Learning:

Python: Pylearn2, Theano
Java: Deeplearning4j
C++/CUDA: Caffe, cuda-convnet2
TensorFlow