Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. {Arthur Samuel, ~1959}
A computer program is said to learn from experience with respect to some task and some performance measure, if its performance on the task improves with experience. {Tom Mitchell, 1997, Machine Learning.}
Machine learning focuses on methodology and algorithms.
Theory
See the main article about Learning Theory.
Extracting features:
- Feature extraction is representing observations with (numerical) attributes, aka features, by incorporating domain knowledge.
- For a prediction (supervised learning) problem, a label can be separated from the features: think of x (predictor variable) and y (response/outcome variable) in statistics (see the sketch after this list).
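A minimal sketch in Python, assuming scikit-learn: DictVectorizer turns attribute dictionaries built from domain knowledge into a numeric feature matrix, with the label y kept separate. The observations and labels here are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical observations: dicts of domain-knowledge attributes.
observations = [
    {"sq_ft": 1400, "neighborhood": "A"},
    {"sq_ft": 2600, "neighborhood": "B"},
]
y = [240_000, 410_000]  # labels, kept separate from the features

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(observations)  # numeric feature matrix
print(vec.get_feature_names_out())   # categorical attribute is one-hot encoded
print(X)
```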
Partitioning observations:
- Training data: the observations used in building a statistical model.
- Validation data: the subset of training data reserved for a grid search (or other optimization method) over the hyperparameters in model selection.
- Test data: the observations reserved for the final evaluation of the statistical model (a splitting sketch follows this list).
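A minimal splitting sketch, assuming scikit-learn's train_test_split; the data and the 60/20/20 proportions are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # 50 illustrative observations (features)
y = np.arange(50)                  # matching labels

# First hold out 20% as test data, then carve 20% of the total
# out of the remainder as validation data (0.25 of 80% = 20%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)
```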
Learning workflow:
- Model training: apply a learning algorithm to training data to obtain a statistical model.
- Model selection, aka hyperparameter optimization: choose the learning algorithm's hyperparameters λ, e.g. the penalty on model complexity, that maximize the model evaluation metric, as evaluated by either:
  - a held-out validation set from a training-validation split;
  - cross validation on the training set.
- Model evaluation: evaluate the generalization performance of the model with some metric (the whole workflow is sketched after this list).
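A sketch of the whole workflow, assuming scikit-learn; the inverse regularization strength C of logistic regression stands in for the hyperparameter λ (an illustrative choice), tuned by 5-fold cross validation on the training set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model training + selection: grid search over C with 5-fold cross validation.
search = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5)
search.fit(X_train, y_train)

# Model evaluation: generalization performance on the held-out test set.
print(search.best_params_, search.score(X_test, y_test))
```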
Algorithms
Note: contents in bold are included in the Coursera Machine Learning lectures; a few topics are not identified: regularized regression, neural networks, and anomaly detection.
- Feature extraction and transformation
- Basic statistics: summary statistics, correlations, hypothesis testing
- Anomaly detection: k-NN (k-Nearest Neighbors)
- Neural networks: perceptron, convolutional neural network
- Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS, Broyden–Fletcher–Goldfarb–Shanno)
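As a sketch of the stochastic gradient descent entry above: a plain-NumPy loop minimizing squared error for a linear model, updating on one observation at a time. The learning rate and epoch count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)  # noisy linear data

w = np.zeros(3)
lr = 0.05  # learning rate (arbitrary)
for epoch in range(20):
    for i in rng.permutation(len(X)):      # visit observations in random order
        grad = (X[i] @ w - y[i]) * X[i]    # gradient of 1/2 * squared error
        w -= lr * grad
print(w)  # approaches true_w
```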
 Figure: scikit-learn machine learning algorithm map.
dlib has an alternative map.
Supervised Learning
Supervised learning is fitting a model to labeled data y: classification if the label is categorical, regression if the label is quantitative.
In comparison, unsupervised learning is finding structure in unlabeled data.
Classification and regression:
- Linear classifiers:
  - Generative model: linear discriminant analysis (LDA), naive Bayes classifier;
  - Discriminative model: linear regression (least squares), logistic regression (logit), support vector machine (SVM), perceptron;
- isotonic regression
- Decision tree and ensemble learning: random forests, gradient-boosted trees (contrasted with a linear classifier in the sketch after this list).
- Structured prediction: graphical models (Bayesian network)
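A sketch contrasting a discriminative linear classifier with a tree ensemble, assuming scikit-learn and synthetic labeled data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)  # synthetic labels

for model in (LogisticRegression(max_iter=1000),       # linear classifier
              RandomForestClassifier(random_state=0)):  # ensemble of trees
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV accuracy
    print(type(model).__name__, scores.mean().round(3))
```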
Clustering
See the main article about Clustering.
- k-means
- Gaussian mixture
- power iteration clustering (PIC)
- latent Dirichlet allocation (LDA)
- streaming k-means
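A sketch of the k-means entry above, assuming scikit-learn; choosing k = 3 matches the number of generated blobs.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # unlabeled data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centroid per cluster
print(km.labels_[:10])      # cluster assignment for each observation
```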
Dimensionality Reduction
See the main article about Dimensionality Reduction.
- singular value decomposition (SVD)
- principal component analysis (PCA): find dimensions (and the associated subspace) in a Euclidean space that explain the most sample variance (minimize the residuals).

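A sketch of PCA, assuming scikit-learn (which computes it via the SVD listed above): on this hypothetical low-rank data, the two leading components should explain nearly all of the sample variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 2 strong directions embedded in 5 dimensions, plus noise.
latent = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
X = latent + 0.05 * rng.normal(size=(100, 5))

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # should sum to nearly 1.0
X_reduced = pca.transform(X)          # projection onto the top subspace
```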
Tools
Machine Learning:
- R: glmnet, randomForest, gbm, e1071 (interface to libsvm), caret, and more.
- Python: scikit-learn (sklearn).
- H2O: GLM (generalized linear model), GBM (gradient boosting machine), GLRM (generalized low rank model), deep neural network.
- xgboost: gradient boosting machine.
- Vowpal Wabbit
- Spark: MLlib
Deep Learning:
- Python: Pylearn2, Theano
- Java: Deeplearning4j
- C++/CUDA: Caffe, cuda-convnet2
- TensorFlow
🏷 Category=Computation Category=Machine Learning