Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. [Arthur Samuel, ~1959]
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Tom Mitchell, 1997. Machine Learning.]
Machine learning focuses on methodology and algorithms.
Extracting features:
- Feature extraction is representing observations with (numerical) attributes, aka features, by incorporating domain knowledge.
- For a prediction (supervised learning) problem, a label can be separated from the features: think of x (predictor variable) and y (response/outcome variable) in statistics.
Partitioning observations:
- Training data: the observations used to build a statistical model.
- Validation data: the subset of training data reserved for a grid search (or other optimization method) over hyperparameters in model selection.
- Test data: the observations reserved for evaluating the final model's generalization performance.
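The three-way partition above can be sketched with scikit-learn's `train_test_split`; the 60/20/20 ratio and toy arrays are illustrative assumptions:

```python
# Sketch: a 60/20/20 train/validation/test partition.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # toy feature matrix
y = np.arange(50)                  # toy labels

# First reserve the test set, then carve a validation set out of the training data.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2
```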
Learning workflow:
- Model training: apply a learning algorithm to training data to obtain a statistical model.
- Model selection, aka hyperparameter optimization: choose the learning algorithm's hyperparameters λ, e.g. the penalty on model complexity, that score highest on the model evaluation metric, as estimated by:
  - a held-out validation set from a training/validation split;
  - cross-validation on the training set.
- Model evaluation: evaluate the generalization performance of a model with some metric.
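The whole workflow can be sketched in a few lines of scikit-learn, with λ playing the role of Ridge's `alpha` penalty; the synthetic data and the alpha grid are assumptions for illustration:

```python
# Sketch: training + model selection (5-fold cross-validated grid search
# over the regularization hyperparameter) + evaluation on held-out test data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)            # model training + model selection
test_r2 = search.score(X_test, y_test)  # model evaluation (R^2 on test data)
```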
Algorithms
Note: contents in bold are included in the Coursera Machine Learning lectures. A few topics are not identified there: regularized regression, neural networks, and anomaly detection.
- Feature extraction and transformation
- Basic statistics: summary statistics, correlations, hypothesis testing
- Anomaly detection: k-NN (k-Nearest Neighbors)
- Neural networks: perceptron, convolutional neural network
- Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS, Broyden–Fletcher–Goldfarb–Shanno)
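As a toy illustration of the optimizers listed above, here is stochastic gradient descent minimizing squared error for a linear model; the data, learning rate, and epoch count are illustrative assumptions:

```python
# Sketch: SGD for linear least squares, one example per update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.01, size=500)

w = np.zeros(3)
lr = 0.05  # step size (a hyperparameter)
for epoch in range(20):
    for i in rng.permutation(len(X)):      # shuffle each epoch
        grad = (X[i] @ w - y[i]) * X[i]    # gradient of 0.5 * (x·w - y)^2
        w -= lr * grad
```

L-BFGS would instead use (approximate) curvature information over full batches; SGD trades per-step accuracy for cheap updates on large data.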
Figure: scikit-learn machine learning algorithm map. dlib has an alternative map.
Supervised learning is fitting a model to labeled data y: classification if the label is categorical, regression if the label is quantitative. In comparison, unsupervised learning is finding structure in unlabeled data.
Classification and regression:
- decision trees and ensemble learning: random forests, gradient-boosted trees
- naive Bayes
- linear models: support vector machines, logistic regression, linear regression
- alternating least squares (ALS): collaborative filtering
- isotonic regression
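One of the linear models above, logistic regression, can be sketched on a toy binary classification problem (the two-blob data is an illustrative assumption):

```python
# Sketch: logistic regression on two well-separated classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)),   # class 0 blob
               rng.normal(2, 1, size=(50, 2))])   # class 1 blob
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
train_acc = clf.score(X, y)  # classification accuracy on training data
```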
Structured prediction: graphical models (Bayesian network)
Clustering:
- k-means
- Gaussian mixture
- power iteration clustering (PIC)
- latent Dirichlet allocation (LDA)
- streaming k-means
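Of the clustering methods above, k-means is the simplest to sketch; the two synthetic blobs are an illustrative assumption:

```python
# Sketch: k-means recovering two well-separated blobs.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(30, 2)),   # blob near (0, 0)
               rng.normal(5, 0.5, size=(30, 2))])  # blob near (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_  # one centroid per cluster
```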
Dimensionality reduction:
- singular value decomposition (SVD)
- principal component analysis (PCA): find the directions (and the associated subspace) in a Euclidean space that explain the most sample variance (equivalently, minimize the residuals).
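PCA's variance-maximizing behavior can be sketched on data whose variance is concentrated along one axis (the anisotropic toy data is an illustrative assumption):

```python
# Sketch: PCA finds the high-variance direction first.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([10.0, 1.0])  # variance mostly in dim 0

pca = PCA(n_components=2).fit(X)
ratios = pca.explained_variance_ratio_  # fraction of variance per component
```

The first component should align with dimension 0 and account for nearly all the sample variance.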
Tools
Machine Learning:
- R: glmnet, randomForest, gbm, e1071 (interface to libsvm), caret, and more.
- Python: scikit-learn (sklearn)
- H2O
- xgboost
- Vowpal Wabbit
- Spark: MLlib (see BDAS.md)
Deep Learning:
- Python: Pylearn2, Theano
- Java: Deeplearning4j
- C++/CUDA: Caffe, cuda-convnet2
- TensorFlow