Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. {Arthur Samuel, ~1959}
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. {Tom Mitchell, 1997. Machine Learning.}
Machine learning focuses on methodology and algorithms.
Theory
See the main article about Learning Theory.
Extracting features:
- Feature extraction is representing observations with (numerical) attributes, aka features, by incorporating domain knowledge.
- For a prediction (supervised learning) problem, a label can be separated from the features: think of x (predictor variable) and y (response/outcome variable) of statistics.
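As a minimal sketch of the two points above, assume some hypothetical raw records about houses (the field names, values, and `REFERENCE_YEAR` are made up for illustration). Domain knowledge decides how each attribute is encoded numerically, and the label is kept apart from the features.

```python
# Hypothetical raw observations; field names and values are illustrative only.
raw_observations = [
    {"area_m2": 50.0, "built": 1990, "has_garden": "no", "price": 150.0},
    {"area_m2": 80.0, "built": 2005, "has_garden": "yes", "price": 260.0},
    {"area_m2": 65.0, "built": 1975, "has_garden": "yes", "price": 190.0},
]

REFERENCE_YEAR = 2020  # assumed reference point for computing building age


def extract_features(record):
    """Map one raw record to a numerical feature vector."""
    return [
        record["area_m2"],                               # already numerical
        float(REFERENCE_YEAR - record["built"]),         # domain knowledge: use age, not year
        1.0 if record["has_garden"] == "yes" else 0.0,   # binary encoding of a category
    ]


# Features (x) and label (y) are separated for a supervised problem.
X = [extract_features(r) for r in raw_observations]
y = [r["price"] for r in raw_observations]
```

Each row of `X` is one observation and each column one feature; `y` holds the matching labels in the same order.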
Partitioning observations:
- Training data: the observations used in building a statistical model.
- Validation data: the subset of training data reserved for a grid search (or another optimization method) over the hyperparameters in model selection.
- Test data: the observations reserved for evaluating the final statistical model; they are not used in training or model selection.
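The partitioning can be sketched as a shuffle followed by slicing. The function name `partition`, the default fractions, and the seed are assumptions for illustration; here `training` denotes what remains after the validation subset has been set aside.

```python
import random


def partition(observations, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle observations and split them into training, validation, and test sets."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)   # fixed seed for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test


training, validation, test = partition(range(10))
```

The three sets are disjoint and together cover every observation, so no test observation can leak into training or model selection.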
Learning workflow:
- Model training: apply a learning algorithm to training data to obtain a statistical model.
- Model selection, aka hyperparameter optimization: choose the learning algorithm's hyperparameters λ, e.g. the penalty on model complexity, that give the best model evaluation metric, as evaluated by:
  - a held-out validation set from a training-validation split;
  - cross validation on the training set.
- Model evaluation: evaluate the generalization performance of a model with some metric.
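The workflow above can be sketched end to end with a deliberately small model: ridge regression with one feature and no intercept, where λ (`lam`) is the penalty on model complexity. The synthetic data, the grid of λ values, the fold assignment, and the choice of mean squared error (lower is better) are all assumptions for illustration.

```python
import random


def train(xs, ys, lam):
    """Fit y ≈ w * x by ridge regression (closed form for one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the model w on the given observations."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, lam, k=5):
    """Average validation MSE over k folds of the training set."""
    scores = []
    for fold in range(k):
        held_out = [i for i in range(len(xs)) if i % k == fold]
        kept = [i for i in range(len(xs)) if i % k != fold]
        w = train([xs[i] for i in kept], [ys[i] for i in kept], lam)
        scores.append(mse(w, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / k


# Synthetic data: y = 2x + noise (made up for illustration).
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [2 * x + rng.gauss(0, 0.1) for x in xs]
train_x, train_y = xs[:40], ys[:40]
test_x, test_y = xs[40:], ys[40:]

# Model selection: grid search over the hyperparameter lambda.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: cross_validate(train_x, train_y, lam))

# Model training with the chosen hyperparameter, then model evaluation on test data.
w = train(train_x, train_y, best_lam)
test_error = mse(w, test_x, test_y)
```

The test data are touched only once, after λ has been fixed, so `test_error` estimates generalization performance rather than the fit to data seen during selection.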
Algorithms
Note: Contents in bold are included in Coursera Machine Learning lectures.
A few topics are not identified: regularized regression, neural networks, and anomaly detection.
- Feature extraction and transformation
- Basic statistics: summary statistics, correlations, hypothesis testing
- Anomaly detection: k-NN (k-Nearest Neighbors)
- Neural networks: perceptron, convolutional neural network
- Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS, Broyden–Fletcher–Goldfarb–Shanno)
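To illustrate the optimization entry, here is a minimal sketch of stochastic gradient descent fitting a one-feature linear regression. The function name, learning rate, epoch count, and synthetic data are assumptions for illustration, not a tuned implementation.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.1, epochs=200, seed=0):
    """Fit y ≈ w * x + b by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)              # visit observations in random order
        for i in indices:
            error = w * xs[i] + b - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b,
            # computed from a single observation.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Noise-free synthetic data on the line y = 3x + 1 (made up for illustration).
xs = [i / 10 for i in range(-10, 11)]
ys = [3 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

Each update uses one observation instead of the full data set, which is what distinguishes the stochastic variant from batch gradient descent.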
Figure: scikit-learn machine learning algorithm map. dlib has an alternative map.
Supervised Learning
Supervised learning is fitting a model to labeled data y: classification if the label is categorical, regression if the label is quantitative.
In comparison, unsupervised learning is finding structure in data.
Classification and regression:
- Decision Tree and Ensemble Learning: random forests, gradient-boosted trees.
- naive Bayes
- linear models: support vector machines, logistic regression, linear regression
- alternating least squares (ALS): collaborative filtering
- isotonic regression
Structured prediction: graphical models (Bayesian network)
Clustering
See the main article about Clustering.
- k-means
- Gaussian mixture
- power iteration clustering (PIC)
- latent Dirichlet allocation (LDA)
- streaming k-means
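As an illustration of the first entry, k-means can be sketched with Lloyd's algorithm, which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its cluster. The function names, the fixed iteration count, the random initialization, and the toy points are assumptions for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate the assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)     # initialize with k distinct observations
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, cluster in enumerate(clusters):
            if cluster:                   # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters


# Two well-separated groups of points (made up for illustration).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = k_means(points, k=2)
```

No labels are involved: the grouping is found from the structure of the points alone.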
Dimensionality Reduction
See the main article about Dimensionality Reduction.
- singular value decomposition (SVD)
- principal component analysis (PCA): find dimensions (and the associated subspace) in a Euclidean space that explain the most sample variance (minimize the residuals).
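A minimal sketch of PCA for the first component only: center the data, form the sample covariance matrix, and find its dominant eigenvector by power iteration. Power iteration is a simple stand-in here (an SVD of the centered data is the more common route), and the function name, the all-ones starting vector, and the toy data are assumptions for illustration. The starting vector must not be orthogonal to the component being sought.

```python
def first_principal_component(observations, iterations=100):
    """Return the unit direction of largest sample variance via power iteration."""
    n, d = len(observations), len(observations[0])
    # Center the data: PCA explains variance around the sample mean.
    mean = [sum(row[j] for row in observations) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in observations]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


# Points scattered closely around the line y = 2x (made up for illustration).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
component = first_principal_component(data)
```

Projecting the centered observations onto `component` gives the one-dimensional representation that keeps the most sample variance.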
Tools
Machine Learning:
Deep Learning:
- Python: Pylearn2, Theano
- Java: Deeplearning4j
- C++/CUDA: Caffe, cuda-convnet2
- TensorFlow