Note: Topics in bold are covered in the Coursera Machine Learning lectures. A few covered topics are not marked: regularized regression, neural networks, and anomaly detection.
Figure: scikit-learn machine learning algorithm map. dlib has an alternative map.
Learning problems can be roughly categorized as either supervised or unsupervised. Supervised learning builds a statistical model to predict or estimate an output (label) from a set of inputs: this is classification if the label is categorical and regression if the label is quantitative. Unsupervised learning describes the relationships and structure among a set of inputs, as in dimensionality reduction and clustering.
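To make the distinction concrete, here is a minimal sketch in plain Python (standard library only, an assumption chosen so it runs without scikit-learn). It contrasts one supervised method, a 1-nearest-neighbour classifier that needs labels, with one unsupervised method, k-means clustering, which sees only the inputs. The toy points, labels, and function names are hypothetical.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Supervised (classification): copy the label of the closest training point."""
    i = min(range(len(train_X)), key=lambda k: math.dist(train_X[k], x))
    return train_y[i]

def kmeans(X, init_centroids, iters=10):
    """Unsupervised (clustering): Lloyd's algorithm on unlabelled points."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in X:
            j = min(range(len(centroids)), key=lambda k: math.dist(centroids[k], x))
            clusters[j].append(x)
        # Update step: move each centroid to the mean of its assigned points.
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Hypothetical toy data: two well-separated groups of 2-D inputs.
X = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
labels = ["small", "small", "large", "large"]  # only the supervised method uses these

print(nearest_neighbor_predict(X, labels, (2.0, 1.5)))  # small
print(kmeans(X, init_centroids=[X[0], X[2]]))           # [[1.25, 1.5], [8.5, 8.25]]
```

The classifier can only name groups it was told about ("small", "large"); k-means finds the same two groups without labels but returns only their centres, leaving any interpretation to the user.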
Other areas of machine learning include reinforcement learning, which is concerned with maximizing the cumulative reward of an agent (a person, business, etc.).
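Reward maximization can be illustrated in its simplest form with a multi-armed bandit: an agent repeatedly picks one of several actions and learns from the rewards which one pays best. The sketch below uses the epsilon-greedy strategy, one common choice that these notes do not name; the reward probabilities, step count, and function name are hypothetical.

```python
import random

def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate each arm's average reward while mostly exploiting the best one."""
    rng = random.Random(seed)   # seeded for reproducibility
    n_arms = len(reward_probs)
    counts = [0] * n_arms       # how often each arm was pulled
    values = [0.0] * n_arms     # running mean reward of each arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        # Hypothetical environment: the arm pays 1 with probability reward_probs[arm].
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
        total += reward
    return values, counts, total

values, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=values.__getitem__)
print(best, counts, total)
```

With `epsilon=0` the agent would be purely greedy and could stay on the first arm it tried forever; the occasional random pull is what lets it discover the better-paying arm.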
Linear regression
Standardization of the inputs is required when they are measured in different units.
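A minimal sketch of what standardization means in practice: each input column is rescaled to zero mean and unit standard deviation using statistics computed on the training data, and the same statistics are reused for new inputs. The fit uses batch gradient descent, an assumption chosen because it is a setting where features on very different scales are known to slow or break convergence; the toy data, units, and function names are hypothetical.

```python
from statistics import fmean, pstdev

def column_stats(rows):
    """Per-column (mean, population std) computed from the training rows."""
    return [(fmean(col), pstdev(col)) for col in zip(*rows)]

def standardize(rows, stats):
    """z = (x - mean) / std for every column; assumes no column is constant."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def fit_linear_gd(Z, y, lr=0.1, iters=500):
    """Batch gradient descent on mean squared error for y ~ b + w . z."""
    n, d = len(Z), len(Z[0])
    b, w = 0.0, [0.0] * d
    for _ in range(iters):
        # Residuals use the current parameters, so b and w update simultaneously.
        res = [b + sum(wj * zj for wj, zj in zip(w, z)) - t for z, t in zip(Z, y)]
        b -= lr * sum(res) / n
        w = [wj - lr * sum(r * z[j] for r, z in zip(res, Z)) / n for j, wj in enumerate(w)]
    return b, w

def predict(x, b, w, stats):
    """Standardize a raw input with the training statistics, then apply the model."""
    z = standardize([x], stats)[0]
    return b + sum(wj * zj for wj, zj in zip(w, z))

# Hypothetical data: x1 in metres, x2 in grams; y = 3 + 2*x1 + 0.001*x2.
X = [[1, 1000], [1, 3000], [3, 1000], [3, 3000]]
y = [6, 8, 10, 12]
stats = column_stats(X)
b, w = fit_linear_gd(standardize(X, stats), y)
print(round(predict([2, 2000], b, w, stats), 6))  # 9.0
```

The fitted `w` comes out close to `[2.0, 1.0]`: the change in y per one standard deviation of each input, which makes the two coefficients directly comparable. Run on the raw columns instead, the grams column would dominate the gradient and a learning rate of 0.1 would make these updates diverge.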
C++/CUDA:
JVM (Java, Scala):
R: glmnet, randomForest, gbm, e1071 (interface to libsvm), caret, and more.
Python: scikit-learn (sklearn); Pylearn2, Theano.
Benchmark for GLM, RF, and GBM: for the algorithms it supports, H2O is the fastest and as accurate as the others on data of over 10M records that fit in the memory of a single machine.
Benchmark for GBM