Data Mining (DM) is the ad hoc application of Machine Learning (ML) algorithms to extracting knowledge or patterns from apparently unstructured data. To apply ML algorithms to DM, one must first abstract the problem in the application domain into a set of features.
Figure: Mining of Massive Datasets (MMDS) course overview {Leskovec2014}
Figure: Example data mining workflow in Apache Spark
Distributed file systems (DFS) and MapReduce: tools for creating parallel algorithms.
Search engine technologies: PageRank, link-spam detection, hubs-and-authorities.
Graph mining: social-network graphs.
An item is an elementary object; a basket is a set of items, also called an itemset. The frequent-itemsets problem is to find itemsets that appear in at least a given number of baskets (the support threshold).
Related concepts and algorithms: the market-basket model, association rules, the A-Priori algorithm and its improvements, and the FP-growth (frequent-pattern growth) algorithm.
Finding similar documents: minhashing and locality-sensitive hashing (LSH).
Dimensionality reduction: singular-value decomposition (SVD), latent semantic indexing (LSI).
Perceptrons.
Support-vector machines (SVM), trained with gradient descent, stochastic gradient descent, or limited-memory BFGS (L-BFGS).
Nearest-neighbor methods.
Recommendation systems: making recommendations based on previously collected data, such as past ratings or purchases.
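MapReduce, listed above, splits a job into a map step that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The following is a minimal single-process sketch of the classic word-count job; the function names are hypothetical, and a real framework such as Hadoop or Spark would run many map and reduce tasks in parallel over a distributed file system.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
print(counts["the"])  # 3
```

Because each map call looks at one document and each reduce call looks at one key, both steps can be distributed without coordination; only the shuffle moves data between machines.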
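PageRank, one of the search-engine technologies listed above, can be computed by power iteration. A minimal sketch, assuming a damping factor `beta = 0.85` and a fixed number of iterations (both illustrative choices, not values from this document); pages with no out-links (dead ends) spread their rank uniformly so the ranks keep summing to 1. Link-spam detection and hubs-and-authorities are not covered here.

```python
def pagerank(links, beta=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to; beta is the
    probability of following a link rather than teleporting to a random page.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page first receives its share of the random teleport.
        new_rank = {v: (1.0 - beta) / n for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                # Split u's rank evenly over its out-links.
                share = beta * rank[u] / len(outs)
                for v in outs:
                    new_rank[v] += share
            else:
                # Dead end: spread u's rank evenly over all pages.
                for v in nodes:
                    new_rank[v] += beta * rank[u] / n
        rank = new_rank
    return rank


print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```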
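For social-network graphs, one basic statistic is the number of triangles (three mutually connected people), since many triangles in a region of the graph suggest a tightly knit community. A small sketch, assuming an undirected edge list whose node labels can be compared with `<`; `count_triangles` is a hypothetical helper that counts each triangle once, from its smallest node.

```python
from itertools import combinations


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) edge pairs."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    triangles = 0
    for u, neighbors in adjacency.items():
        # v < w because the neighbors are sorted; requiring u < v means each
        # triangle {u, v, w} is counted exactly once, from its smallest node.
        for v, w in combinations(sorted(neighbors), 2):
            if u < v and w in adjacency[v]:
                triangles += 1
    return triangles


friendships = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan")]
print(count_triangles(friendships))  # 1
```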
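The A-Priori algorithm listed above relies on monotonicity: every subset of a frequent itemset is itself frequent, so candidate k-itemsets need only be built from frequent (k-1)-itemsets. A minimal in-memory sketch, assuming the support threshold is an absolute basket count; real implementations make one pass over disk-resident data per itemset size and use more compact counting structures.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, min_support):
    """Return {frozenset(itemset): count} for itemsets in >= min_support baskets."""
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: count single items.
    counts = Counter(item for b in baskets for item in b)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets and
        # keep only those whose (k-1)-subsets are all frequent (monotonicity).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Counting pass: a candidate is supported by every basket containing it.
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [{"milk", "bread"}, {"milk", "bread", "beer"}, {"bread", "beer"}, {"milk", "beer"}]
print(apriori(baskets, min_support=2))
```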
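Minhashing compresses each document's set of shingles into a short signature whose agreement rate estimates Jaccard similarity, and locality-sensitive hashing then splits the signatures into bands so that only documents agreeing on some whole band are compared. The sketch assumes character 3-shingles, `zlib.crc32` to turn shingles into integers, and random linear hash functions modulo a Mersenne prime in place of true random permutations; these choices, the helper names, and the band and row counts are illustrative.

```python
import random
import zlib
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit CRC value


def shingles(text, k=3):
    """Set of character k-shingles of a document (assumes len(text) >= k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: size of intersection over size of union."""
    return len(a & b) / len(a | b) if a | b else 1.0


def make_hash_funcs(n, seed=0):
    """n hash functions h(x) = (a*x + b) mod PRIME, stored as (a, b) pairs."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n)]


def minhash_signature(items, hash_funcs):
    """For each hash function, keep the minimum hash over the (non-empty) set."""
    codes = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in codes) for a, b in hash_funcs]


def estimate_jaccard(sig1, sig2):
    """Fraction of positions where two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidate_pairs(signatures, bands, rows):
    """Documents whose signatures match on every row of some band become candidates."""
    candidates = set()
    for band in range(bands):
        buckets = {}
        for doc_id, sig in signatures.items():
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets.setdefault(key, []).append(doc_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates


funcs = make_hash_funcs(100)
docs = {"d1": "the cat sat on the mat", "d2": "the cat sat on a mat", "d3": "stock prices fell sharply"}
sigs = {d: minhash_signature(shingles(t), funcs) for d, t in docs.items()}
print(estimate_jaccard(sigs["d1"], sigs["d2"]), jaccard(shingles(docs["d1"]), shingles(docs["d2"])))
print(lsh_candidate_pairs(sigs, bands=20, rows=5))
```

More bands with fewer rows each make near-duplicates more likely to become candidates, at the cost of more false positives to check.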
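SVD factors a matrix as M = U Σ V^T, and latent semantic indexing keeps only the largest singular values of a term-document matrix. As a hedged illustration, the pure-Python sketch below recovers just the leading singular value and vectors by power iteration on M^T M; it assumes a nonzero matrix and a starting vector not orthogonal to the top right singular vector. In practice a library routine such as `numpy.linalg.svd` would be used instead.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def top_singular_triplet(m, iterations=200):
    """Approximate the largest singular value sigma and unit vectors u, v
    with m v = sigma u, by power iteration on m^T m."""
    mt = transpose(m)
    v = [1.0] * len(m[0])  # starting guess for the right singular vector
    for _ in range(iterations):
        w = matvec(mt, matvec(m, v))  # w = M^T M v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = matvec(m, v)
    sigma = math.sqrt(sum(x * x for x in mv))
    u = [x / sigma for x in mv]
    return sigma, u, v


sigma, u, v = top_singular_triplet([[1.0, 2.0], [2.0, 4.0]])
print(sigma)  # close to 5.0
```

Further components could be obtained by subtracting sigma * u v^T from M and repeating, which is one simple way to build the low-rank approximation LSI relies on.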
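The perceptron learns a separating hyperplane w·x + b = 0 by nudging it toward each misclassified training point. A minimal sketch, assuming labels of +1 and -1 and a learning rate of 1; training stops early once a full epoch makes no mistakes, which is guaranteed to happen only when the data are linearly separable.

```python
def train_perceptron(points, labels, epochs=100, lr=1.0):
    """points: list of feature tuples; labels: +1 or -1. Returns (weights, bias)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified or on the boundary: move the hyperplane toward x.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:
            break  # every training point is classified correctly
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]  # logical AND
w, b = train_perceptron(points, labels)
print([predict(w, b, x) for x in points])  # [-1, -1, -1, 1]
```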
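A linear SVM can be trained by stochastic gradient descent on the regularized hinge loss. The sketch below uses plain SGD with an illustrative regularization strength `lam`, learning rate `lr`, and an unregularized bias; it is a hedged example of the technique, not how any particular library implements SVMs, and libraries may instead use batch gradient descent or L-BFGS.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Minimize lam/2 * ||w||^2 + average of max(0, 1 - y (w.x + b)) by SGD.

    labels must be +1 or -1; the bias b is not regularized.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            x, y = points[i], labels[i]
            if y * (dot(w, x) + b) < 1:
                # Inside the margin or misclassified: hinge subgradient is -y*x.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with margin: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


points = [(2, 2), (3, 1), (1, 3), (-2, -2), (-1, -3), (-3, -1)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])  # [1, 1, 1, -1, -1, -1]
```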
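Nearest-neighbor methods classify a query point by looking at the training points closest to it. A minimal k-nearest-neighbors sketch, assuming Euclidean distance (`math.dist`, Python 3.8+), k = 3, and a simple majority vote; it sorts the whole training set per query, which is fine for illustration but not for large data, where index structures or approximate methods such as LSH are typically used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label). Majority vote among the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # a
```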
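One common way to build a recommendation system from previously collected ratings is user-based collaborative filtering: predict a user's rating of an item as a similarity-weighted average of the ratings given by the most similar users who rated it. A minimal sketch, assuming ratings stored as nested dictionaries and cosine similarity with missing ratings treated as 0; the data and helper names are hypothetical.

```python
import math


def cosine(r1, r2):
    """Cosine similarity of two users' rating dicts (missing ratings count as 0)."""
    common = set(r1) & set(r2)
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(x * x for x in r1.values()))
    n2 = math.sqrt(sum(x * x for x in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def predict_rating(ratings, user, item, k=2):
    """Weighted average of the item's ratings by the k most similar other users."""
    neighbors = sorted(
        ((cosine(ratings[user], r), r[item])
         for other, r in ratings.items() if other != user and item in r),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # nobody similar has rated this item
    return sum(sim * rating for sim, rating in neighbors) / total


ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "carol": {"m1": 1, "m2": 1, "m3": 2},
}
print(predict_rating(ratings, "alice", "m3"))
```

Raw cosine similarity can rate a harsh critic as similar to an enthusiast when their ratings are roughly proportional, so a common refinement is to subtract each user's mean rating before computing similarities.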
Software Platforms:
Figure: Marketing workflow on KNIME Analytics Platform