1. Introduction to visual programming and data mining workflows. Data input, visualization, data selection and interactive data exploration. Scatterplot visualization, choice of projection.
2. Classification. Classification trees. Confusion matrix. Scoring of classification models. Classification accuracy and AUC. Data sampling, training and test sets. Cross-validation. A glimpse into logistic regression, random forests, and SVM. Statistical comparison of classifiers.
3. Regression. Linear and polynomial regression. Regularization. Effects of regularization on accuracy in training and test sets. Parameter search. Other regression techniques (random forests).
4. Clustering. Hierarchical clustering. Exploratory data analysis with clustering and data projections. k-means clustering. DBSCAN clustering. Time and space complexity. Cluster scoring and selection of the number of clusters.
5. Data projections. Principal component analysis. Multidimensional scaling. t-SNE.
6. Analysis of unstructured data, such as images and sequences. Data embedding. Deep models.