After a short introduction to data mining, we study and discuss several advanced data mining techniques. These techniques fall into the following categories:
- Classification: k-nearest neighbors, decision trees, Bayesian classifiers, LDA, logistic regression, support vector machines, neural networks, rule-based classifiers, as well as techniques for combining classifiers in ensembles (bagging and boosting)
- Common issues: under- and overfitting, model bias, bias-variance decomposition
- Evaluation techniques for classifiers: hold-out, cross-validation
- Clustering: k-means and k-medoids, density-based clustering (DBSCAN), Expectation-Maximization-based clustering
- Outlier detection
- Pattern mining: frequent itemset mining, subgroup discovery
While covering these topics, we will also treat several foundational concepts in machine learning and data mining, such as bias-variance decomposition, maximum likelihood learning, and the minimum description length principle.
The course also has a practical component in which we will use the data mining suite KNIME. Students will carry out a group project using KNIME or a tool of their choice.