This information sheet indicates how the course will be organized at pandemic code level yellow and green.
If the colour codes change during the academic year to orange or red, modifications are possible, for example to the teaching and evaluation methods.

Data mining

Course code: 2001WETGDT
Study domain: Computer Science
Academic year: 2020-2021
Semester: 2nd semester
Contact hours: 45
Study load (hours): 168
Contract restrictions: No contract restriction
Language of instruction: English
Exam period: exam in the 2nd semester
Lecturer(s): Bart Goethals, Toon Calders

3. Course contents

After a short introduction to data mining, we study and discuss several advanced data mining techniques, divided into the following categories:

  • Classification:
    • k-nearest neighbors, decision trees, Bayesian classifiers, LDA, logistic regression, support-vector machines, neural nets, rule-based classifiers, as well as techniques for combining classifiers in ensembles (bagging and boosting)
    • common issues: under- and overfitting, model bias, bias-variance decomposition
    • evaluation techniques for classifiers: hold-out, cross-validation
  • Clustering: k-means and k-medoids, density-based clustering (DBSCAN), Expectation-Maximization-based clustering
  • Outlier detection
  • Pattern mining: frequent itemset mining, subgroup discovery

While covering these topics, we will also treat several foundational concepts in machine learning and data mining, such as bias-variance decomposition, maximum likelihood learning, and the minimum description length principle.

The course also includes a practical component that uses the data mining suite KNIME. A group project will be carried out using this tool or a tool of the students' choice.