Data mining

Course code: 2001WETGDT
Study domain: Computer Science
Academic year: 2017-2018
Semester: 2nd semester
Contact hours: 45
Study load (hours): 168
Contract restrictions: No contract restriction
Language of instruction: English
Exam period: exam in the 2nd semester
Lecturer(s): Bart Goethals, Toon Calders

3. Course contents

After a short introduction to data mining, we study and discuss several advanced data mining techniques, grouped into the following categories:

  • Classification:
    • k-nearest neighbors, decision trees, Bayesian classifiers, LDA, logistic regression, support-vector machines, neural nets, rule-based classifiers, as well as techniques for combining classifiers in ensembles (bagging and boosting)
    • common issues: under- and overfitting, model bias, bias-variance decomposition
    • evaluation techniques for classifiers: hold-out, cross-validation
  • Clustering: k-means and k-medoids, density-based clustering (DBSCAN), Expectation-Maximization-based clustering
  • Outlier detection
  • Pattern mining: frequent itemset mining, subgroup discovery
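
As a rough illustration of the classifier evaluation techniques listed above, k-fold cross-validation can be sketched as follows. This is a minimal sketch and not course material: the function names (k_fold_indices, nearest_neighbor_predict, cross_validate), the toy data, and the choice of a 1-nearest-neighbor classifier with squared Euclidean distance are all assumptions made for the example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda pair: sq_dist(pair[0], x))[1]


def cross_validate(data, k=5, seed=0):
    """Mean accuracy of 1-NN over k folds; data is a list of (features, label)."""
    folds = k_fold_indices(len(data), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        # Train on every point that is not in the held-out fold.
        train = [data[i] for i in range(len(data)) if i not in held]
        correct = sum(
            nearest_neighbor_predict(train, data[i][0]) == data[i][1]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Hypothetical toy data: two well-separated clusters with labels 0 and 1.
data = [((x, x), 0) for x in range(10)] + [((x + 100, x + 100), 1) for x in range(10)]
print(cross_validate(data, k=5))
```

A hold-out evaluation corresponds to using a single such split: one fold is held out for testing and the remaining points are used for training.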

While covering these topics, we also treat several foundational concepts in machine learning and data mining, such as the bias-variance decomposition, maximum likelihood learning, and the minimum description length principle.

The course also contains a practical component that uses the data mining suite KNIME. A group project will be carried out using this tool or a tool of the students' choice.