This information sheet indicates how the course will be organized at pandemic code level yellow and green.
If the colour codes change during the academic year to orange or red, modifications are possible, for example to the teaching and evaluation methods.

Algorithmic Foundations of Data Science

Course Code :2001WETATD
Study domain:Computer Science
Bi-anuall course:Taught in academic years starting in an odd year
Academic year:2020-2021
Semester:2nd semester
Contact hours:45
Credits:6
Study load (hours):168
Contract restrictions: No contract restriction
Language of instruction:English
Exam period:exam in the 2nd semester
Lecturer(s)Floris Geerts

3. Course contents *

This course is about algorithms that have been developed for big data. The algorithms are mostly sketch or summary-based, i.e., based on a small summary of the data, properties of interest are computed in an approximate way. The correctness of these algorithms often require some kind of probabilistic analysis. As a consequence, the course involves a substantial amount of mathematics and of probability theory in particular.

The course covers algorithms for

  • Approximate counting (Morris, KMV, (Hyper)Loglog)
  • Approximate frequency (Misra-Gries, Count Min, Count Sketch)
  • Heavy hitters 
  • Approximate moments (AGMS sketch)
  • Property testing (sorting, clustering)
  • Sublinear approximation algorithms (number of connected components)
  • Locality Sensitive Hashing (LSH)

This list may change from year to year.