Algorithmic Foundations of Data Science

Course Code :2001WETATD
Study domain:Computer Science
Bi-anuall course:Taught in academic years starting in an odd year
Academic year:2019-2020
Semester:2nd semester
Contact hours:45
Credits:6
Study load (hours):168
Contract restrictions: No contract restriction
Language of instruction:English
Exam period:exam in the 2nd semester
Lecturer(s)Floris Geerts

3. Course contents *

This course is about algorithms that have been developed for big data. The algorithms are mostly sketch or summary-based, i.e., based on a small summary of the data, properties of interest are computed in an approximate way. The correctness of these algorithms often require some kind of probabilistic analysis. As a consequence, the course involves a substantial amount of mathematics and of probability theory in particular.

The course covers algorithms for

  • Approximate counting (Morris, KMV, (Hyper)Loglog)
  • Approximate frequency (Misra-Gries, Count Min, Count Sketch)
  • Heavy hitters 
  • Approximate moments (AGMS sketch)
  • Property testing (sorting, clustering)
  • Sublinear approximation algorithms (number of connected components)
  • Locality Sensitive Hashing (LSH)

This list may change from year to year.