This course is about algorithms that have been developed for big data. The algorithms are mostly sketch or summary-based, i.e., based on a small summary of the data, properties of interest are computed in an approximate way. The correctness of these algorithms often require some kind of probabilistic analysis. As a consequence, the course involves a substantial amount of mathematics and of probability theory in particular.
The course covers algorithms for
- Approximate counting (Morris, KMV, (Hyper)Loglog)
- Approximate frequency (Misra-Gries, Count Min, Count Sketch)
- Heavy hitters
- Approximate moments (AGMS sketch)
- Property testing (sorting, clustering)
- Sublinear approximation algorithms (number of connected components)
- Locality Sensitive Hashing (LSH)
This list may change from year to year.