The project topic changes every year, depending on the state of the art in data science, and is discussed in detail with the students at the start of the course.
During the academic year 2016-2017, the project involved participating in a contest hosted on Kaggle, the well-known platform for machine learning competitions: the Outbrain Click Prediction contest (https://www.kaggle.com/c/outbrain-click-prediction).
Currently, Outbrain pairs relevant content with curious readers in about 250 billion personalized recommendations every month across many thousands of sites. In this competition, Kagglers are challenged to predict which pieces of content its global base of users are likely to click on. Improving Outbrain’s recommendation algorithm will mean more users uncover stories that satisfy their individual tastes.
The dataset for this challenge contains a sample of users’ page views and clicks, as observed on multiple publisher sites in the United States between 14-June-2016 and 28-June-2016. Each viewed page or clicked recommendation is further accompanied by some semantic attributes of those documents.
The data to be analyzed comprised over 2 billion tuples, amounting to 100 GB uncompressed.
The entire class participated as a single team: each student developed their own solution to the problem, and these solutions were then combined into a single ensemble solution. Our team finished in 24th place on a leaderboard of up to 1000 participating teams worldwide.
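To give a flavour of what combining individual solutions into an ensemble can look like, here is a minimal sketch in Python. It is an illustration only, not the method the class actually used: it assumes each student's model outputs a click probability per (display, ad) pair, and it combines them by plain averaging before ranking the ads within each display. The function name `ensemble_rank` and the identifiers `display_id` and `ad_id` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def ensemble_rank(model_predictions):
    """Average click probabilities across models and rank ads per display.

    model_predictions: list of dicts, one per model, each mapping
    (display_id, ad_id) -> predicted click probability.
    Returns a dict display_id -> list of ad_ids, most likely click first.
    Plain averaging is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for predictions in model_predictions:
        for key, prob in predictions.items():
            totals[key] += prob
            counts[key] += 1

    # Group the averaged scores by display.
    per_display = defaultdict(list)
    for (display_id, ad_id), total in totals.items():
        average = total / counts[(display_id, ad_id)]
        per_display[display_id].append((average, ad_id))

    # Sort by descending average score; ties are broken by ad_id.
    return {
        display_id: [ad_id for _, ad_id in sorted(scored, key=lambda s: (-s[0], s[1]))]
        for display_id, scored in per_display.items()
    }


# Two hypothetical student models scoring the same candidate ads.
model_a = {(1, 10): 0.2, (1, 11): 0.7, (2, 20): 0.5}
model_b = {(1, 10): 0.9, (1, 11): 0.1, (2, 20): 0.3}
ranking = ensemble_rank([model_a, model_b])
```

Averaging is only the simplest way to combine models; weighted averages or a second-level model trained on the individual predictions are common alternatives.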