Multi-modal transfer learning through self-supervision for real-time venue mapping. 01/09/2020 - 31/08/2024


Venue mapping is a special case of the reverse geocoding problem. Given a user's GPS coordinates, an accuracy radius, and a list of venues located inside that radius, we want to infer which venue the user visited. Unfortunately, signal noise, especially in dense urban areas, limits our ability to achieve satisfactory results. Recent research shows that the results can be improved by incorporating temporal and behavioral knowledge into the venue mapping model. As a company specializing in the analysis of sensor data from mobile devices, such as accelerometer, gyroscope, and GPS readings, Sentiance has a vast amount of data from thousands of users. An open question is how to represent this data so that the model can be trained in a fully data-driven fashion. Manually creating rules or labelling millions of venues is not an option and would not result in a scalable, future-proof solution. Restricted by the lack of labelled data, we studied the latest advances in deep self-supervised learning in order to design a model that can autonomously uncover the patterns present in the unlabelled data. To ensure that the model generalizes well, we looked for ways to incorporate additional knowledge by means of publicly available data and transfer learning. Although such datasets exist, we faced another problem: their format is so different from that of our in-house data that none of the existing transfer learning techniques could be applied directly. Finally, to tackle this challenge, we studied the fields of multimodal learning and multi-task learning. In this project we propose training a series of deep learning models with a novel architecture that would result in a new state-of-the-art solution for the venue mapping problem.


Research team(s)