Research team

Expertise

Development and study of advanced methods for storing, cleaning, processing and querying vast amounts of data.

Sub-quadratic graph neural networks: finding a good tradeoff between efficiency and expressivity. 01/10/2022 - 30/09/2026

Abstract

This project is situated in graph learning, an increasingly popular area of machine learning, and focuses on the development of a theoretical framework for designing and analyzing expressive, yet efficient, graph neural networks. In spite of advances in hardware, efficiency must be taken into consideration when designing graph neural networks. This implies, for example, that most graph neural networks use update functions that require only a linear amount of computation. A consequence is that such networks can only learn simple functions. Although more advanced graph neural networks have been proposed, which can learn more complex functions, their applicability is limited: they require quadratic (or more) computation, which is out of reach for large graph datasets. In this project, we aim to understand what graph neural networks can achieve *in between* this linear and quadratic cost. We propose to formalize, study and analyze sub-quadratic graph neural networks. Such networks are still feasible (less than quadratic computation) and still powerful (more expressive than linear networks). Furthermore, a number of very recent graph neural networks fall into this sub-quadratic category. Apart from developing a mathematical framework for sub-quadratic graph neural networks, we also study their capabilities, both from a theoretical and a practical point of view.
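As a hypothetical illustration of the linear-cost update functions mentioned in the abstract (the function and parameter names below are not taken from the project), a standard message-passing layer updates each node from its neighbours in a single pass over the edge list, so one layer costs time linear in the number of edges; by contrast, architectures that compare all node pairs incur quadratic cost.

```python
import numpy as np

def mpnn_layer(features, edges, w_self, w_nbr):
    """One message-passing layer: each node sums its neighbours'
    features, then applies a linear update with a nonlinearity.
    Total work is proportional to the number of edges."""
    n, d = features.shape
    agg = np.zeros((n, d))
    for u, v in edges:            # one pass over the edge list
        agg[v] += features[u]
    return np.tanh(features @ w_self + agg @ w_nbr)

# Toy example: a path graph 0 - 1 - 2 with 2-dimensional features.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 2))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
h = mpnn_layer(x, edges, rng.normal(size=(2, 2)), rng.normal(size=(2, 2)))
```

The single loop over `edges` is what keeps the cost linear; a pairwise-attention layer would instead loop over all node pairs.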

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Vector embeddings as database views. 01/01/2022 - 31/12/2025

Abstract

Over the past decade, vector embedding methods have been developed as a means of enabling machine learning over structured data such as graphs or, more generally, relational databases. While the empirical effectiveness of vector embeddings for focused learning tasks and application domains is well-researched, exactly what information of the structured data is encoded in embeddings is less understood. In this project, we postulate that by looking at embeddings through the lens of database research, we can gain more insight into what information embeddings contain. Concretely, we propose to design query languages in which vector embeddings can naturally be expressed. In this setting, questions concerning the kind of information that is encoded in the embedded vectors can naturally be phrased as a problem of query rewriting using views, which we will study. Furthermore, by taking into account structural properties of embedding queries, we open the door to a transfer of methods in databases to vector embeddings, and back. In particular, database methods for incremental query evaluation and query sampling can be applied for the efficient learning of embedding parameters, while, conversely, embeddings can be exploited for database indexing.
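To make the idea of "embeddings as views" concrete, here is a deliberately tiny, hypothetical sketch (not the project's actual query language): a one-hop node "embedding", here simply the sum of a node's neighbours' features, is nothing more than an aggregation query over the database, and can therefore be stored as a view.

```python
import sqlite3

# Toy edge relation and feature relation; the "embedding" of each node
# is the sum of its out-neighbours' features, defined as an SQL view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edge(src INTEGER, dst INTEGER);
    CREATE TABLE feat(node INTEGER, f REAL);
    INSERT INTO edge VALUES (1,2),(1,3),(2,3);
    INSERT INTO feat VALUES (1,0.5),(2,1.0),(3,2.0);
    CREATE VIEW embedding AS
        SELECT e.src AS node, SUM(f.f) AS emb
        FROM edge e JOIN feat f ON f.node = e.dst
        GROUP BY e.src;
""")
rows = dict(conn.execute("SELECT node, emb FROM embedding"))
# rows == {1: 3.0, 2: 2.0}
```

Phrased this way, asking what information the embedding retains about the base tables becomes a query rewriting using views question.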

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Declarative Languages and Artificial Intelligence. 01/01/2021 - 31/12/2025

Abstract

A network to foster cooperation between research groups interested in the use of declarative methods, to promote international cooperation, and to stimulate the Flemish groups to maintain the high quality of their research.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

FWO Sabbatical Leave 2023-2024 (Prof. F. Geerts). 01/08/2023 - 31/01/2024

Abstract

In this research project we delve deeper into the connections between graph neural networks and database query languages. Indeed, it has recently been shown that most graph neural network architectures can be viewed as queries in a query language with aggregation. As a consequence, results on the expressive power of these query languages naturally transfer to results on the expressive power of graph neural networks. This bridge between database theory and graph learning opens up many interesting avenues for further research and for the transfer of techniques between the two areas. We highlight two such avenues here. The first relates to the question whether recent advances in query processing (in particular, worst-case optimal join algorithms) can be leveraged to improve the efficiency of learning graph neural networks. The second relates to extending graph neural networks to domains other than the reals, so that they can naturally perform computations over, say, Booleans, semirings or other algebraic structures. This would substantially increase their applicability. Using the connection to database query languages, where such generalised semantics have been studied in depth, we aim to obtain a detailed picture of how algebraic properties of the underlying domain influence the expressive power of graph neural networks. This computational viewpoint on graph neural networks is currently high on the agenda in the context of neural algorithmic reasoning.
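A minimal, purely illustrative sketch of the second avenue (the `Semiring` class and `aggregate` function below are assumptions for the example, not the project's formalism): the same one-round neighbourhood aggregation can be parameterised by a semiring, so that swapping the "addition" operation moves the computation from the reals to, for example, the tropical (min-plus) semiring.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Semiring:
    zero: object          # neutral element of the semiring's addition
    add: Callable         # the semiring's addition

def aggregate(values, neighbours, s: Semiring):
    """For each node, combine its neighbours' values with the
    semiring's addition, starting from the semiring's zero."""
    out = []
    for nbrs in neighbours:
        acc = s.zero
        for v in nbrs:
            acc = s.add(acc, values[v])
        out.append(acc)
    return out

vals = [3, 1, 4]
nbrs = [[1, 2], [0], []]                       # adjacency lists
reals = Semiring(0, lambda a, b: a + b)        # usual sum
booleans = Semiring(False, lambda a, b: a or b)  # OR over truth values
tropical = Semiring(float("inf"), min)         # min-plus semiring
```

Over the reals this is the familiar sum aggregation; over the tropical semiring the same code computes shortest-distance-style minima, which is why algebraic properties of the domain matter for expressiveness.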

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

BOF Sabbatical 2023 - F. Geerts 2023 - Deepen the connection between graph learning methods and database theory. 01/08/2023 - 31/01/2024

Abstract

Investigation of the exchange of techniques between database theory and graph learning. The focus will be on the characterization of the expressive power of graph learning methods in terms of logic-based equivalences.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Updates and Provenance in Data and Knowledge Bases. 01/01/2016 - 31/12/2019

Abstract

This project is concerned with systems that store, manage, restructure, and provide access to data and knowledge. A classical example of such a system is an enterprise database system. More recent systems, however, may consist of cooperating applications that are distributed over a network. Even the entire World Wide Web is increasingly envisaged as a global data and knowledge base. While a lot of research and development has already been devoted to making data and knowledge bases efficient and accessible, only recently has attention shifted to sharing, exchanging, annotating, updating, and transforming data and knowledge. When this happens, it is important to know what has changed, why it was changed, and how. This new type of data is called provenance data. Current systems can be enriched so that provenance data is managed in unison with ordinary data.
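As a small, hypothetical illustration of provenance data (the relation names and tags below are invented for the example): each input tuple carries a tag, and each output tuple of a query records which input tuples it was derived from, so one can later answer why a result exists.

```python
# Two input relations; each tuple is tagged with an identifier.
r = {("a", 1): "r1", ("b", 2): "r2"}
s = {(1, "x"): "s1", (2, "y"): "s2", (2, "z"): "s3"}

# Natural join on the shared column; every output tuple is annotated
# with the set of input tags it was derived from ("why" provenance).
result = {}
for (a, b), tag_r in r.items():
    for (c, d), tag_s in s.items():
        if b == c:
            result[(a, b, d)] = {tag_r, tag_s}
# result[("b", 2, "y")] == {"r2", "s2"}
```

Managing such annotations alongside the ordinary data is exactly what "provenance in unison with ordinary data" amounts to in this toy setting.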

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Accelerating Inference in Probabilistic Programs and Databases. 01/01/2015 - 31/12/2018

Abstract

The main objective of this project is to develop a unifying framework for accelerating probabilistic querying in both probabilistic databases (PDB) and probabilistic programming (PP). The project is based on the observation that, for several of these types of queries, algorithms for query answering as well as theoretical insights have often been studied in only one of the two areas in isolation. Within the context of this project, our goal is to generalize and adapt these results for use in the other area, and to obtain a more principled understanding of their underlying issues and commonalities.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Computational models for big data algorithms. 01/03/2014 - 31/07/2015

Abstract

A central theme in computer science is the design of efficient algorithms. However, recent experiments show that many standard algorithms degrade significantly in the presence of big data. This is particularly true when evaluating classes of queries in the context of databases. Unfortunately, existing theoretical tools for analyzing algorithms cannot tell whether or not an algorithm will be feasible on big data. Indeed, algorithms that are considered tractable in the classical sense are no longer tractable where big data is concerned. This calls for revisiting classical complexity-theoretic notions. The development of a formal foundation and an accompanying computational complexity theory to study tractability in the context of big data is the main goal of this project.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A Scalable, Distributed Infrastructure for Probabilistic Databases. 01/11/2013 - 30/04/2017

Abstract

Probabilistic databases lie at the intersection of databases and probabilistic graphical models. Our past work in this field started at Stanford University more than six years ago with the development of the Trio probabilistic database system. Still today, probabilistic databases form an emerging field of research with many interesting and as yet unexplored aspects. With this proposal, we argue for the exploration of a new, distributed and scalable infrastructure for probabilistic databases. Rather than building a full-fledged database engine from scratch, we specifically investigate how existing approaches (including our own prior work) can be adapted to a distributed setting in order to accelerate both data management and probabilistic inference via parallel query evaluation in an SQL-like environment. Currently, no distributed probabilistic database system exists. Machine learning approaches, on the one hand, have previously investigated distributed probabilistic inference but do not support SQL. Current distributed database engines, on the other hand, handle neither probabilistic inference nor any other form of uncertain data management. With this project, we aim to fill this gap between databases and machine learning, which so far has not been addressed in the literature. We believe that the proposed topic offers a number of intriguing and challenging aspects for a PhD thesis, both from a theoretical and from a systems-engineering perspective.

Researcher(s)

  • Promoter: Geerts Floris
  • Promoter: Theobald Martin
  • Fellow: Blanco Hernan

Research team(s)

Project type(s)

  • Research Project

Querying distributed dynamic data collections. 01/01/2013 - 31/12/2016

Abstract

The aim of this proposal is to study and develop techniques for querying dynamic, distributed data collections. Our approach is based on three pillars: (1) the study of navigational query languages for linked data; (2) the study of distributed computing methods for distributed query evaluation; and (3) the use of provenance as a mechanism for monitoring alterations to data sources.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Probabilistic data cleaning. 01/01/2013 - 31/12/2016

Abstract

The goal of this project is to study and develop probabilistic data cleaning techniques. Data cleaning refers to the process of detecting and repairing errors, duplicates and anomalies in data. In response to the large amounts of "dirty" data in today's digital society, the data quality problem is attracting considerable interest from various disciplines in computer science.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A principled approach for improving data quality: bridging theory and practice. 01/09/2011 - 31/08/2021

Abstract

The improvement of the quality of data has been recognised as the number one challenge for data management. The need for effective methods to detect errors in data, to identify objects from unreliable data sources, and to repair the errors is evident. Indeed, there is an increasing demand for data quality tools in the current digital society and industries in particular, to add accuracy and value to business processes. To accommodate for those needs, further fundamental research in data quality is required and its practical potential is to be realised. More specifically, building upon previous research, a uniform data quality dependency framework is to be developed to improve data quality in a variety of application domains.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A principled approach for improving data quality: bridging theory and practice. 01/09/2011 - 31/08/2014

Abstract

The improvement of the quality of data has been recognised as the number one challenge for data management. The need for effective methods to detect errors in data, to identify objects from unreliable data sources, and to repair the errors is evident. Indeed, there is an increasing demand for data quality tools in the current digital society and industries in particular, to add accuracy and value to business processes. To accommodate for those needs, further fundamental research in data quality is required and its practical potential is to be realised. More specifically, building upon previous research, a uniform data quality dependency framework is to be developed to improve data quality in a variety of application domains.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project