Research team

ADReM Data Lab (ADReM)

Expertise

Analysis of dynamic network data

Most work in network analysis concentrates on static graphs and finds patterns such as the most influential nodes in the network. Very few existing methods can deal with repeated interactions between nodes in a network. The main goal of the research in this topic is hence to fill this gap by developing methods to identify patterns in interactions between network nodes. We studied so-called information channels that indicate information flows.

Process Mining

In process mining the objects of study are logs generated by business processes. Consider for instance a log generated by a leave-request system, recording activities such as users logging in, opening a new request, managers approving requests, emails being sent by the system, etc. In process mining such logs are analyzed to better understand, monitor, and improve the business processes. One task in this context is detecting complex events. Complex events can be used to find pre-defined security problems or abnormalities. Often, however, anomalies occur that are not foreseen in the systems. To handle such cases, anomaly detection techniques are necessary. With the following work on model-based anomaly detection using dynamic Bayesian networks, we won the Business Process Intelligence challenge at the BPM 2018 conference: S. Pauwels and T. Calders. Detecting and Explaining Drifts in Yearly Grant Applications. In BPM Workshop Business Process Intelligence (BPI), 2018.

Fairness-Aware Machine Learning

In contemporary society we are continuously being profiled: banks profile people according to credit risk, insurance companies profile clients for accident risk, telephone companies profile users on their calling behavior, and web corporations profile users according to their interests and preferences based on web activity and visitation patterns.
These profiles are increasingly built automatically by machine learning methods trained on historical data. Within society there are growing concerns that these machine learning methods are not subject to ethical or moral restrictions. Recent studies indeed show that in circumstances where historical data is biased, or where there is omitted-variable bias, automatically learned models may take decisions that could be considered discriminatory. Apart from ethical considerations, there are also legal restrictions on the use of profiling methods that blindly optimize accuracy without taking unwanted discriminatory effects into account. The General Data Protection Regulation (GDPR; Regulation (EU) 2016/679) explicitly addresses profiling (Art. 22 GDPR, "Automated individual decision-making, including profiling") and requires that suitable measures be in place to safeguard the data subject's rights, freedoms, and legitimate interests. Most profiling techniques, however, do not consider anti-discrimination legislation and may unintentionally produce models that are unfair and hence do not safeguard those freedoms. A further complication is that detecting whether a model is unfair is often highly non-trivial.
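As a minimal illustration of the kind of check involved, the sketch below computes per-group acceptance rates and their gap (demographic parity). This is a hypothetical toy audit, not one of the group's methods; the group labels and decisions are made up.

```python
from collections import defaultdict

def demographic_parity_gap(decisions):
    """Positive-decision rate per group and the largest pairwise gap.

    `decisions` is a list of (group, accepted) pairs. A large gap is a
    simple red flag for disparate impact, not proof of discrimination.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, accepted in decisions:
        totals[group] += 1
        if accepted:
            positives[group] += 1
    rates = {g: positives[g] / totals[g] for g in totals}
    return rates, max(rates.values()) - min(rates.values())

# Made-up decisions for two groups "a" and "b".
decisions = [("a", True), ("a", True), ("a", False), ("a", True),
             ("b", True), ("b", False), ("b", False), ("b", False)]
rates, gap = demographic_parity_gap(decisions)
# rates["a"] = 0.75, rates["b"] = 0.25, gap = 0.5
```

Even this trivial statistic shows why auditing is non-trivial in practice: a gap can stem from biased labels, omitted variables, or legitimate factors, and telling these apart requires more than a single number.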

Digitalisation and Tax (DigiTax). 03/07/2019 - 31/12/2024

Abstract

Digital transformation has caused changes in all aspects of human life. In the DigiTax project, we look at the tax implications of this process from two perspectives. First, we examine the challenges that digitalisation brings to the tax area. For example, in the digital economy multinationals have more opportunities to shift profits to low-tax countries. Where should these profits be taxed? Also, robots are increasingly entering the labor market, from self-driving cars to chatbots. Should they be considered a separate taxable entity, and if so, how should these robots be taxed? More generally, we will look at: (a) which tax regimes come under pressure; (b) whether there is a need to change the traditional tax concepts and, if so, which new tax concepts can be developed to contribute to a fairer taxation; (c) who is legitimately authorized to make the change; and (d) how to implement it. Second, and vice versa, we study the opportunities that digitalisation creates for the fairness of taxation and for the efficiency and effectiveness of the tax authorities. For example, how can improved data mining algorithms or the inclusion of novel data sources help to develop more accurate, understandable, and discrimination-free fraud detection systems that minimize tax non-compliance or tax evasion? Or how can blockchain technology improve transparency, tax compliance, and trust between authorities and taxpayers? We will specifically look at the opportunities that data mining, the internet of things (IoT), and blockchain technology bring to the tax domain. This project explicitly calls for a multidisciplinary approach, studying the technological, legal, economic, and societal implications of digitalisation and tax.

Researcher(s)

Research team(s)

Mining and Exploiting Interaction Patterns in Networks. 01/01/2018 - 31/12/2021

Abstract

Most work in network analysis concentrates on static graphs and finds patterns such as the most influential nodes in the network. Very few existing methods can deal with repeated interactions between nodes in a network. The main goal of the project is hence to fill this gap by developing methods to identify patterns in interactions between network nodes. These interaction patterns could characterize information propagation in social networks, or money streams in financial transaction networks. We consider three orthogonal dimensions. The first one is the pattern type: we consider, among others, temporal paths, information cascade trees, and cycles. To guide our choice of which patterns to study, we take inspiration from three real-world cases: two interaction networks with payment data, one with a marketing-related task and one for default prediction, and one social network with an application in microfinance. The second dimension is how to use the query pattern: exhaustively finding all occurrences of the pattern, or as a participation query that finds nodes that frequently participate in a pattern of interest. Finally, the third dimension concerns the computational model: offline, one-pass, or streaming. It is important to scale up to large interaction networks. In summary, the novelty of our proposal lies in the combination of streaming techniques, pattern mining, and social network analysis, validated on three real-world cases.
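To make the first dimension concrete, the following toy sketch enumerates temporal (time-respecting) paths in a small interaction network. It is a naive illustration under assumed semantics (strictly increasing timestamps, no node revisits), not the project's algorithm; the node names and interactions are invented.

```python
def temporal_paths(edges, max_edges=3):
    """Enumerate time-respecting paths in an interaction network (sketch).

    `edges` is a list of (u, v, t) interactions; a path is time-respecting
    when each next interaction happens strictly later than the previous one.
    """
    by_source = {}
    for u, v, t in edges:
        by_source.setdefault(u, []).append((v, t))

    found = set()

    def extend(path, last_t):
        found.add(tuple(path))
        if len(path) - 1 >= max_edges:
            return
        for v, t in by_source.get(path[-1], []):
            # Only strictly later interactions, and no node revisits.
            if t > last_t and v not in path:
                path.append(v)
                extend(path, t)
                path.pop()

    for u, v, t in edges:
        extend([u, v], t)
    return found

interactions = [("a", "b", 1), ("b", "c", 2), ("b", "c", 0), ("c", "a", 3)]
paths = temporal_paths(interactions)
# ("a", "b", "c") is time-respecting (t=1 then t=2);
# ("c", "a", "b") is not (t=3 cannot be followed by t=1).
```

A one-pass or streaming variant would have to avoid this exhaustive expansion, which is exactly where the project's scalability concerns come in.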

Researcher(s)

Research team(s)

Flanders AI. 01/07/2019 - 31/12/2020

Abstract

The Flemish AI research program aims to stimulate strategic basic research focusing on AI at the different Flemish universities and knowledge institutes. This research must be applicable and relevant for the Flemish industry. Concretely, 4 grand challenges 1. Help to make complex decisions: focusses on the complex decision-making despite the potential presence of wrongful or missing information in the datasets. 2. Extract and process information at the edge: focusses on the use of AI systems at the edge instead of in the cloud through the integration of software and hardware and the development of algorithms that require less power and other resources. 3. Interact autonomously with other decision-making entities: focusses on the collaboration between different autonomous AI systems. 4. Communicate and collaborate seamlessly with humans: focusses on the natural interaction between humans and AI systems and the development of AI systems that can understand complex environments and can apply human-like reasoning.

Researcher(s)

Research team(s)

Foundations of inductive databases for data mining. 01/01/2006 - 31/12/2009

Abstract

In this project, we study the realization of an inductive database model. The most important steps in the realization of such a model are: (a) a uniform representation of patterns and data; (b) a query language for querying the data and the patterns; (c) the integration of existing optimization techniques into the physical layer.
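As a toy illustration of point (a), the sketch below mines frequent itemsets and then treats the resulting patterns as ordinary data that can be filtered with the same means as the transactions themselves. The miner, transactions, and threshold are invented for illustration; a real inductive database would expose this through a query language rather than Python comprehensions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Naive frequent-itemset miner; a real system would prune a la Apriori."""
    items = sorted({i for t in transactions for i in t})
    patterns = {}
    for size in range(1, len(items) + 1):
        any_frequent = False
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                patterns[cand] = support
                any_frequent = True
        if not any_frequent:  # no frequent k-itemset => none of size k+1
            break
    return patterns

# Data and patterns live side by side and are queried uniformly.
transactions = [{"beer", "chips"}, {"beer", "chips", "soda"}, {"soda"}]
patterns = frequent_itemsets(transactions, min_support=2)
# A "pattern query": all frequent itemsets mentioning "beer".
beer_patterns = {p: s for p, s in patterns.items() if "beer" in p}
```

The point of the uniform representation is precisely that the second query looks no different from a query over the raw transactions.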

Researcher(s)

Research team(s)

Complete and heuristic methods for guaranteeing privacy in data mining. 01/01/2005 - 31/12/2007

Abstract

The aim of data mining is to find useful information, such as trends and patterns, in large databases. These databases often contain confidential or personal information. Therefore, it is important to assess to what degree the application of data mining techniques can harm the privacy of individuals. In this project, we want to develop methods that assess the degree to which a data mining operation discloses private information. Since complete methods will probably be too complex computationally, we will also pay attention to incomplete, heuristic methods.
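A toy sketch of one such disclosure assessment (the records, attribute names, and released patterns are made up, and this is not the project's method): if every record matching a released pattern also shares a sensitive item, publishing that pattern reveals the sensitive item for everyone matching it.

```python
def disclosure_risk(transactions, released_itemsets, sensitive):
    """Flag released patterns that fully imply a sensitive item (sketch)."""
    risky = []
    for itemset in released_itemsets:
        matching = [t for t in transactions if set(itemset) <= t]
        # Full implication: every matching record carries the sensitive item.
        if matching and all(sensitive in t for t in matching):
            risky.append(itemset)
    return risky

records = [{"age30", "cityA", "diagnosis"},
           {"age30", "cityA", "diagnosis"},
           {"age30", "cityB"}]
released = [("age30",), ("age30", "cityA")]
risky = disclosure_risk(records, released, sensitive="diagnosis")
# ("age30",) is safe (one matching record lacks the diagnosis);
# ("age30", "cityA") matches only records carrying the diagnosis.
```

Checking every released pattern against every record is already quadratic; for richer inference channels the complete analysis quickly becomes intractable, which motivates the heuristic methods mentioned above.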

Researcher(s)

Research team(s)

Database support for interactive data mining. 01/10/2003 - 30/09/2006

Abstract

This project aims at a systematic study of the possibilities and problems of a database system for data mining. The development of such a system raises many fundamental questions. How should we represent the data? In which way can we integrate data mining algorithms into query languages? How can we optimize the queries? A theoretical and fundamental approach to these questions is the central theme of this project.

Researcher(s)

Research team(s)

Data mining: mining methods, their complexities and query languages. 01/10/2001 - 30/09/2003

Abstract

Because of recent developments in knowledge engineering and hardware, large digital databases have become common. The value of these databases, however, is determined not only by their size, but also by the possibility of analysis. Data mining is the discovery of previously unknown dependencies in data. The goal is to find and reveal structure in the data, rather than going into the details of the data. These kinds of meta-data determine the value of the database. There are already important applications of data mining in industry, and the advantages of data mining for a company are huge. Since data mining is concerned with the meta-level of the data, there are many similarities with artificial intelligence and knowledge engineering; because of this, data mining is regularly referred to as `knowledge discovery in databases.' Data mining is also strongly linked with OLAP (online analytical processing). Research on data mining techniques started in the early nineties and has grown enormously since 1995. Association rules are a type of rule commonly studied in data mining, and many algorithms are known for finding them. However, this type of rule is very elementary, and it would be interesting to study more general rules. A possible description language is first-order logic. There are multiple complexity measures that can be studied: complexity in the number of tuples, in the number of attributes, and in the length of the rules. An interesting problem is the classification of these rules by their complexities. Afterwards, the research can go into studying patterns that allow algorithms with acceptable running times. In my graduate thesis, the search for a certain type of more general rules is studied. Another aspect of data mining is the following: how can a user of database and OLAP systems make maximal use of data mining tools?
In this perspective, there is a need for expressive query languages that give the user the opportunity to query the database in a simple way. Another point of interest is the notion of `genericity': generic queries are queries that are independent of the chosen data structure. In data mining, however, many of the methods depend strongly on the physical shape of the data. At the moment, there is a lot of interest in data mining. At the UIA (University of Antwerp), a project on data mining, funded by the FWO, is running; the proposed project is an extension of it. The understanding of theoretical foundations is important: in contrast with much of the current research, which is performed in an ad-hoc way, this project aims to enlarge the theoretical knowledge of data mining.
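To make the notion of association rules concrete, here is a textbook-style toy sketch that derives rules with support and confidence from frequent itemsets. It is a naive illustration, not part of this project; the baskets and thresholds are invented, and an efficient implementation would use Apriori-style pruning.

```python
from itertools import combinations

def association_rules(transactions, min_support, min_confidence):
    """Derive association rules from frequent itemsets (naive sketch)."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    rules = []
    for size in range(2, len(items) + 1):
        for cand in combinations(items, size):
            cand = frozenset(cand)
            sup = support(cand)
            if sup < min_support:
                continue
            # Split each frequent itemset into antecedent -> consequent.
            for k in range(1, len(cand)):
                for lhs in combinations(sorted(cand), k):
                    lhs = frozenset(lhs)
                    conf = sup / support(lhs)
                    if conf >= min_confidence:
                        rules.append((set(lhs), set(cand - lhs), sup, conf))
    return rules

baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}]
rules = association_rules(baskets, min_support=0.5, min_confidence=0.8)
# Yields {"butter"} -> {"bread"}: support 2/3, confidence 1.0.
```

Rules of this shape are exactly the "elementary" case: both sides are mere itemsets, which is what motivates the more general, first-order rule languages discussed above.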

Researcher(s)

Research team(s)

Data mining: mining methods, their complexities and query languages. 01/10/1999 - 30/09/2001

Abstract

Because of recent developments in knowledge engineering and hardware, large digital databases have become common. The value of these databases, however, is determined not only by their size, but also by the possibility of analysis. Data mining is the discovery of previously unknown dependencies in data. The goal is to find and reveal structure in the data, rather than going into the details of the data. These kinds of meta-data determine the value of the database. There are already important applications of data mining in industry, and the advantages of data mining for a company are huge. Since data mining is concerned with the meta-level of the data, there are many similarities with artificial intelligence and knowledge engineering; because of this, data mining is regularly referred to as `knowledge discovery in databases.' Data mining is also strongly linked with OLAP (online analytical processing). Research on data mining techniques started in the early nineties and has grown enormously since 1995. Association rules are a type of rule commonly studied in data mining, and many algorithms are known for finding them. However, this type of rule is very elementary, and it would be interesting to study more general rules. A possible description language is first-order logic. There are multiple complexity measures that can be studied: complexity in the number of tuples, in the number of attributes, and in the length of the rules. An interesting problem is the classification of these rules by their complexities. Afterwards, the research can go into studying patterns that allow algorithms with acceptable running times. In my graduate thesis, the search for a certain type of more general rules is studied. Another aspect of data mining is the following: how can a user of database and OLAP systems make maximal use of data mining tools?
In this perspective, there is a need for expressive query languages that give the user the opportunity to query the database in a simple way. Another point of interest is the notion of `genericity': generic queries are queries that are independent of the chosen data structure. In data mining, however, many of the methods depend strongly on the physical shape of the data. At the moment, there is a lot of interest in data mining. At the UIA (University of Antwerp), a project on data mining, funded by the FWO, is running; the proposed project is an extension of it. The understanding of theoretical foundations is important: in contrast with much of the current research, which is performed in an ad-hoc way, this project aims to enlarge the theoretical knowledge of data mining.

Researcher(s)

Research team(s)