Research team

Engineering Management

Expertise

Explainable AI, Behavioral Data Mining

Explaining AI Models to Gain Insight into the Models and Learn about the World. 01/04/2021 - 31/03/2025

Abstract

The project is about explaining the decisions made by Artificial Intelligence (AI) prediction models, and using those explanations to gain global insights into the models and knowledge of the world. Advances in AI are spurred mainly by deep learning (artificial neural networks) and the availability of massive image, textual and behavioural data. This has led to great predictive accuracy, with positive economic and societal implications, but also to very complex models. Explaining the predictions of such "black box" models has gained increasing attention from the AI research community. However, the current approaches and results only scratch the surface of the potential of this "explainable AI" research. The main objective of this proposal is to push the frontiers of the research by putting forward the Evidence Counterfactual (EdC) as a paradigm within explainable AI. The project will look at how the Evidence Counterfactual can be used to generate explanations that lead to novel insights into the AI model and the world (improve insight), and will validate the new methodologies in a variety of applications, ranging from insurance to political science. Trying to explain how things work is a central driver in science. In that context, this project is not only a fundamental but also a logical next step in AI research.

Explaining prediction models on high-dimensional behavioral and textual data. 01/10/2020 - 30/09/2022

Abstract

As a consequence of digitalization, more aspects of people's lives are being captured. Examples include visiting particular physical locations or webpages, liking Facebook pages, etc. This behavioral data holds significant predictive power. For example, what you like on Facebook can be predictive of your IQ, product interest, and even creditworthiness. Deep learning has been shown to outperform other prediction techniques in making accurate predictions using behavioral data. Unfortunately, combining behavioral data and deep learning results in incomprehensible black box predictions, for three reasons: (1) behavioral data is very high-dimensional (up to millions of features), (2) the data is sparse, so every feature is relevant for only a few data instances, and (3) the deep learning model is complex and non-linear. Consequently, although the combination of deep learning and behavioral data is highly predictive, it is very difficult to understand why the model makes certain predictions, leading to skepticism about using it in practice. The main contribution of this research, financed by FWO, is to design new algorithms that explain the complex deep learning prediction models. This comprehensibility issue is a research area that has gained attention in the data mining community because of the implications it has on model deployment and transparency towards users. We will validate our findings in several applications.

Digitalisation and Tax (DigiTax). 03/07/2019 - 31/12/2024

Abstract

Digital transformation has caused changes in all aspects of human life. In the DigiTax project, we look at the tax implications of this process from two perspectives. First, we examine the challenges that digitalisation brings to the tax area. For example, in the digital economy multinationals have more opportunities to shift profits to low-tax countries. Where should these profits be taxed? Also, robots are increasingly entering the labor market, from self-driving cars to chatbots. Should they be considered a separate taxable entity, and if so, how should these robots be taxed? More generally, we will look at: (a) which tax regimes come under pressure, (b) whether there is a need to change the traditional tax concepts and, if so, which new tax concepts can be developed to contribute to fairer taxation, (c) who is legitimately authorized, and (d) how to implement the change. Second, and vice versa, we study the opportunities that digitalisation creates for the fairness of taxation and the efficiency and effectiveness of the tax authorities. For example, how can improved data mining algorithms or the inclusion of novel data sources help to develop more accurate, understandable and discrimination-free fraud detection systems that minimize tax non-compliance or tax evasion? Or how can blockchain technology improve transparency, tax compliance and trust between authorities and taxpayers? We will specifically look at the opportunities that data mining, the internet of things (IoT) and blockchain technology bring to the tax domain. This project explicitly calls for a multidisciplinary approach, studying the technological, legal, economic and societal implications of digitalisation and tax.

Flanders AI. 01/07/2019 - 31/12/2021

Abstract

The Flemish AI research program aims to stimulate strategic basic research focusing on AI at the different Flemish universities and knowledge institutes. This research must be applicable and relevant for the Flemish industry. Concretely, the program addresses four grand challenges:
1. Help to make complex decisions: focuses on complex decision-making despite the potential presence of incorrect or missing information in the datasets.
2. Extract and process information at the edge: focuses on the use of AI systems at the edge instead of in the cloud, through the integration of software and hardware and the development of algorithms that require less power and other resources.
3. Interact autonomously with other decision-making entities: focuses on the collaboration between different autonomous AI systems.
4. Communicate and collaborate seamlessly with humans: focuses on the natural interaction between humans and AI systems and the development of AI systems that can understand complex environments and apply human-like reasoning.

Mining and Exploiting Interaction Patterns in Networks. 01/01/2018 - 31/12/2021

Abstract

Most work in network analysis concentrates on static graphs and finds patterns such as the most influential nodes in the network. Very few existing methods are able to deal with repeated interactions between nodes in a network. The main goal of the project is hence to fill this gap by developing methods to identify patterns in interactions between network nodes. These interaction patterns could characterize information propagation in social networks, or money streams in financial transaction networks. We consider three orthogonal dimensions. The first one is the pattern type. We consider, among others, temporal paths, information cascade trees and cycles. To guide our choice of which patterns to study, we get inspiration from three real-world cases: two interaction networks with payment data, one for which the task is marketing related and one for default prediction, and one social network with an application in microfinance. The second dimension is how the pattern is used as a query: to exhaustively find all occurrences of the pattern, or as a participation query that finds nodes that participate more often in a pattern of interest. Finally, the third dimension concerns the computational model: offline, one-pass, or streaming. It is important to scale up to large interaction networks. In summary, the novelty of our proposal lies in the combination of streaming techniques, pattern mining, and social network analysis, validated on three real-world cases.

The Development and Use of Datamining Techniques for Better Decision Making. 01/10/2015 - 30/09/2025

Abstract

This project concerns the development of data mining techniques with applications in the broad business administration domain. From a theoretical perspective, several rule induction techniques (AntMiner+ and ALBA) and data analysis frameworks have been developed. The final acceptability of the models is always a primary concern in the research, which is addressed by including domain knowledge and focusing on comprehensible data mining models. From an application perspective, the P.I. works mainly in a credit risk management and marketing setting, and has also applied data mining innovatively in the software engineering, auditing and corporate performance domains. Current and future research further expands on previous findings, among other things by moving from classification to regression techniques and frameworks. Additionally, a strong focus on the use of networked data is envisioned, one of the key research directions in current data mining research. How to obtain and apply such data when no explicit social network is available, such as in the banking industry, constitutes one of the core theoretical research objectives. Marketing applications include the prediction of response, churn and wallet share, while an interesting risk management application is credit scoring, both at the retail and corporate level.

Explaining deep learning models for behavioral data. 01/10/2018 - 30/09/2020

Abstract

As a consequence of digitalization, more aspects of people's lives are being captured. Examples include visiting particular physical locations or webpages, liking Facebook pages, etc. This behavioral data holds significant predictive power. For example, what you like on Facebook can be predictive of your IQ, product interest, and even creditworthiness. Deep learning has been shown to outperform other prediction techniques in making accurate predictions using behavioral data. Unfortunately, combining behavioral data and deep learning results in incomprehensible black box predictions, for three reasons: (1) behavioral data is very high-dimensional (up to millions of features), (2) the data is sparse, so every feature is relevant for only a few data instances, and (3) the deep learning model is complex and non-linear. Consequently, although the combination of deep learning and behavioral data is highly predictive, it is very difficult to understand why the model makes certain predictions, leading to skepticism about using it in practice. The main contribution of this research proposal is to design new algorithms that explain the complex deep learning prediction models. This comprehensibility issue is a research area that has gained attention in the data mining community because of the implications it has on model deployment and transparency towards users. We will validate our findings in several applications.

Fraud Detection by Finding Patterns in the Dynamics of Shareholders Networks. 01/10/2018 - 31/12/2019

Abstract

The current fight against fiscal fraud is confronted with a number of significant challenges, as fraudsters adopt ever more complex structures and operate in an organized fashion. In this project we will investigate how shareholder network structures change over time in legitimate versus fraudulent cases. To do so, we will apply data mining techniques to a unique dataset that we have obtained from a European tax administration, with data on company ownership networks from 2006 until today.

Fraud Detection Data mining. 02/01/2017 - 31/12/2017

Abstract

Fraud detection using data mining in a government (fiscal) setting. This project is a collaboration between the University of Antwerp (Antwerp Tax Academy) and the Financial Administration of the Belgian government.

How political news affects and is affected by citizens in the social media age. Theoretical challenges and empirical opportunities. 01/01/2017 - 31/12/2020

Abstract

In a democracy, citizens need knowledge about politics. The mass media are traditionally considered as key actors in providing this necessary information. Ample studies on agenda-setting and framing have shown time and again that the news media have a profound influence on what people know, and how they think about politics. The question is to what extent it is possible to maintain many of these classic insights in the digital era. The increasing importance of the Internet and in particular social media as a means of communication and information has likely changed how people learn about what is going on in the world, and about politics more specifically. For instance, the agenda-setting and framing role of the media is challenged, because social media use puts the underlying causal mechanism, from mass media to the public, into question. More and more journalists are influenced by discussions on blogs, Facebook, Twitter and other platforms. In addition, politicians have more digital opportunities to directly influence the public while bypassing the traditional media. In short, we aim to study how people consume and engage with political news and how they are affected by it, but also how journalists and politicians are, in turn, influenced by people's engagement with the news. Digital media not only challenge some of the established theoretical insights but simultaneously also offer new opportunities to study how information spreads and how the public deals with it. Today, it is possible to map all online news and all citizens' digital reactions to it (comments, likes, tweets). This makes it possible to study agenda-setting processes much more accurately, through how people interact with news. Framing, as well, can now be studied much more precisely, drawing on much larger samples of citizens and media messages.
In addition, analyzing digital text and expressed opinion in social media allows demographic and attitudinal profiling of citizens that could strongly increase our knowledge of the individual moderators of agenda-setting and framing effects. To make sense of this unprecedented source of written language and digital behaviour, we opt for a multidisciplinary collaboration between computational linguistics, data mining and social sciences. The appropriateness of social scientific theories of agenda-setting and framing will be put to the test in a digital context by means of big data analyses. Computational linguistics techniques will be used to automatically analyze the topics addressed in social media text, the opinions expressed about these topics, and the profiles of the social media users expressing these opinions. The possibilities of digital text analysis, however, go beyond testing classic media effects theories such as agenda-setting and framing. Our ambition is to use the new data opportunities to develop new theoretical insights by discovering underlying patterns in an inductive fashion. By applying data mining techniques to the data of users' digital behavior and searching for underlying patterns, we may obtain insights into which events, persons and topics ordinary citizens 'like' and want to 'share'. Concretely, we aim to study one planned major political event, the 2019 Belgian election campaign, and one non-planned or unexpected event in the course of 2018. We expect that the information flows in both types of events are structurally different. For each event we plan a survey and a large quantitative data collection covering about four weeks, with content drawn from all major online news websites, and the social media platforms Twitter and Facebook.

Economic valuation of Behavioral Data in Predictive Modeling. 01/10/2016 - 31/12/2016

Abstract

Previous research has shown that there is tremendous predictive value in fine-grained behavioral data (such as browsing, location or payment data) to make predictions about consumer behavior. This project focuses on the question: if data is used from a large set of players in a coalition, what is the value of the data of each of the data providers?

Innovative credit scoring modeling using textual and social network data. 01/10/2015 - 04/09/2017

Abstract

It is the purpose of this research project to come up with new, original and groundbreaking approaches for credit risk modeling through innovations in input data to model different aspects of credit risk. This research project will focus on the potential of social network and textual input data, and consists of four research objectives.

An index to measure the perception of the ECB communication. 23/02/2015 - 15/03/2015

Abstract

The objective of the project is to consolidate the computation procedure for the index and to validate the preliminary results obtained, with a view to developing a tool to regularly assess the public perception of the ECB's monetary policy communication.

Data mining for tax fraud detection. 01/01/2015 - 31/12/2018

Abstract

With the globalisation of the world's economies and ever-evolving financial structures, fraud has become one of the main dissipaters of government wealth and possibly a major contributor to the slowing down of economies in general. Automated data mining systems that look for fraud patterns in historical data have been on the rise to tackle this problem. In this multidisciplinary project, we will develop, apply and validate new data mining techniques to accurately predict which entities (be it companies or persons) are likely fraudsters, by considering concepts such as different data types, privacy, intuitiveness and comprehensibility of the predictions. This project will make use of existing contacts with the federal government, and will be the formal start of a novel research theme within the Antwerp Tax Academy.

A publicly available Economic Uncertainty Index for all G8 countries using text mining techniques. 01/10/2014 - 30/09/2015

Abstract

In this project we focus on the question: how can we measure economic policy uncertainty (EPU)? We recently proposed an EPU index for Belgium, by mining online news articles from all major newspapers. Given the promising results, we aim to apply this text mining-based methodology to other countries as well (G8 countries), and create a publicly available website where the index is automatically updated for all countries on a weekly basis.

An index to measure the perception of the ECB communication. 19/05/2014 - 31/08/2014

Abstract

This research project foresees the development and implementation of a quantitative tool, to be available on a regular basis, that computes an index to assess the public perception of the ECB's monetary policy communication.

Predictive Analysis on A card data. 01/04/2014 - 30/04/2015

Abstract

This project represents a formal research agreement between UA on the one hand and Stad Antwerpen on the other. UA provides Stad Antwerpen with the research results mentioned in the title of the project, under the conditions stipulated in this contract.

Big Data Mining for Customer Analytics. 01/01/2014 - 31/12/2017

Abstract

Companies are currently storing massive volumes of data, which remain largely untapped. Data mining aims at finding interesting patterns in these vast amounts of data. The discovered patterns allow companies to better understand and even predict the needs and wishes of their customers. Response modelling is a common marketing application where we want to predict which customers are likely to be interested in a new product offering, based on historical patterns. The ability to predict the future has huge advantages for businesses, both in terms of increased profits and decreased losses. Even within this field, an important current challenge, which we try to address in this project, is how to analyze the huge amounts of granular behavioural data that are available in order to improve the performance of the resulting data mining models. Current data mining techniques can hardly be applied to such "big data".

Innovative credit scoring modeling using textual and social network data. 01/10/2013 - 30/09/2015

Abstract

It is the purpose of this research project to come up with new, original and groundbreaking approaches for credit risk modeling through innovations in input data to model different aspects of credit risk. For financial institutions, it is important to know which input data has the best prediction performance, how these data should be handled and which intrinsic characteristics should be taken into account to obtain the most accurate credit risk prediction. This research project will focus on the potential of social network and textual input data, and consists of four research objectives.

New opportunities in digital advertising for publishers (DiDaM). 01/10/2013 - 30/09/2014

Abstract

The overall goal is to leverage available data assets of news publishers by finding patterns that allow a personalized approach, e.g. in the targeting of online ads. The insights obtained by mining such data with specialized "big data analytics" algorithms will be used for premium advertising and performance advertising (selling by clicks on ads).

Fraud Detection using data mining. 17/05/2013 - 30/09/2013

Abstract

This project searches for patterns in fraud data using data mining techniques. Specifically tailored big data mining techniques will be validated on the obtained anonymized transactional data and fraud labels.

Bi-graph based social network analysis and learning. 01/10/2012 - 30/09/2016

Abstract

Many real-life networks are bi-partite in nature, meaning the nodes of the network can be separated into two disjoint types and edges exist only between nodes of different types. Think for example of academic authors being linked to the papers they have authored, or mobile devices linked to the locations they visited. Quite often only the projected network is used: a network of authors, linked if they share a paper, or a network of mobile devices, linked if they visited the same location. This however leads to substantial information loss and an increase in network size. Although network analysis and learning has emerged as an important field in the social sciences, humanities and computer science, very little work exists on this specific type of network. In this project we will define new metrics to analyze the global properties of such networks, study their evolution over time, develop tailored network learning techniques, and validate our designs with large-scale network data. We shall specifically focus on three real-life cases: the author-paper network using public data as well as data from the University of Antwerp, the customer-payment receiver network using data from a large European bank, and finally a mobile device-location network using data from a US-based ad exchange. Our findings should lead to novel insights into human behavior, theory building and improved predictive modeling.
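
The projection argument in this abstract can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the author and paper names are hypothetical toy data, and the `project` helper is not taken from the project itself. It shows that two different author-paper networks can collapse to the same author-author projection (information loss), and that a single paper with many authors yields far more projected edges than bipartite edges (increase in network size).

```python
from itertools import combinations


def project(bipartite_edges):
    """Project (top_node, bottom_node) pairs onto the top nodes.

    Two top nodes are linked in the projection if they share at least
    one bottom node (e.g. two authors who share a paper).
    """
    members = {}
    for top, bottom in bipartite_edges:
        members.setdefault(bottom, set()).add(top)
    projected = set()
    for tops in members.values():
        for a, b in combinations(sorted(tops), 2):
            projected.add((a, b))
    return projected


# Hypothetical toy data: three authors on one joint paper...
one_paper = [("ann", "p1"), ("bob", "p1"), ("cat", "p1")]
# ...versus the same three authors writing three pairwise papers.
three_papers = [
    ("ann", "p1"), ("bob", "p1"),
    ("ann", "p2"), ("cat", "p2"),
    ("bob", "p3"), ("cat", "p3"),
]

# Information loss: both bipartite networks give the same projection.
same_projection = project(one_paper) == project(three_papers)

# Size increase: one paper with 10 authors has 10 bipartite edges,
# but its projection has 10 * 9 / 2 = 45 author-author edges.
big_paper = [(f"author{i}", "p1") for i in range(10)]
bipartite_size = len(big_paper)
projected_size = len(project(big_paper))
```

In this toy case the projection cannot distinguish one three-author paper from three two-author papers, which is the kind of information that working directly on the bipartite network preserves.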

Social network learning for marketing and finance. 01/01/2012 - 31/12/2015

Abstract

Social network data is very valuable for marketing purposes, as social relationships tend to be made between people with similar characteristics, a concept known as homophily. Addressing network neighbors of current customers can therefore be a very efficient marketing strategy. In this project, we will develop and apply advanced social network analysis algorithms for marketing and finance applications.

Analyzing the impact of news and market reports on Belgian stock prices through text mining. 01/01/2011 - 31/12/2013

Abstract

In this project we will investigate how general and stock market specific news items can be analysed with advanced text mining techniques to automatically predict the effect on Belgian stock prices. Insights will be obtained into which news providers and which combinations of words have the largest effect. The developed system will be evaluated as a trading tool, as well as a decision support system for investors.

The Development and Use of Datamining Techniques for Better Decision Making. 01/10/2010 - 30/09/2015

Abstract

This project concerns the development of data mining techniques with applications in the broad business administration domain. From a theoretical perspective, several rule induction techniques (AntMiner+ and ALBA) and data analysis frameworks have been developed. The final acceptability of the models is always a primary concern in the research, which is addressed by including domain knowledge and focusing on comprehensible data mining models. From an application perspective, the P.I. works mainly in a credit risk management and marketing setting, and has also applied data mining innovatively in the software engineering, auditing and corporate performance domains. Current and future research further expands on previous findings, among other things by moving from classification to regression techniques and frameworks. Additionally, a strong focus on the use of networked data is envisioned, one of the key research directions in current data mining research. How to obtain and apply such data when no explicit social network is available, such as in the banking industry, constitutes one of the core theoretical research objectives. Marketing applications include the prediction of response, churn and wallet share, while an interesting risk management application is credit scoring, both at the retail and corporate level.
