Advances in data mining for risk and uncertainty using textual and behavioural data
16 June 2017
University of Antwerp - Stadscampus - Promotiezaal Grauwzusters - Lange Sint-Annastraat 7 - 2000 Antwerp
Prof David Martens
PhD defence Ellen Tobback - Faculty of Applied Economics
Technological advancements and digital transformations have enabled the massive collection and storage of 'new' data sources, such as social media data, browsing data and mobile data. Most of these new data sources can be classified as 'big data'. They are often high-dimensional and sparse, and require specifically tailored data mining algorithms. Amongst high-dimensional data sources are textual data, relational data and behavioural data. Textual data is data in textual form, relational data describes connections and interactions between two entities, and behavioural data provides evidence of a person's behaviours, actions and interests. While all three data types have been successfully used for various applications, such as churn prediction and targeted advertising, their potential use for credit scoring and policy making remains largely unexplored.
The aim of this dissertation is to investigate how unconventional data sources can be leveraged for credit risk prediction and for measuring policy uncertainty. We explore the added value of these new data sources and accompanying techniques using six real-life data sets. The dissertation is divided into two main parts. In the first part, we use behavioural, relational and traditional data to predict retail and corporate credit risk. More specifically, we use relational data on a company's directors and managers to predict bankruptcy, payment data to predict retail loan default and Facebook data for microfinance credit scoring. We model the data in a relational manner, creating networks between the borrowers using the relational and behavioural data.
In the second part, we focus on the use of textual data for policy applications. We investigate how advanced text mining techniques can be applied to quantify the information contained in news articles. Using this information, we aim to measure economic policy uncertainty and the media's perception of the European Central Bank's communication. With these applications, we show that there is a place for data science in the world of economics and policy making.