Research team

Expertise

Development and study of advanced methods for Data Mining, Big Data Analytics, Recommender Systems, Data Cleaning, and other technologies related to the management and analysis of large amounts of data.

Interpretable rule-based recommender systems. 01/11/2023 - 31/10/2026

Abstract

Recommender systems help users identify the most relevant items from a huge catalogue. In recent independent evaluation studies of recommender systems, baseline association rule models are competitive with more complex state-of-the-art methods. Moreover, rule-based recommender algorithms have several appealing properties, such as the potential to be interpretable, the ability to identify local patterns and the support of context-aware predictions. First, we survey various existing recommendation algorithms with different biases and prediction strategies and evaluate them independently. Besides accuracy, we evaluate coverage and diversity and analyse the structure of the resulting rule models, which is essential for understanding their interpretability. Second, we propose to bridge the gap between recommender systems and recent multi-label classification methods based on learning an optimal set of rules w.r.t. a custom loss function. We study whether a decision-theoretic framework can guarantee the identification of the optimal rules for recommender systems under a loss function combining accuracy, complexity and diversity. We account for characteristics unique to recommender datasets, such as skewed distributions, implicit feedback and scale. Finally, we develop new rule-based algorithms that are interpretable and more accurate, and apply them to healthcare recommendations to improve intensive care unit monitoring and to online bandit learning for large-scale e-commerce and news websites.
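As a toy illustration of the kind of baseline the abstract refers to, the sketch below mines single-antecedent association rules from implicit-feedback user histories and recommends unseen items by rule confidence. The data, the support threshold and the scoring choice are hypothetical; real rule-based recommenders mine far richer rule sets, so this is a minimal sketch only.

```python
from collections import defaultdict
from itertools import combinations

# Toy implicit-feedback histories: each user is a set of consumed items.
histories = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cookies"},
    {"bread", "milk"},
]

def mine_rules(histories, min_support=2):
    """Mine single-antecedent rules a -> b with co-occurrence >= min_support.

    Returns a dict mapping (a, b) to the rule's confidence P(b | a)."""
    pair_counts = defaultdict(int)
    item_counts = defaultdict(int)
    for h in histories:
        for item in h:
            item_counts[item] += 1
        for a, b in combinations(sorted(h), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): c / item_counts[a]
            for (a, b), c in pair_counts.items() if c >= min_support}

def recommend(history, rules, k=2):
    """Score unseen items by the best confidence of any applicable rule."""
    scores = defaultdict(float)
    for (a, b), conf in rules.items():
        if a in history and b not in history:
            scores[b] = max(scores[b], conf)
    return [item for item, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

rules = mine_rules(histories)
print(recommend({"bread"}, rules))  # -> ['butter', 'milk']
```

Such a model is interpretable by construction: every recommendation can be traced back to a small set of rules, each with an explicit support and confidence.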

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Exploring Unlearning Methods to Ensure the Privacy, Security, and Usability of Recommender Systems. 01/11/2023 - 31/10/2025

Abstract

Machine learning algorithms have proven highly effective in analyzing large amounts of data and identifying complex patterns and relationships. One application of machine learning that has received significant attention in recent years is recommender systems, which are algorithms that analyze user behavior and other data to suggest items or content that a user may be interested in. However, these systems may unintentionally retain sensitive, outdated, or faulty information, posing a risk to user privacy, system security, and usability. In this research proposal, we aim to address this challenge by investigating methods for machine "unlearning", which would allow information to be efficiently "forgotten" or "unlearned" from machine learning models. The main objective of this proposal is to develop the foundation for future machine unlearning methods. We first evaluate current unlearning methods and explore novel adversarial attacks on these methods' verifiability, efficiency, and accuracy to gain new insights and develop the theory of unlearning. Using our gathered insights, we seek to create novel unlearning methods that are verifiable, efficient, and do not lead to unnecessary accuracy degradation. Through this research, we seek to make significant contributions to the theoretical foundations of machine unlearning while also developing unlearning methods that can be applied to real-world problems.
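To make the idea of "forgetting" concrete, the toy sketch below shows exact unlearning for a model whose state is additive over users: a simple item-popularity recommender. Because the model is a sum of per-user contributions, subtracting a user's data yields exactly the model that retraining without that user would produce. This additivity is an illustrative assumption; the methods targeted by the project address much harder, non-additive models.

```python
from collections import Counter

class PopularityModel:
    """Item-popularity recommender whose state is a sum over user
    contributions, so a user's data can be exactly unlearned by
    subtracting it from the counts."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, user_items):
        self.counts.update(user_items)

    def unlearn(self, user_items):
        self.counts.subtract(user_items)
        self.counts += Counter()  # drop items whose count fell to zero

    def top(self, k=3):
        return [item for item, _ in self.counts.most_common(k)]

# Train on two users, then unlearn the second one.
model = PopularityModel()
model.learn(["news", "sports"])
model.learn(["news", "movies"])
model.unlearn(["news", "movies"])

# Retraining from scratch without the second user gives the same state,
# which is what makes this unlearning "verifiable" in the exact sense.
retrained = PopularityModel()
retrained.learn(["news", "sports"])
print(model.counts == retrained.counts)  # -> True
```

For models where retraining is the only exact option, the research question becomes how to approximate this equivalence efficiently and how to verify that the forgotten data no longer influences the model.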

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Serendipity Engine: towards surprising and interesting urban experiences. 01/10/2022 - 30/09/2026

Abstract

Concerns exist regarding the controlling and restricting nature of today's recommender systems. The trend is towards serving predictable, popular and homogeneous content, which is often referred to as "filter bubbles". In an urban context, this means that people are no longer exposed to the diversity of cities and their inhabitants, which has negative consequences for the open and democratic character of the city. This is a timely issue that needs urgent attention and there is a societal call for a transition towards applications that promote serendipity. However, what is missing today is a clear understanding of the meaning and value of serendipity in urban environments, and how this can be engendered in digital applications. In this project, we will develop such an understanding and identify the potential role of governing organisations in introducing serendipity to urban information systems. Additionally, the project will investigate how developers can design for serendipity. This will be studied on the level of data, algorithms and design. This approach is inspired by the theory of affordances and the findings that (digital) environments can be designed to afford serendipity. The affordances (in terms of data, algorithms and design) will be designed, developed and validated using Living Lab methodologies in three urban pilot scenarios. To support this Living Lab approach, a novel research methodology will be developed to study users' experienced serendipity.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Qualitative Evaluation of Machine Learning Models. 01/11/2020 - 31/10/2024

Abstract

A common and widely acknowledged problem in the field of machine learning is the black box nature of many algorithms. In practice, machine learning algorithms are typically viewed in terms of their inputs and outputs, but without any knowledge of their internal workings. Perhaps the most notorious examples in this context are artificial neural networks and deep learning techniques, but they are certainly not the only techniques that suffer from this problem. Matrix factorisation models for recommender systems, for example, suffer from the same lack of interpretability. Our research focuses on applying and adapting pattern mining techniques to gain meaningful insights into big data algorithms by analyzing them in terms of both their input and output, which also allows us to compare different algorithms and discover the hidden biases that lead to those differences.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Research Program Artificial Intelligence. 01/01/2023 - 31/12/2023

Abstract

The Flanders AI Research Program focuses on demand-driven, leading-edge, generic AI research for numerous applications in the health and care sector and industry, for governments and their citizens. The requirements were indicated by users from these application domains.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Flanders AI. 01/01/2022 - 31/12/2022

Abstract

The Flemish AI research program aims to stimulate strategic basic research focusing on AI at the different Flemish universities and knowledge institutes. This research must be applicable and relevant for the Flemish industry. Concretely, it addresses 4 grand challenges:

  1. Help to make complex decisions: focusses on complex decision-making despite the potential presence of wrongful or missing information in the datasets.
  2. Extract and process information at the edge: focusses on the use of AI systems at the edge instead of in the cloud, through the integration of software and hardware and the development of algorithms that require less power and other resources.
  3. Interact autonomously with other decision-making entities: focusses on the collaboration between different autonomous AI systems.
  4. Communicate and collaborate seamlessly with humans: focusses on the natural interaction between humans and AI systems and the development of AI systems that can understand complex environments and apply human-like reasoning.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Contextual anomaly detection for complex industrial assets (CONSCIOUS). 01/01/2021 - 30/06/2023

Abstract

CONSCIOUS (Contextual aNomaly deteCtIon for cOmplex indUstrial aSsets) focusses on context-aware anomaly detection in industrial machines and processes. In these complex environments, anomaly detection remains a major challenge caused by the highly dynamic conditions in which these assets operate. The overall objective is to research effective solutions to achieve more accurate, robust, timely and interpretable anomaly detection in complex, heterogeneous data from industrial assets by accounting for confounding contextual factors. The results will be validated on multiple real-world use cases in different domains. In this project, Sirris will collaborate with Skyline Communications, Duracell Batteries, I-care, Yazzoom, KU Leuven and University of Antwerp.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Interpretable Qualitative Evaluation for Online Recommender Systems. 01/10/2020 - 30/09/2021

Abstract

Individuals often rely on recommendations provided by others in making routine, daily decisions. Algorithms mimicking this behaviour are vital to the success of e-commerce services. However, a remaining open question is why algorithms make these recommendations. This is problematic given that the most accurate machine learning algorithms are black-box models, and we have a dynamic environment where possibly multiple models are deployed and periodically re-trained. Since any organisation requires human oversight and decision-making, there is a need for insight into user behaviour and interactions with recommendations made by black-box machine learning algorithms. Traditionally, two recommender systems are compared based on a single metric, such as click-through rate after an A/B test. We will assess the performance of online recommender systems qualitatively by uncovering patterns that are characteristic of the differences in targeted users and items. We propose to adopt interpretable machine learning, where the goal is to produce explanations that can be used to guide processes of human understanding and decisions. We propose to mine interpretable association rules and generate, possibly grouped, counterfactual explanations of why recommender system A performs better (or worse) than recommender system B.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Francqui Chair 2019-2020 Prof. Luc De Raedt (KULeuven). 01/10/2019 - 30/09/2020

Abstract

On the University's proposal, the Francqui Foundation each year awards two Francqui Chairs at UAntwerp. These are intended to enable the invitation of a professor from another Belgian university or from abroad for a series of ten lessons. The Francqui Foundation pays the fee for these ten lessons directly to the holder of a Francqui Chair.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Research Program Artificial Intelligence. 01/07/2019 - 31/12/2021

Abstract

The Flemish AI research program aims to stimulate strategic basic research focusing on AI at the different Flemish universities and knowledge institutes. This research must be applicable and relevant for the Flemish industry. Concretely, it addresses 4 grand challenges:

  1. Help to make complex decisions: focusses on complex decision-making despite the potential presence of wrongful or missing information in the datasets.
  2. Extract and process information at the edge: focusses on the use of AI systems at the edge instead of in the cloud, through the integration of software and hardware and the development of algorithms that require less power and other resources.
  3. Interact autonomously with other decision-making entities: focusses on the collaboration between different autonomous AI systems.
  4. Communicate and collaborate seamlessly with humans: focusses on the natural interaction between humans and AI systems and the development of AI systems that can understand complex environments and apply human-like reasoning.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Pattern based recommender systems. 01/04/2019 - 31/03/2023

Abstract

The goal of this project is to develop and study new algorithms for recommender systems that can be used in the Froomle platform. The focus of this research will be more specifically towards new methods that make use of recent developments in pattern mining.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Personalised search. 01/01/2019 - 31/12/2022

Abstract

It is our ambition to fundamentally advance the state of the art in personalised search - with a focus on e-commerce, e-news and video - by studying and developing new personalised search algorithms that take into account both the searched keywords and the full picture of the user's on-site and in-store (offline) behaviour. We will address the following research questions.

First, in the context of personalised search, how can we measure and evaluate success? Personalised search is a relatively young research domain and as such there is not yet a standardised framework or benchmark dataset for evaluating performance, as there is in learning-to-rank or recommender systems. It is our goal to develop such a standardised framework and create a benchmark dataset that can be used across experiments. Additionally, given this project's unique position on the border between research and industry, we can not only measure the performance of the algorithms offline, but also online, with Froomle's existing clients. It is our expectation that clients in different industries will have different measures of success, e.g. clients in media may want to keep users engaged, whereas clients in retail might want to shorten the path to a purchase. Hence, we aim to identify these KPIs and lay down a framework for evaluation for each. Concretely, our goal is to do a live test in retail, in video and in news, evaluating the results with the KPIs developed specifically for the corresponding domain.

Second, how can personal and search relevance be combined to determine an optimal ranking of items personalised to the individual? In order to provide the user with relevant search results ranked to their personal tastes, one needs to establish a means of combining (at least) two measures of relevance: relevance to the query and relevance to the person. Both measures can again be composites of multiple "features", e.g. pageviews, purchases, etc. for personal relevance, and query match-score, authority and recency for search relevance. Here, we aim to identify which features can be relevant in delivering an optimal personalised search experience, e.g. pageviews and recency, but not authority and purchases. Then, we address the problem of combining these scores. This problem is anything but trivial, and a static combination of personal and search relevance does not suffice. To solve this problem, we will develop at least one ranking algorithm that can transform multiple inputs into an optimal ranking, personalised to the individual. This requires that we define at least one new learning objective that takes into account this personal aspect of the optimal ranking. Furthermore, we will measure the corresponding performance improvement on at least one live application according to the principles and methodology derived in research question 1.

Third, can we build an integrated ranking solution that approaches the problem of personalised search as a problem of optimally inferring the user's intent, rather than a problem of optimally combining the user's query with their historical behaviour? In other words, rather than optimally combining query-based relevance with behaviour-based relevance, can we instead approach search as a recommendation problem, where a search query is merely an extra tool in our tool belt that helps us determine the user's current intent? Our goal is to develop at least one such algorithm and measure the corresponding performance improvement on at least one live application.

Developing these new algorithms for personalised search and a framework for evaluation will allow Froomle to add personalised search to its current offering of advanced recommender systems. This will be an important step in bridging the gap between the giants of technology and other, traditionally offline businesses with a focus on e-commerce, e-news and video.


Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Foundations of Recommender Systems. 01/10/2017 - 30/09/2021

Abstract

Recommender systems are algorithms that are best known for their applications in e-commerce. Given a specific customer and a large number of products, they automatically find the products most relevant to that specific customer. However, their relevance extends well beyond e-commerce: they can also recommend genes responsible for diseases, words relevant to documents, tags relevant to a photo, courses of interest to a student, etc. The existing research on recommender systems is almost fully determined by the datasets that are (publicly) available. Therefore, the following fundamental question remains largely unstudied: "Given two datasets, how can we determine which of the two has the higher quality for generating recommendations?" Furthermore, the cornerstone of recommender systems research is the evaluation of the recommendations that are made by the recommender system. Most existing research relies upon historical datasets for assessing the quality of recommendations. There is however no convincing evidence that the performance of recommendations on historical datasets is a good proxy for their performance in real-life settings. Hence, a second fundamental question also remains largely unstudied: "How does the real-life performance of recommender systems correlate with measures that can be computed on historical data?" By means of this project, we set out to answer these two questions, which are foundational to recommender systems research.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data mining continuous speech: Modeling infant speech acquisition by extracting building blocks and patterns in spoken language 01/10/2017 - 30/09/2019

Abstract

Complex use of language, and in particular speech, is one of the defining characteristics of humans, setting us apart from animals. In the last few decades, speech recognition has found many applications and is now, for example, a standard feature on modern smartphones. However, the flexible and powerful learning capacities of human infants have still not been equalled by any machine. Young children find a way to make sense of all the speech they hear and generalize it in a way that the patterns in the speech sounds can be disentangled, understood and repeated. In a separate line of research, the field of machine learning and data mining, algorithms have been developed to discover patterns in data. The information that can be extracted from all the available data has become an important aspect of business, if we look at video recommendation systems or the financial sector. The idea of my research is to develop and study techniques inspired by these data mining algorithms, in order to extract patterns from speech. The inherent difficulties of continuous and noisy speech have to be overcome, as it cannot just be processed in the same way as discrete and exact data. After adapting these methods and applying them to speech, I will use them in the scientific research on the building blocks of speech, evaluating their relevance and validity. Furthermore, using these, I will investigate what aspects of speech children need, and subsequently use, to learn about these building blocks.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Guiding networked societies, linking data science and modelling 01/01/2017 - 31/12/2021

Abstract

Networks of interconnected autonomous computing entities more and more support our society, interacting and influencing each other in complex and unforeseen ways. Examples are smart grids, intelligent traffic lights, logistics and voluntary peer-to-peer clouds as well as socio-technical systems or more generally the Internet of Things. Understanding the characteristics and dynamics of these systems both at the local and global scale is crucial in order to be able to guide such systems to desirable states. The partners participating in this WOG proposal each study crucial features of such complex systems, or they are experts in related fields that offer complementary techniques to analyze the massive data that is generated by them. Bringing these orthogonal fields of expertise together in a scientific research community, promises to give great opportunity for cross-fertilization and the development of novel analysis and control techniques.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

City of Things 01/01/2017 - 31/12/2020

Abstract

Cities are relying on the Internet of Things (IoT) to make their infrastructure smart, using advanced sensing and control devices within the city's infrastructure with the goal of improving urban living, the city experience, etc. Analysis of the data generated by a wide range of sensors and actuators allows controlling the city in a better and more automated way, for example with respect to the city's mobility patterns. To realize a smart city infrastructure we consider three layers: the network/sensor layer, i.e. a city-wide network based on a variety of communication technologies and protocol stacks, together with a variety of sensors allowing the collection of raw data; a data layer, dealing with the continuous stream of data and the techniques for processing, storing and mining it; and an application layer, responsible for interpreting the processed data stream to control the city more optimally. The network/sensor layer will be covered by the MOSAIC research group (Dept. Mathematics and Computer Science, Chris Blondia and Steven Latré), the data layer will be dealt with by the ADREM research group (Dept. Mathematics and Computer Science, Bart Goethals), and the application layer is the responsibility of the Transport and Regional Economics research group (Dept. Transport and Regional Economics, Eddy Van de Voorde and Thierry Vaneslander). The general aim of this project is to bring together the expertise present at the University of Antwerp at each of these layers, in order to bundle the research and come up, through an intensive collaboration, with a framework covering the three layers. More specifically, we will build an integrated smart city platform, tailored towards mobility, that allows us to capture, process, analyze, interpret and control smart city data in general and mobility data in particular. As discussed in the next section, this will result in important novel research contributions in each of the three layers and in a proof-of-concept where the research results are combined into a demonstrator.

Researcher(s)

Research team(s)

Project website

Project type(s)

  • Research Project

City of Things (CoT). 01/05/2016 - 30/04/2020

Abstract

As everyday devices are being connected to the Internet, research on large-scale wireless sensor networks specifically and the Internet of Things (IoT) generally is becoming more and more important. There is a considerable research and innovation effort related to the deployment of smart cities using this IoT technology. However, there are still plenty of hurdles to move from R&D to implementation and real mass-scale deployment of wireless sensor networks. Moreover, the city itself is a treasure of data to be explored if the right sensors can be installed. Testbeds are the preferred tools for academic and industrial researchers to evaluate their research, but a large-scale multi-technology smart city research infrastructure is currently the missing link. The City of Things research infrastructure will build a multi-technology and multi-level testbed in the city of Antwerp. As a result, 100 locations around the city of Antwerp and its harbour will be equipped with gateways supporting multiple wireless IoT protocols. These gateways will connect with hundreds of wireless sensors and actuators, measuring smart city parameters such as traffic flows, noise, air pollution, etc.

Researcher(s)

Research team(s)

Project website

Project type(s)

  • Research Project

Validation of the ADReM personalization algorithms as the basis for a spin-off. 01/05/2016 - 30/04/2017

Abstract

Personalization technology allows a company to present to every individual customer a personalized selection of relevant products out of a huge catalog. As a result of its research activities in the field of recommender systems, the ADReM research group has gained knowledge and expertise about personalization technology. This project aims to start up a spin-off company to valorize this knowledge and expertise.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Declarative methods in computer science. 01/01/2016 - 31/12/2020

Abstract

To cope with the need to build increasingly large and complex software systems, there is a growing demand for declarative approaches which abstract away unnecessary details and focus on the functionality of the systems. The network wants to further promote the development of such approaches which emerge from work in databases, functional and logic programming.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Hypermodelling strategies on multi-stream time-series data for operational optimization (HYMOP). 01/10/2015 - 30/09/2019

Abstract

HYMOP aims at tackling a collective challenge put forward by a number of Flemish lead user companies: optimizing the operation and maintenance of a fleet of industrial machines. Realizing innovative modeling and data processing/analysis techniques that are able to cope with large amounts of complex data in real-time will allow these lead users, 12 of which are brought together in our Industrial Advisory Committee, to exploit the huge potential and currently underexplored opportunity enabled by ever more sensorized and connected industrial equipment.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data mining continuous speech: Modeling infant speech acquisition by extracting building blocks and patterns in spoken language. 01/10/2015 - 30/09/2017

Abstract

Complex use of language, and in particular speech, is one of the defining characteristics of humans, setting us apart from animals. In the last few decades, speech recognition has found many applications and is now, for example, a standard feature on modern smartphones. However, the flexible and powerful learning capacities of human infants have still not been equalled by any machine. Young children find a way to make sense of all the speech they hear and generalize it in a way that the patterns in the speech sounds can be disentangled, understood and repeated. In a separate line of research, the field of machine learning and data mining, algorithms have been developed to discover patterns in data. The information that can be extracted from all the available data has become an important aspect of business, if we look at video recommendation systems or the financial sector. The idea of my research is to develop and study techniques inspired by these data mining algorithms, in order to extract patterns from speech. The inherent difficulties of continuous and noisy speech have to be overcome, as it cannot just be processed in the same way as discrete and exact data. After adapting these methods and applying them to speech, I will use them in the scientific research on the building blocks of speech, evaluating their relevance and validity. Furthermore, using these, I will investigate what aspects of speech children need, and subsequently use, to learn about these building blocks.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Research in the field of pattern mining. 01/10/2015 - 30/09/2016

Abstract

This project represents a research contract awarded by the University of Antwerp. The supervisor provides the University of Antwerp with the research mentioned in the title of the project under the conditions stipulated by the university.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Integration of pattern mining in an industrial environment. 01/07/2015 - 30/06/2016

Abstract

In this project our aim is to further develop MIME, an interactive pattern analysis tool developed at the University of Antwerp, and to make it suitable for industrial use. This will enable companies to find interesting and actionable patterns in their data, which can be used for important operational decision making, much more easily and quickly.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data fusion and structured input and output Machine Learning techniques for automated clinical coding. 01/01/2014 - 31/12/2017

Abstract

This project will improve the state of the art in automated clinical coding by analyzing heterogeneous data sources and defining them in a semantic structure and by developing novel data fusion and machine learning techniques for structured input and output.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Exascience Life Pharma. 01/07/2013 - 31/12/2015

Abstract

This project represents a formal research agreement between UA and Janssen. UA provides Janssen with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Fraud Detection using data mining. 17/05/2013 - 30/09/2013

Abstract

Using data mining techniques, patterns are sought in fraud data. Specifically tailored big data mining techniques will be validated on the obtained anonymized transactional data and fraud labels.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

An integrated informatics platform for mass spectrometry-based protein assays (InSPECtor). 01/03/2013 - 28/02/2017

Abstract

This project represents a research agreement between UA and IWT. UA provides IWT with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Evolving graph patterns. 01/01/2013 - 31/12/2016

Abstract

The goals of this project are first addressed from a theoretical perspective. Furthermore, techniques are studied using both synthetic and real experimental data. The concept of evolving graph patterns is relevant for a large series of application domains. However, we will particularly validate our approaches with bioinformatics applications, for which the extraction of this new pattern type is highly interesting.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Development of an automated software platform to support and improve the quality of clinical coding. 01/01/2013 - 31/12/2013

Abstract

The goal of this project is to develop algorithms and software to improve the quality of the clinical coding process in hospitals, and to design a valorization plan. The algorithms will automatically identify coding anomalies and suggest codes using state-of-the-art machine learning techniques. We will define a business development plan, attract potential customers, and aim to attract follow-up funding.

Researcher(s)

  • Promoter: Van den Bulcke Tim
  • Co-promoter: Goethals Bart
  • Co-promoter: Luyckx Kim
  • Co-promoter: Luyten Leon
  • Co-promoter: Smets Koen

Research team(s)

Project type(s)

  • Research Project

Verifiable Outlier Mining. 01/10/2012 - 30/09/2015

Abstract

In a nutshell, the aim of this project is to provide easy-to-understand descriptions that assist humans in manual outlier verification. We propose a novel research direction called "verifiable outlier mining", tackling open challenges in the automatic extraction of outlier descriptions. For example, in a medical setting, descriptions could indicate how one patient deviates from others. Such descriptions will include the relevant attributes (e.g., "age" and "skin humidity"), but also regular objects as witnesses from which the outlier deviates. To accomplish this, the main topic addressed by this proposal is the statistically founded and scalable selection of attribute combinations highlighting the outlierness of an object.
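A minimal sketch of what such a description could look like: flag the attributes on which an object lies far from the regular objects (here via a simple z-score, a stand-in for the statistically founded selection the project targets), and return the most similar regular object as a witness. The patient records below are made up for illustration.

```python
import statistics

# Hypothetical patient records; the last one is the suspected outlier.
attrs = ["age", "skin_humidity", "heart_rate"]
data = [
    {"age": 30, "skin_humidity": 0.55, "heart_rate": 70},
    {"age": 35, "skin_humidity": 0.60, "heart_rate": 72},
    {"age": 32, "skin_humidity": 0.58, "heart_rate": 68},
    {"age": 33, "skin_humidity": 0.95, "heart_rate": 140},
]

def describe_outlier(data, idx, attrs, z_threshold=2.0):
    """Return the attributes on which object idx deviates (|z| above the
    threshold w.r.t. the other objects) and the regular object that is
    closest on the remaining attributes, serving as a witness."""
    others = [r for i, r in enumerate(data) if i != idx]
    deviating = []
    for a in attrs:
        vals = [r[a] for r in others]
        mu, sd = statistics.mean(vals), statistics.stdev(vals)
        if sd > 0 and abs(data[idx][a] - mu) / sd > z_threshold:
            deviating.append(a)
    normal = [a for a in attrs if a not in deviating]
    witness = min(range(len(others)),
                  key=lambda i: sum((others[i][a] - data[idx][a]) ** 2
                                    for a in normal))
    return deviating, others[witness]

deviating, witness = describe_outlier(data, 3, attrs)
print(deviating)       # -> ['skin_humidity', 'heart_rate']
print(witness["age"])  # -> 32
```

The resulting description reads naturally: the patient deviates on skin humidity and heart rate, while resembling the witness patient on the remaining attributes.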

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Development of the BioGraph technology for valorisation in life sciences industries. 01/03/2012 - 28/02/2013

Abstract

BioGraph is a data mining technology, developed at the University of Antwerp, for unsupervised biomedical knowledge discovery via automatically generated hypotheses in integrated knowledge databases. For this technology, we study business development opportunities and develop a specific off-the-shelf application for the interpretation of microarray studies.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Instant Interactive Data Exploration. 01/01/2012 - 31/12/2015

Abstract

Today, we can easily store massive amounts of information, but we lack the means to exploratively analyse databases of this scale. Currently, there is no technology that allows users to 'wander' around the data and make discoveries by following intuition, or simple serendipity. While standard data mining is aimed at finding highly interesting results, it is typically computationally demanding and time consuming, and hence not suited for exploring large databases. To address this problem, we propose to study instant, interactive and adaptive data mining as a new data mining paradigm. Our goal is to develop methods that give high-quality (possibly approximate) results instantly, present them understandably and interactively, and adapt so as to let the user rapidly steer the method towards the most informative areas of the database.

Researcher(s)

  • Promoter: Goethals Bart
  • Co-promoter: Tatti Nikolaj
  • Co-promoter: Vreeken Jilles

Research team(s)

Project type(s)

  • Research Project

BIOMINA: Patterns for the life sciences. 01/01/2012 - 30/06/2015

Abstract

Biomina (Biomedical Informatics Expertise Centre Antwerpen) is an interdisciplinary research collaboration between UA and UZA. It aims at the development of innovative techniques for the analysis and interpretation of heterogeneous biomedical data. Biomina operates at the intersection of clinical data and 'omics data (genome, proteome, transcriptome, ...). Structuring, integrating and analysing these data is its core activity. As a centralized expertise centre and research platform, it enables systems biology and translational systems medicine research.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data mining for privacy in social networks. 01/01/2011 - 31/12/2014

Abstract

This is a fundamental research project financed by the Research Foundation - Flanders (FWO). The project was subsidized after selection by the FWO expert panel.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Finding characteristic pattern sets through compression. 01/10/2010 - 30/09/2013

Abstract

I propose to study the foundations of using compression as a means for finding compact, characteristic sets of patterns. Of particular interest is the possibility of finding such patterns directly from the data, and studying how recent insights in Minimum Description Length theory and statistics can enhance the discovery of these patterns, and vice versa. The ultimate goal of this project is to develop the theory that allows us to mine highly useful patterns directly from any database.
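As a rough illustration of the compression idea (a toy sketch with an invented dataset, a crude greedy cover, and simplified code lengths, not the project's actual encoding), a pattern set can be scored by the bits needed to describe the patterns plus the bits needed to encode the data given those patterns; the better the patterns match the data's structure, the lower the score:

```python
from math import log2

# Toy transaction database over items {a, b, c, d}.
data = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
        frozenset("cd"), frozenset("cd")]
ALPHABET = set("abcd")
SINGLETON_BITS = log2(len(ALPHABET))  # 2 bits per individually coded item

def mdl_score(patterns, transactions):
    """Two-part MDL-style score: bits to describe the pattern set (model)
    plus bits to encode the data given the patterns (greedy cover)."""
    model_bits = SINGLETON_BITS * sum(len(p) for p in patterns)
    data_bits = 0.0
    for t in transactions:
        covered = set()
        for p in sorted(patterns, key=len, reverse=True):  # largest patterns first
            if p <= t and not (p & covered):
                covered |= p
                data_bits += 1.0  # one (short) code per used pattern
        data_bits += SINGLETON_BITS * len(t - covered)  # leftovers as singletons
    return model_bits + data_bits

# A pattern set that matches the data's structure compresses the database
# better (lower score) than an unrelated one:
assert mdl_score([frozenset("abc"), frozenset("cd")], data) < \
       mdl_score([frozenset("ad")], data)
```

The score thus prefers small pattern sets that cover the data with few codes, which is the intuition behind selecting "characteristic" patterns via compression.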

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Database Summarization. 01/01/2010 - 31/12/2011

Abstract

In this research we aim to find ways of summarizing a database by means of the patterns that occur within it. Employing state-of-the-art data mining techniques, the goal is to retrieve a concise subset of all patterns that characterizes the data as well as possible.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Theoretical Foundations of Finding Significant Patterns in Data Mining. 01/10/2009 - 30/09/2012

Abstract

The umbrella topic of this research is the theoretical foundations of pattern mining in binary data. The specific directions of the research are the following.

  • Axiomatic approach for defining significance measures of itemsets: our goal is to study whether this trade-off holds in general. Our hypothesis is that the property required by the APRIORI algorithm poses strong conditions on how and what prior information can be used.
  • Pattern mining for datasets with a specific form: our goal is to study how such specific prior knowledge can be infused into pattern mining, that is, whether we can use this information in defining a significance measure, but also whether it can be used for deriving efficient algorithms.
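The property required by the APRIORI algorithm is the anti-monotonicity of support: every subset of a frequent itemset must itself be frequent, so a candidate whose subsets are not all frequent can be pruned without counting it. A minimal sketch (the toy transactions are invented for illustration):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Frequent itemset mining with APRIORI pruning: a size-k candidate is
    counted only if all of its (k-1)-subsets are already frequent."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {frozenset([i]): support(frozenset([i])) for i in items}
    frequent = {s: c for s, c in frequent.items() if c >= minsup}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Generate size-k candidates by joining frequent (k-1)-itemsets...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ...and prune by anti-monotonicity before any counting.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = {c: n for c in candidates if (n := support(c)) >= minsup}
        result.update(frequent)
        k += 1
    return result

freq = apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], minsup=2)
assert freq[frozenset({"a", "b"})] == 2          # "ab" occurs in two transactions
assert frozenset({"a", "b", "c"}) not in freq    # support 1 < minsup
```

The pruning step is exactly the "strong condition" at stake: any significance measure that is to benefit from APRIORI-style search must be anti-monotone in the same way.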

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Analysis of high-throughput data by means of support vector machines and kernel-based techniques: feature selection and adaptive model building. 01/10/2009 - 30/09/2011

Abstract

In many real-life applications, information gathered from measurements is essential to ensure the quality of products and to enable control of a production process. These measurements are typically obtained from online hardware analysers (e.g. thermometers, flow meters). However, many characteristics cannot be obtained through online equipment and instead require a time-consuming and computationally expensive analysis. For this reason, models are typically used to predict the results of such an analysis from the process variables; the analysis then serves as a confirmation of the model. Models are sometimes also used to predict the output of online hardware analysers, which may fail due to corrosion or drift away from their calibration point. In this project we address a number of issues related to the construction of such models using Support Vector Machines (SVMs). Our interest in building models using SVMs has several reasons:

  • It is well known that SVMs can handle high-dimensional data without suffering from the curse of dimensionality.
  • The use of kernels enables nonlinear modelling.
  • SVMs can be made insensitive to noise and outliers.
  • The ability of SVMs to identify "unusual" data points makes them useful for detecting outliers and anomalies.

The project addresses two issues. The first is feature selection and the incorporation of prior knowledge: we aim to investigate whether similar results can be obtained for Support Vector Regression and how well the technique applies to single-class problems. The second is adaptive model building: techniques that can handle the adaptivity of the inferential sensor at all levels, especially when the mathematical model needs to be partially rebuilt, are still in their infancy.
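The kind of "soft sensor" described above can be sketched with kernel ridge regression, a close relative of Support Vector Regression (a minimal stand-in on invented process data, not the project's actual method or tooling):

```python
import math
import random

random.seed(0)

# Invented process data: one online sensor reading per sample, predicting a
# lab-analysed quality variable via a nonlinear relation.
X = [[random.uniform(-1, 1)] for _ in range(40)]
y = [math.sin(3 * x[0]) for x in X]

def rbf(a, b, gamma=2.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Kernel ridge regression: solve (K + lam*I) alpha = y,
# then predict with f(x) = sum_i alpha_i * k(x, x_i).
lam = 1e-3
K = [[rbf(a, b) + (lam if i == j else 0.0) for j, b in enumerate(X)]
     for i, a in enumerate(X)]
alpha = solve(K, y)

def predict(x):
    return sum(a * rbf(x, xi) for a, xi in zip(alpha, X))

# The nonlinear kernel model fits far better than predicting the mean.
mse = sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(X)
var = sum((t - sum(y) / len(y)) ** 2 for t in y) / len(y)
assert mse < 0.1 * var
```

The kernel plays the same role as in SVR (nonlinear modelling without explicit feature construction); SVR additionally uses an epsilon-insensitive loss, which is what makes it robust to noise and outliers.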

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Finding Characteristic Pattern Sets through Compression. 01/10/2009 - 30/09/2010

Abstract

Most pattern discovery algorithms easily generate very large numbers of patterns, making the results impossible to understand and hard to use. In this project, we propose to develop and study general techniques for using compression as a means of finding compact, characteristic sets of patterns. Such pattern sets should contain only high-quality patterns that are of direct interest to the user and her application.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Principles of Pattern Set Mining for structured data. 01/07/2009 - 31/12/2013

Abstract

In this project, we propose to develop and study general techniques for mining sets of patterns directly. Such pattern sets should contain only high-quality patterns that are of direct interest to the user and her application. By developing pattern set mining techniques, we hope to lift pattern mining from the local to the global level, which in turn should contribute to a better understanding of the role of pattern mining in data mining and machine learning.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Intelligent analysis and data mining of mass spectrometry-based proteome data. 01/07/2009 - 30/06/2013

Abstract

Mass spectrometry (MS) is a powerful analytical technique for elucidating the structure of molecules such as proteins. Until now, a significant fraction of the data coming from MS analyses remains uninterpretable. This project aims to apply state-of-the-art data mining techniques to a large set of mass spectra, in order to find new relevant patterns that may point towards unknown structural modifications.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Machine learning for data mining and its applications. 01/01/2009 - 31/12/2013

Abstract

The research community aims at strengthening and coordinating Flemish research on machine learning for data mining in general, and on important applications such as bioinformatics and text mining in particular. Flemish participants: Computational Modeling Lab (VUB), CNTS (UA), ESAT-SISTA (KU Leuven), DTAI (KU Leuven), ADReM (UA).

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Principles of Pattern Set Mining. 01/01/2009 - 31/12/2012

Abstract

The overall goals of this project are:

  • to establish a general computational framework for pattern set mining;
  • to study the computational properties of different types of selection predicates;
  • to develop algorithms and systems for pattern set mining;
  • to investigate how principles of constraint programming apply to pattern set mining;
  • to evaluate pattern set mining techniques on standard data mining and machine learning tasks, both conceptually and experimentally;
  • to study representational and application aspects of pattern set mining.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Analysis of high-throughput data by means of support vector machines and kernel-based techniques: feature selection and adaptive model building. 01/10/2007 - 18/01/2010

Abstract

In many real-life applications, information gathered from measurements is essential to ensure the quality of products and to enable control of a production process. These measurements are typically obtained from online hardware analysers (e.g. thermometers, flow meters). However, many characteristics cannot be obtained through online equipment and instead require a time-consuming and computationally expensive analysis. For this reason, models are typically used to predict the results of such an analysis from the process variables; the analysis then serves as a confirmation of the model. Models are sometimes also used to predict the output of online hardware analysers, which may fail due to corrosion or drift away from their calibration point. In this project we address a number of issues related to the construction of such models using Support Vector Machines (SVMs). Our interest in building models using SVMs has several reasons:

  • It is well known that SVMs can handle high-dimensional data without suffering from the curse of dimensionality.
  • The use of kernels enables nonlinear modelling.
  • SVMs can be made insensitive to noise and outliers.
  • The ability of SVMs to identify "unusual" data points makes them useful for detecting outliers and anomalies.

The project addresses two issues. The first is feature selection and the incorporation of prior knowledge: we aim to investigate whether similar results can be obtained for Support Vector Regression and how well the technique applies to single-class problems. The second is adaptive model building: techniques that can handle the adaptivity of the inferential sensor at all levels, especially when the mathematical model needs to be partially rebuilt, are still in their infancy.

Researcher(s)

  • Promoter: Goethals Bart
  • Promoter: Verdonk Brigitte
  • Fellow: Smets Koen

Research team(s)

Project type(s)

  • Research Project

Foundations of inductive databases for data mining. 01/01/2006 - 31/12/2009

Abstract

In this project, we study the realization of an inductive database model. The most important steps in the realization of such a model are: a) a uniform representation of patterns and data; b) a query language for querying the data and the patterns; c) the integration of existing optimization techniques into the physical layer.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

IQ - Inductive queries for mining patterns and models. 01/09/2005 - 31/08/2008

Abstract

Given the present distinct lack of a generally accepted framework for data mining, the quest for such a framework is a major research priority. The most promising approach to this task is taken by inductive databases (IDBs), which contain not only data, but also patterns. Patterns can be either local patterns, such as frequent itemsets, which are of a descriptive nature, or global models, such as decision trees, which are of a predictive nature. In an IDB, inductive queries can be used to generate (mine), manipulate, and apply patterns. The IDB framework is appealing as a theory for data mining, because it employs declarative queries instead of ad hoc procedural constructs. Declarative queries are often formulated using constraints, and inductive querying is closely related to constraint-based data mining. The IDB framework is also appealing for data mining applications, as it supports the process of knowledge discovery in databases (KDD): the results of one (inductive) query can be used as input for another, and nontrivial multi-step KDD scenarios can be supported, rather than just single data mining operations.

The state of the art in IDBs is that various effective approaches exist for constraint-based mining (inductive querying) of local patterns, such as frequent itemsets and sequences, most of which work in isolation. The proposed project aims to significantly advance the state of the art by developing the theory of, and practical approaches to, inductive querying (constraint-based mining) of global models, as well as approaches to answering complex inductive queries that involve both local patterns and global models. Based on these, showcase applications/IDBs in the area of bioinformatics will be developed, where users will be able to query data about drug activity, gene expression, gene function and protein sequences, as well as frequent patterns (e.g., subsequences in proteins) and predictive models (e.g., for drug activity or gene function).
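The multi-step querying idea can be sketched as follows (a toy illustration with an invented database and a naive enumeration standing in for an efficient miner): an inductive query is a declarative constraint over patterns, and the result of one query serves as input to the next.

```python
from itertools import combinations

# Invented toy database of transactions (itemsets).
db = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"},
                             {"a", "c", "d"}, {"b", "c"})]

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in db if itemset <= t)

def mine(constraint):
    """A minimal 'inductive query': enumerate every itemset over the
    database's items and keep those satisfying a declarative constraint."""
    items = sorted({i for t in db for i in t})
    return [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if constraint(frozenset(c))]

# Query 1: all frequent patterns (support >= 2).
q1 = mine(lambda p: support(p) >= 2)
# Query 2 reuses the result of query 1, as in the multi-step KDD
# scenarios above: frequent patterns that contain item "a".
q2 = [p for p in q1 if "a" in p]
assert frozenset({"a", "b"}) in q2
assert frozenset({"b", "c"}) in q1 and frozenset({"b", "c"}) not in q2
```

The point of the IDB framework is that such constraints are expressed declaratively and evaluated by the system, rather than hard-coded into ad hoc mining procedures; efficient constraint-based miners replace the brute-force enumeration used here.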

Researcher(s)

Research team(s)

Project type(s)

  • Research Project