Research team

Expertise

Dr. Bittremieux's research deals with developing advanced machine learning techniques to uncover novel knowledge from mass spectrometry-based proteomics and metabolomics data. While his current research mainly focuses on how deep learning can be used to analyze mass spectrometry data, he is interested in a wide variety of bioinformatics problems. An important part of his work involves developing insights and computational approaches for quality control in biological mass spectrometry.

Bioinformatics network for proteomics and mass spectrometry 01/01/2024 - 31/12/2028

Abstract

Proteomics, the study of proteins and their functions, is a critical area in biology and medicine. With mass spectrometry (MS), researchers can analyze large amounts of proteomics samples, leading to valuable insights into complex biological processes. MS datasets require specialized data analysis techniques, which has led to the development of several powerful bioinformatics tools and pipelines for mass spectrometry-based proteomics. Nevertheless, the increasingly large volume and complex nature of MS-based proteomics data pose significant challenges that hinder progress in the field. To address these, there is a need for an open and collaborative approach to science. We have identified four key challenges that we will address through this Scientific Research Network (SRN):

  • Highly performant bioinformatics tools: As proteomics datasets grow in size, computational bottlenecks arise. Through this SRN, we will foster the development of highly performant and interoperable bioinformatics tools and workflows to process these datasets efficiently, enabling faster and more transparent analyses.
  • Machine learning integration: While machine learning holds great promise for proteomics data analysis, integrating it into practical workflows remains complex. Our SRN will work to bridge this gap, making machine learning techniques more accessible and seamlessly integrated into routine analyses.
  • Effective benchmarking: The diversity of analysis approaches makes it challenging to compare methods effectively. Our objective is to establish standardized benchmarking methods that allow researchers to systematically evaluate and improve their analysis pipelines.
  • Community building and educational resources: Proteomics data analysis requires specialized knowledge that is continuously evolving, making it difficult for young scientists and data science experts to enter the field. Our proposed SRN aims to build a supportive community for early-career researchers and create high-quality educational resources that facilitate the learning curve and provide accessible pathways for newcomers.

With three research units in Flanders that are global leaders in MS-based proteomics, this SRN will make Flanders a focal point in the field of proteomics bioinformatics. Our collaboration with international partners will further enhance the visibility of Flemish research and contribute to a competitive position in the international research landscape, making the region attractive for ambitious and talented young researchers to work in. The six partnering research units have strong ties with the proteomics bioinformatics community within Europe and beyond, which we aim to maximally exploit to achieve our long-term goals. Indeed, instead of tackling these challenges alone, each of the six research units intends to take up a leading role in the wider research community to reach our objectives. Through this SRN, we will formalize the existing connections between the six partners and provide a clear collaborative vision and structure to drive progress and effectively mobilize the wider research community. The scope of our goals underscores the necessity of a community-scale effort. All six partners have taken up central roles in existing initiatives, such as the European Bioinformatics Community for Mass Spectrometry (EuBIC-MS), the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI), the ELIXIR Life Science Infrastructure, and the Computational Mass Spectrometry (CompMS) interest group of the International Society for Computational Biology (ISCB), providing the critical mass of researchers required to achieve our goals.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Reference data-driven metabolomics to study the molecular composition of South African foods. 01/01/2024 - 31/12/2026

Abstract

Understanding the molecular composition of food is essential for studying its impact on human health. We have recently developed a new approach called reference data-driven metabolomics, which can perform diet readouts from untargeted metabolomics data. However, this approach currently lacks diverse and geographically representative reference data. To address this, we will expand our reference food molecular database to include indigenous and locally cultivated foods from South Africa, a region with rich cultural and culinary traditions and nutritional diversity. We will analyze the molecular composition of these foods using mass spectrometry and integrate the data into the Global FoodOmics reference database. Additionally, we will develop user-friendly bioinformatics tools that simplify the data analysis process, making reference data-driven metabolomics accessible to researchers with diverse backgrounds, and study the molecular composition of indigenous South African foods. Through collaboration between South African universities and the University of Antwerp, we will combine expertise in analytical chemistry, bioinformatics, nutrition, and agricultural sciences to advance metabolomics research, expand scientific knowledge of South African diets, and provide evidence-based insights for improving nutrition and health in South African populations.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Computational mass spectrometry and artificial intelligence to unravel the immunopeptidome. 01/10/2023 - 30/09/2027

Abstract

The adaptive immune system is a crucial component of the immune response, providing specific defense against a wide range of pathogens and contributing to the development of immunological memory. Immunopeptidomics is a rapidly evolving field that uses mass spectrometry-based approaches to identify and quantify immunopeptides, which play a vital role in the recognition and elimination of infected or malignant cells by T cells. However, the annotation rate of immunopeptides from mass spectrometry data is currently severely limited, resulting in a significant loss of biological information. To overcome this challenge, we will develop specialized bioinformatics tools for analyzing mass spectrometry immunopeptidomics data. Specifically, we will develop an efficient and sensitive open modification search engine to identify immunopeptides that have undergone post-translational modifications. Furthermore, we will develop a deep learning-based de novo peptide sequencing approach optimized for the analysis of immunopeptidomics data. The tools developed in this project have the potential to significantly expand the amount of biological information that can be obtained from immunopeptidomics experiments, leading to transformational breakthroughs in the field.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Enabling mobile and data-driven pathogen monitoring through a paired nanopore squiggle–genome sequence database. 01/05/2023 - 31/12/2024

Abstract

Infectious disease monitoring is a global need, and the threat of existing and emerging pathogens poses a major challenge to public health. Nanopore sequencing is a revolutionary technology that enables portable sequencing and has shown its merit in the COVID-19 pandemic. This technology could enable existing laboratories that have no or limited infectious disease surveillance capacity to 'leapfrog' to sequencing-based pathogen monitoring. However, this potential hinges on the ability to operate in resource-limited settings, which is, to date, hindered by data storage and processing needs. The raw data, referred to as 'squiggles,' require significant storage space, and decoding them to DNA sequences requires graphics processing units (GPUs) that consume significant amounts of power. In this pandemic preparedness proof-of-concept project, we will build on advances from our IOF-SBO funded project LeapSEQ to remove significant hurdles to enable mobile and data-driven pathogen monitoring. These hurdles include: (1) a need for scalable storage solutions for squiggle data, (2) the lack of available pathogen data, and (3) a need for improved computational solutions for interacting with squiggle data. We will tackle these problems by engineering and populating a proof-of-concept paired nanopore squiggle–genome sequence database using our portable LeapSEQ lab and by developing efficient data-driven algorithms for rapid pathogen monitoring. We will develop this database with strategic partners at ITM and UA and further explore LeapSEQ valorization potential in the context of global pathogen monitoring.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Artificial intelligence-powered knowledge base of the observed molecular universe. 01/12/2022 - 30/11/2027

Abstract

Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, there are still challenges in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking into account prior knowledge. Instead, this project will develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories to gain a deeper understanding of the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Bioinformatics and machine learning for large-scale metabolomics data analysis. 01/12/2022 - 30/11/2026

Abstract

Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, there are still challenges in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking into account prior knowledge. Instead, this project will develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories to gain a deeper understanding of the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Transferable deep learning for sequence based prediction of molecular interactions. 01/10/2019 - 30/09/2023

Abstract

Machine learning can be used to elucidate the presence or absence of interactions. In particular for life science research, the prediction of molecular interactions that underlie the mechanics of cells, pathogens and the immune system is a problem of great relevance. Here we aim to establish a fundamentally new technology that can predict unknown interaction graphs with models trained on the vast amount of molecular interaction data that is nowadays available thanks to high-throughput experimental techniques. This will be accomplished using a machine learning workflow that can learn the patterns in molecular sequences that underlie interactions. We will tackle this problem in a generalizable way using the latest generation of neural network approaches by establishing a generic encoding for molecular sequences that can be readily translated to various biological problems. This encoding will be fed into an advanced deep neural network to model general molecular interactions, which can then be fine-tuned to highly specific use cases. The features that underlie the successful network will then be translated into novel visualisations to allow interpretation by biologists. We will assess the performance of this framework using both computationally simulated and real-life experimental sequence and interaction data from a diverse range of relevant use cases.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Intelligent quality control for mass spectrometry-based proteomics 01/10/2017 - 31/07/2021

Abstract

As mass spectrometry proteomics has matured over the past few years, a growing emphasis has been placed on quality control (QC), which is becoming a crucial factor to endorse the generated experimental results. Mass spectrometry is a highly complex technique, and because its results can be subject to significant variability, suitable QC is necessary to model the influence of this variability on experimental results. Nevertheless, extensive quality control procedures are currently lacking due to the absence of QC information alongside the experimental data and the high degree of difficulty in interpreting this complex information. For mass spectrometry proteomics to mature, a systematic approach to quality control is essential. To this end, we will first provide the technical infrastructure to generate QC metrics as an integral element of a mass spectrometry experiment. We will develop the qcML standard file format for mass spectrometry QC data, and we will establish procedures to include detailed QC data alongside all data submissions to PRIDE, a leading public repository for proteomics data. Second, we will use this newly generated wealth of QC data to develop advanced machine learning techniques to uncover novel knowledge on the performance of a mass spectrometry experiment. This will make it possible to improve the experimental set-up, optimize the spectral acquisition, and increase the confidence in the generated results, massively empowering biological mass spectrometry.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project