Research team

Molecular Discovery in Untargeted Metabolomics through Advanced Data Science and Machine Learning 01/11/2025 - 31/10/2029

Abstract

Although our capacity for molecular discovery from biological samples via untargeted small molecule mass spectrometry (MS) has advanced profoundly over the past decades, the field still grapples with a fundamental challenge: the vast majority of MS/MS spectra remain unannotated, which significantly limits the insights these studies can generate. To address this gap, my research envisions a paradigm shift from conventional heuristic-driven analysis to a robust, data-driven approach capable of unveiling novel molecular insights from MS data. To this end, I propose a three-pronged strategy to enhance MS data interpretation. First, I will develop a novel spectral library searching framework that leverages target–decoy strategies and semi-supervised machine learning to improve annotation sensitivity and confidence. Second, I will address the challenge of chimeric spectra by creating a deep learning-based deconvolution framework, enabling accurate resolution of overlapping isotopic envelopes. Third, I will design an AI-driven repository-scale molecular networking approach to uncover previously uncharacterized molecular analogs, expanding our capacity for small molecule discovery. By unlocking the wealth of unannotated MS data, this project will provide important advances for biomedical and environmental research, empowering the scientific community with next-generation tools for molecular discovery.

Project type(s)

  • Research Project

Bioinformatics and machine learning for large-scale metabolomics data analysis. 01/12/2022 - 30/11/2026

Abstract

Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, challenges remain in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking prior knowledge into account. This project will instead develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories, yielding a deeper understanding of the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.

Project type(s)

  • Research Project