AI-Driven Metadata Annotation and Quality Control for Reproducible Mass Spectrometry-Based Omics Research. 01/11/2025 - 31/10/2026

Abstract

Mass spectrometry (MS) is a cornerstone technology in proteomics and metabolomics, generating large volumes of data with considerable potential for discovery. However, widespread data reuse remains hindered by incomplete metadata and inconsistent quality control (QC) practices, which limit researchers' ability to locate, compare, and integrate datasets. I will address these barriers by developing bioinformatics and machine learning solutions for automated metadata extraction and transparent QC assessment in MS-based omics. First, I will design automated workflows to systematically extract and harmonize metadata from raw MS data and scientific literature. These tools will integrate with community-driven formats and repositories such as SDRF-Proteomics and the PRIDE database, enabling structured annotation of both technical parameters and biological context for public MS data. Second, I will implement a standardized QC framework that provides both identification-free and identification-based performance metrics, allowing researchers to assess data reliability quickly. A machine-learning-powered dashboard will further support data selection by flagging anomalous datasets and highlighting quality trends. By improving metadata completeness and ensuring transparent QC, this project will make public MS datasets more reusable, accelerating secondary analyses, meta-studies, and AI-driven applications in MS-based omics.

Project type(s)

  • Research Project