Molecular structure mining

Date: 12 June 2018

Location: Campus Middelheim, G0.10 - Middelheimlaan 1 - 2020 Antwerpen (route: UAntwerpen, Campus Middelheim)

Time: 16:00

Organisation / co-organisation: Department of Mathematics and Computer Science

PhD candidate: Aida Mrzic

Supervisors: Kris Laukens & Bart Goethals

Short description: PhD defence of Aida Mrzic - Faculty of Science, Department of Mathematics and Computer Science


Metabolites are small biomolecules, and the discipline that studies them is called metabolomics. Despite the increasing importance of metabolomics approaches, with highly relevant applications in drug and biomarker discovery, the structural elucidation of metabolites remains a challenge. The traditional way to identify an observed metabolite is through mass spectrometry, an advanced analytical technique that yields a collection of unknown mass spectra to which molecular structures need to be assigned. The most common ways to do this are searching spectral libraries and searching molecular structure databases.

Both approaches share a limitation: they can only identify molecules that are present in the database used, so only a limited number of known unknowns can be identified effectively in this manner. The identification of compounds that have not been seen before therefore remains challenging. Since similar metabolites should produce similar mass spectra, it is reasonable to presume that metabolites containing the same substructures have partially overlapping spectral properties.

In this thesis, we explored this idea with the aim of improving the partial identification of unknown unknowns, i.e. metabolites not present in any of the existing molecular databases. We investigated the relationships between spectral data and molecular substructures, mostly using pattern mining and pattern mining-inspired techniques. We developed several methods that can provide a (partial) identification of unknown unknowns based on fragmentation data, as well as a method capable of finding correct substructure relationships for drug-protein interactions.