Molecular structure mining

Date: 12 June 2018

Venue: Campus Middelheim, G0.10 - Middelheimlaan 1 - 2020 Antwerpen (route: UAntwerpen, Campus Middelheim)

Time: 4:00 PM

Organization / co-organization: Department of Mathematics & Computer Science

PhD candidate: Aida Mrzic

Principal investigators: Kris Laukens & Bart Goethals

Short description: PhD defence Aida Mrzic - Faculty of Science, Department of Mathematics and Computer Science


Metabolites are small biomolecules, and the discipline that studies them is called metabolomics. Despite the growing importance of metabolomics approaches, with highly relevant applications in drug and biomarker discovery, the structural elucidation of metabolites remains a challenge. The traditional way to identify an observed metabolite is through mass spectrometry, an advanced analytical technique that produces a collection of mass spectra of unknown compounds, to which molecular structures must then be assigned. The most common ways to do this are searching spectral libraries and searching molecular structure databases.

Both approaches share a common limitation: they can only identify molecules that are already present in the database being searched. As a result, only a limited set of "known unknowns" can be identified this way, and identifying compounds that have never been observed before remains challenging. Because structurally similar metabolites are expected to produce similar mass spectra, it is reasonable to presume that metabolites sharing the same substructures will have partially overlapping spectral properties.

In this thesis, we explored this idea with the aim of improving the partial identification of unknown unknowns, i.e., metabolites not present in any existing molecular database. We investigated the relationships between spectral data and molecular substructures, mostly using pattern mining and pattern mining-inspired techniques. We developed several methods that provide a (partial) identification of unknown unknowns from fragmentation data, as well as a method capable of finding correct substructure relationships in drug-protein interactions.