Frequent pattern discovery for integrated omics data
5 July 2016
UAntwerp, Campus Middelheim, A.143 - Middelheimlaan 1 - 2020 Antwerpen
Kris Laukens & Wim Vanden Berghe
PhD defence Stefan Naulaerts - Faculty of Science
Data mining has been an integral part of scientific workflows for several decades. However, several techniques have remained underutilised in real-life applications where they could be of particular use. One of these cases is the focus of this doctoral thesis: frequent itemset mining. This family of techniques allows researchers to rapidly generate large numbers of patterns that capture what has proven highly valuable in many biological scenarios: correlations. Moreover, frequent itemset mining is independent of the 'omics' level to which it is applied, and different levels can be combined in a single analysis to identify cross-omics patterns.
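To make the idea of frequent itemsets concrete, here is a minimal sketch of level-wise (Apriori-style) frequent itemset mining. It assumes that each biological sample is treated as a transaction and that each item is a discretised measurement tagged with its omics level (for example `mRNA:` or `protein:`). The item names, the example samples and the support threshold are hypothetical illustrations and are not taken from the thesis, which may rely on other mining algorithms.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support_count} for every itemset contained in at
    least `min_support` transactions, using a plain level-wise Apriori search."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join pairs of frequent (k-1)-itemsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Support counting: keep candidates found in enough transactions.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Hypothetical samples: each set mixes discretised items from several omics levels.
samples = [
    {"mRNA:GENE_A_up", "protein:PROT_B_high", "meth:SITE_C_low"},
    {"mRNA:GENE_A_up", "protein:PROT_B_high"},
    {"mRNA:GENE_A_up", "protein:PROT_B_high", "meth:SITE_C_low"},
    {"protein:PROT_B_high", "meth:SITE_C_low"},
]

result = frequent_itemsets(samples, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(support, sorted(itemset))
```

With a threshold of three samples, the pairs {mRNA:GENE_A_up, protein:PROT_B_high} and {protein:PROT_B_high, meth:SITE_C_low} are frequent cross-omics itemsets, while {mRNA:GENE_A_up, meth:SITE_C_low} co-occurs in only two samples and is discarded. Because that pair is infrequent, the three-item candidate is pruned without being counted. This pruning step is what lets the approach generate many co-occurrence patterns quickly.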
We first explored how frequent itemsets can be applied to biological datasets, how they can be interpreted in relation to the existing evidence traditionally used in correlation analysis (ontologies, interaction networks), and how they can be made interpretable through visualizations that are intuitive to life scientists. Next, we applied frequent itemset mining in an unprecedented meta-analysis to draft lists of co-occurring biological elements. We then used frequent itemsets to address well-studied challenges in cancer bioinformatics, successfully identifying cancer subtypes and linking these to patient survival. Finally, we used these techniques for drug compound characterization.