Data mining

Course code: 2026FBDBIC
Study domain: Biochemistry
Academic year: 2019-2020
Semester: 1st semester
Contact hours: 30
Study load (hours): 112
Contract restrictions: No contract restriction
Language of instruction: Dutch
Exam period: exam in the 1st semester
Lecturer(s): Kris Laukens, Pieter Meysman

3. Course contents

I. Introduction to different data types and data mining problems

- A formal overview of different data types in biology and medicine: quantitative data (proteome, metabolome, mRNA abundances), string data (mainly DNA and protein sequences), text, graph data (biological networks), image data

- An introduction to the challenges of data mining. What are patterns?

- An introduction to machine learning, including an overview of the different techniques addressed later in the course.

- An introduction to algorithmic complexity.

II. Overview of data mining techniques

1. Introduction: basic exploratory analysis (univariate statistics) of quantitative data. This is only a brief revision of statistical concepts, since students are expected to know this material already.

2. Unsupervised learning: clustering, PCA, self-organizing maps

3. An introduction to classification methods: overview of classification systems, model validation (e.g. different cross-validation techniques)

4. Biomedical feature selection and dimensionality reduction

5. Supervised learning techniques (a solid introduction to commonly used techniques and algorithms): regression techniques, discriminant analysis, support vector machines, random forests, ensemble classifiers, decision trees, neural networks, naive Bayes, association rule mining

6. Intelligent optimization techniques: hill climbing, simulated annealing, evolutionary computation, swarm intelligence, DNA and protein computation

7. Biomedical text mining

8. Visual data mining

III. Biomedical data mining application case studies

In a series of lectures, bioinformatics and biomedical informatics researchers show, through real research results, how these techniques can be employed to extract novel insights from biomedical data. These lectures should cover diverse data types (e.g. quantitative molecular data, molecular sequences, molecular interactions, ontologies, text, physiological measurements, patient metadata, …) and several of the techniques addressed above. Several possible speakers are available at the University. For some lectures, external speakers may be invited; these lectures may fit within the biomina seminar series.

Some possible topics:

  • reverse engineering of regulatory molecular networks
  • biomarker discovery from genome, transcriptome, proteome and metabolome data
  • finding genetic defects through next-generation DNA sequence analysis
  • pattern finding in NMR spectroscopy and mass spectrometry data