Course code: | 2052FBDBMW |

Study domain: | Biomedical Sciences |

Academic year: | 2019-2020 |

Semester: | 2nd semester |

Contact hours: | 32 |

Credits: | 3 |

Study load (hours): | 84 |

Contract restrictions: | No contract restriction |

Language of instruction: | English |

Exam period: | exam in the 2nd semester |

Lecturer(s): | Erik Fransen, Kris Laukens |

At the start of this course the student should have acquired the following competences:

an active knowledge of

- English

- general knowledge of the use of a PC and the Internet

specific prerequisites for this course

The student should master the elementary techniques for statistical data analysis, as taught in the Biostatistics course (1041FBDBMW).

- The student gains insight into various data types and their associated challenges, in both molecular biology (various 'omics data types) and the biomedical sciences.
- The student will understand how and which computational techniques can be used to address common challenges in molecular and biomedical data analysis.
- The student will understand the underlying principles of a selection of computational techniques for biomedical data mining.
- The student will be able to select the appropriate technique for a given problem.
- The student will be able to interpret the results of typical data mining tasks.
- At the end of the practical course, the student will be able to write R code for the statistical analysis of data. Solving the assignments requires not only the techniques highlighted during the practicals, but also techniques that the student has to look up in R's built-in help and on the internet.

This course offers an introduction to the advanced computational analysis of complex and/or large biomedical datasets. It addresses the foundations of the partially overlapping fields of multivariate statistics and data mining, from both a theoretical perspective and an applied, hands-on point of view. The course extends earlier courses on bioinformatics and univariate statistics and covers the following topics:

**THEORY:**

*I. Introduction to different data types and data mining problems*

- A formal overview of different data types in biology and medicine: quantitative data (e.g. from 'omics platforms), string data (mainly DNA and protein sequences), text, graph data (biological networks) and image data.
- An introduction to the challenges of data mining and machine learning.

*II. Overview of data mining techniques*

- Introduction: preprocessing and basic exploratory analysis (univariate statistics) of quantitative data, including a review of statistical concepts (covered only as far as needed in the context of this course).
- Unsupervised learning: clustering, PCA
- An introduction to classification methods: overview of classification systems, model validation (e.g. different cross-validation techniques)
- Biomedical feature selection and dimensionality reduction
- Supervised learning techniques (a solid introduction to commonly used techniques and algorithms): regression techniques, discriminant analysis, support vector machines, random forests, ensemble classifiers, decision trees, neural networks, naive Bayes, association rule mining
- Biomedical text mining
- Visual data mining
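As a rough illustration of three of the listed topics (PCA, clustering and cross-validation), the base-R sketch below runs them on R's built-in `iris` data. The dataset, the choice of k = 3 clusters, 5 folds and a logistic-regression classifier are assumptions made for this example, not material taken from the course.

```r
# Sketch of part II topics in base R; iris is a stand-in dataset.
data(iris)
X <- scale(iris[, 1:4])   # centre and scale the four numeric measurements

## Unsupervised learning: principal component analysis
pca <- prcomp(X)
summary(pca)              # proportion of variance explained per component
scores <- pca$x[, 1:2]    # sample coordinates on PC1 and PC2

## Unsupervised learning: k-means clustering (k = 3 is an assumption)
set.seed(1)
km <- kmeans(X, centers = 3, nstart = 25)
table(cluster = km$cluster, species = iris$Species)  # clusters vs known labels
plot(scores, col = km$cluster, pch = as.integer(iris$Species),
     xlab = "PC1", ylab = "PC2")

## Model validation: 5-fold cross-validation of a logistic regression
# Binary task chosen for illustration: versicolor vs virginica
d <- droplevels(subset(iris, Species != "setosa"))
k <- 5
folds <- sample(rep(1:k, length.out = nrow(d)))  # random fold labels
acc <- numeric(k)
for (i in 1:k) {
  train <- d[folds != i, ]
  test  <- d[folds == i, ]
  fit <- glm(Species ~ ., data = train, family = binomial)
  # glm models the probability of the second factor level (virginica)
  p <- predict(fit, newdata = test, type = "response")
  pred <- ifelse(p > 0.5, levels(d$Species)[2], levels(d$Species)[1])
  acc[i] <- mean(pred == test$Species)
}
acc        # accuracy on each held-out fold
mean(acc)  # cross-validated accuracy estimate
```

Scaling before PCA and k-means keeps variables measured on different scales from dominating the result, and `nstart = 25` reruns k-means from several random starts and keeps the best solution.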

*III. Biomedical data mining applications*

Through a number of case studies and real research results, the course shows how these techniques can be employed to extract novel insights from biomedical data. These lectures should cover diverse data types (e.g. quantitative molecular data, molecular sequences, molecular interactions, ontologies, text, physiological measurements, patient metadata, …) and several of the techniques addressed above.

**PRACTICE:**

The practical part will familiarize students with the statistical programming language R. First, students should be able to read in a dataset correctly, generate graphs and perform elementary data manipulations. Subsequently, some techniques for statistical data analysis (linear regression, ANOVA, multivariate techniques, ...) are illustrated; here, students are expected to use the help files and search the internet for the code to solve a particular problem. Finally, programming techniques such as for-loops and custom-made functions are illustrated to facilitate repetitive analyses.
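A minimal sketch of the workflow described in this paragraph (reading data, simple manipulation, a graph, linear regression, ANOVA, a custom function and a for-loop). It assumes R's built-in `PlantGrowth` and `mtcars` datasets as stand-ins for course data; the helper `describe()` is hypothetical and only serves the example.

```r
# Sketch of a basic R workflow using built-in datasets as stand-ins.

## Read in a dataset: write a small CSV first so the example is self-contained
csv_file <- tempfile(fileext = ".csv")
write.csv(PlantGrowth, csv_file, row.names = FALSE)
plants <- read.csv(csv_file, stringsAsFactors = TRUE)
str(plants)   # check column types: weight (numeric), group (factor)

## Elementary data manipulation and a graph
plants$log_weight <- log(plants$weight)   # derived variable
ctrl <- subset(plants, group == "ctrl")   # select rows of one group
nrow(ctrl)
boxplot(weight ~ group, data = plants, ylab = "Dried weight")

## Linear regression (mtcars: fuel consumption vs car weight)
fit_lm <- lm(mpg ~ wt, data = mtcars)
summary(fit_lm)

## One-way ANOVA: does plant weight differ between groups?
fit_aov <- aov(weight ~ group, data = plants)
summary(fit_aov)

## A custom function and a for-loop for a repetitive analysis
# describe() is a hypothetical helper, not part of base R
describe <- function(x) c(mean = mean(x), sd = sd(x))
res <- NULL
for (g in levels(plants$group)) {
  res <- rbind(res, describe(plants$weight[plants$group == g]))
}
rownames(res) <- levels(plants$group)
res
```

In everyday use, `tapply(plants$weight, plants$group, mean)` returns the group means in a single call; the explicit loop is shown only to illustrate the for-loop construct.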

The course has an international dimension.

Class contact teaching: | Lectures, laboratory sessions |

Personal work: | Assignments (individual), directed self-study, project (individual) |


Examination: | Written examination without oral presentation; electronic, open-book written exam with multiple-choice and open questions |

Continuous assessment: | (Interim) tests, project |


Handouts of the theoretical course and practical instructions will be provided by the lecturers.

“Introductory Statistics with R” by Peter Dalgaard (Springer, New York, USA). ISBN: 0-387-95475-9

“Statistics: An Introduction using R” by Michael J. Crawley (Wiley, Chichester, UK). ISBN: 0-470-02298-1

“R Graphics” by Paul Murrell (Chapman & Hall, Boca Raton, USA). ISBN: 1-58488-486-X

Interesting articles for further reading are distributed by the lecturers or cited in the handouts.

erik.fransen@uantwerpen.be

kris.laukens@uantwerpen.be

pieter.meysman@uantwerpen.be