COMP 565 Fall 2021: Machine Learning in genomics and healthcare (4 credits)
Course Overview
Genetics is instrumental in understanding complex human phenotypes, ranging from human height to common diseases and cancers. Large-scale molecular and phenotypic profiling technologies provide exciting opportunities for conducting genetic research in a data-driven way, thereby linking common diseases to novel phenotypes and novel mutations through the lens of regulatory genomics. Meanwhile, there are tremendous opportunities for methodological innovation using statistical and machine learning approaches to address some of the most important problems in genetics and healthcare in ways that were not possible until recently.
In this topics course, we will gain a broad perspective on the current field of computational biology, with a primary focus on data-driven, scalable approaches for genome-wide data and on model interpretability. In particular, we will explore in depth some recently developed and important computational methods for large-scale statistical genetic analysis, multi-omics analysis, and electronic health record (EHR) data mining.
Class location and time:
- Location: McConnell Engineering 12
- Time: Tuesday, Thursday 8:35-9:55
- Duration: September 2 - December 9, 2021
Class Format
Participation marks are given at each lecture based on questions and discussion.
Each student needs to write a 2-page review of five research papers chosen by the instructor based on the topics discussed in class.
There are five assignments. In each assignment, students will derive and/or implement the key components of some of the algorithms discussed in class and use them to analyze real or simulated datasets.
Each student will work on a course project individually. Because this is a research-oriented course, each student will need to propose a suitable project based on the research topics discussed in class, subject to the instructor's approval. The last few lectures will mainly consist of students' project presentations.
Prerequisite courses:
- Biology: BIOL 202 Basic Genetics
- Statistics: MATH 324 Statistics
- Machine learning: COMP 441 or COMP 551
- Programming language: Python or R
Instructor
Yue Li <yue[dot]yl[dot]li[at]mcgill[dot]ca>
Teaching Assistant
Wenmin Zhang <wenmin[dot]zhang[at]mail[dot]mcgill[dot]ca>
Evaluation
- Class participation (10%)
- Paper review (10%)
- Assignments (35%)
- Final project proposal (5%)
- Final project presentation (10%)
- Final project report (30%)
Relevant Textbooks
- Pattern Recognition and Machine Learning by Christopher Bishop
- Machine Learning by Kevin Murphy
- No need to purchase. Relevant contents will be available on the course website.
Course Syllabus (Tentative):
- Statistical genetic approaches (15 hrs, 5 weeks):
  - Bayesian polygenic risk score models
  - Causal inference of genetic variants
  - Heritability estimation
  - Bayesian multi-trait (multi-class) modeling
- Multi-omic learning (3 hrs, 1 week):
  - Latent factor analysis of genome-wide multi-omic patient data
  - Deep learning in regulatory network modeling
- Computational approaches in single-cell analysis (6 hrs, 2 weeks):
  - Deep autoencoding approaches for novel cell discoveries
  - Representation learning of single-cell transcriptomes
  - Cell type deconvolution
- Mining big data in healthcare (9 hrs, 3 weeks):
  - Phenome-wide association studies
  - Learning disease topics with latent topic models
  - Temporal neural network approaches for modeling longitudinal EHR data
  - Graph representation learning of biomedical knowledge graphs