COMP 565 Fall 2024: ML in genomics and healthcare

Course Overview
Genetics is instrumental in understanding complex human phenotypes, ranging from height to complex diseases. Large-scale molecular and phenotypic profiling technologies provide exciting opportunities and unique challenges for conducting research in a data-driven way, thereby linking common diseases to novel phenotypes and novel mutations through the lens of regulatory genomics.
Rapid advances in machine learning research and computing technologies provide new ways to address some of the most important problems in genomics and healthcare. In this topics course, we will gain a broad perspective on the current field of computational genomics, with a primary focus on using ML approaches to model genomic and healthcare data. We will explore in depth some of the recently developed and crucial computational methods for modeling big data in the life sciences and health.
Class location and time:
- Location: BIRKS Building 205
- Time: Tuesday, Thursday 8:35-9:55
- Duration: Aug. 29 - Dec. 3, 2024
Class Format
A participation mark is taken at each lecture, based on questions and discussion.
There are 10 quizzes, typically released at the end of each week and due at the end of the following week.
Each student needs to write a 2-page review of five research papers chosen by the instructor based on the topics discussed in class.
There are five assignments. In each assignment, students will derive and/or implement the key components of some of the algorithms discussed in class and use them to analyze real or simulated datasets.
Students will work on a course project in small groups. The instructor will provide a few default projects. Given that this is a research-oriented course, students may also propose a suitable project based on the research topics discussed in class, subject to the instructor's approval.
Prerequisite courses:
- Biology: BIOL 202 Basic Genetics
- Statistics: MATH 324 Statistics
- Machine learning: COMP 441 or COMP 551
- A programming course in Python or R (e.g., COMP 202/205/204)
Recommended courses:
- Biology: BIOL 202 Basic Genetics
- Statistics: MATH 324 Statistics
Instructor
Yue Li, Office hours TBD
Teaching Assistant
Liam Hodgson
Evaluation
- Class participation (10%)
- Quizzes (10%)
- Paper review (10%)
- Assignments (40%)
- Final project report (30%)
Relevant Textbooks
- Pattern Recognition and Machine Learning by Christopher Bishop
- Machine Learning by Kevin Murphy
- No need to purchase. Relevant content will be available on the course website.
Course Syllabus (Tentative):
- Genotype-to-phenotype prediction (9 hrs, 3 weeks):
  - Bayesian polygenic risk score models
  - Heritability estimation
  - Bayesian fine-mapping of genetic variants
- Explainable AI (3 hrs, 1 week):
  - LIME and SHAP
  - DeepLIFT
- Learning from genomic sequences (6 hrs, 2 weeks):
  - Convolutional neural networks for modeling genomic sequence data
  - Genomic foundation models
- Computational approaches in single-cell analysis (6 hrs, 2 weeks):
  - Unsupervised learning for novel cell discoveries
  - Representation learning of single-cell transcriptomes
  - Cell-type deconvolution
- Mining big data in healthcare (9 hrs, 3 weeks):
  - Learning disease topics by latent topic models
  - Sequential neural network approaches to model longitudinal EHR
  - Federated learning in EHR
  - Graph representation learning of biomedical knowledge graphs