COMP 565 Fall 2023: ML in genomics and healthcare

Course Overview
Genetics is instrumental in understanding complex human phenotypes, ranging from height to complex diseases. Large-scale molecular and phenotypic profiling technologies provide exciting opportunities and unique challenges for conducting research in a data-driven way, thereby linking common diseases to novel phenotypes and novel mutations through the lens of regulatory genomics.
Rapid advances in machine learning research and computing technologies provide new ways to address some of the most important problems in genomics and healthcare. In this topics course, we will gain a broad perspective on the current field of computational genomics, with a primary focus on using ML approaches for modeling genomic and healthcare data. We will explore in depth some of the recently developed and crucial computational methods for modeling big data in life science and health.
Class location and time:
- Location: Maass Chemistry Building 328
- Time: Monday, Wednesday 4:05-5:25
- Duration: Aug. 30 - Dec. 4, 2023
Class Format
A participation mark is taken at each lecture, based on questions and discussion.
There are 10 quizzes, typically released at the end of each week and due at the end of the following week.
Each student needs to write a 2-page review of five research papers chosen by the instructor based on the topics discussed in class.
There are five assignments. In each assignment, students will derive and/or implement the key components of some of the algorithms discussed in class and use them to analyze real or simulated datasets.
Students will work on a course project in small groups. The instructor will provide a few default projects. Because this is a research-oriented course, students can also propose their own project based on the research topics discussed in class, subject to the instructor's approval.
Prerequisite courses:
- Biology: BIOL 202 Basic Genetics
- Statistics: MATH 324 Statistics
- Machine learning: COMP 441 or COMP 551
- A programming course in Python or R (e.g., COMP 202/205/204)
Recommended courses:
- Biology: BIOL 202 Basic Genetics
- Statistics: MATH 324 Statistics
Instructor
Yue Li, Office hours: 11 am every Friday in Trottier Bldg 3105 or Zoom.
Teaching Assistant
Liam Hodgson
Evaluation
- Class participation (10%)
- Quizzes (10%)
- Paper review (10%)
- Assignments (40%)
- Final project report (30%)
Relevant Textbooks
- Pattern Recognition and Machine Learning by Christopher Bishop
- Machine Learning by Kevin Murphy
- No need to purchase. Relevant contents will be available on the course website.
Course Syllabus (Tentative):
- Statistical genetic approaches (9 hrs, 3 weeks):
  - Heritability estimation
  - Bayesian polygenic risk score models
  - Causal inference of genetic variants
- Genomics and multi-omic learning (9 hrs, 3 weeks):
  - Convolutional neural network for modeling genomic sequence data
  - Multi-view matrix factorization of multi-omic data
  - Graph-attention-based model for regulatory network modeling
- Computational approaches in single-cell analysis (6 hrs, 2 weeks):
  - Unsupervised learning for novel cell discoveries
  - Representation learning of single-cell transcriptomes
  - Cell-type deconvolution
- Mining big data in healthcare (9 hrs, 3 weeks):
  - Learning disease topics by latent topic models
  - Sequential neural network approaches to model longitudinal EHR
  - Graph representation learning of biomedical knowledge graphs