Date | Topic | Materials
January 4 | Introduction to reinforcement learning. Bandit algorithms | RL book, chapter 1; Intro slides
January 9 | Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB | RL book, Sec. 2.1-2.7; Bandit slides
January 11 | Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits | RL book, Sec. 2.8; David Silver slides on regret analysis; Gradient bandit slides; Assignment 1 posted
January 16 | Wrap-up of bandits: gradient-based bandits, Thompson sampling. Markov Decision Processes. Policies and returns | RL book, chapter 3; Slides (with thanks to David Silver and Qing Wang for some slides); Tutorial on Thompson Sampling, Sec. 3
January 18 | More on MDPs. Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration | RL book, chapter 4; Slides
January 23 | More on dynamic programming: policy iteration, value iteration, contractions. Policy evaluation using Monte Carlo methods and temporal-difference learning | RL book, Chapter 4, Sec. 5.1, 6.1, 6.2, 6.3; Slides from last time; Zoom whiteboard (link to lecture to be added in MyCourses); Slides
January 25 | More on TD. Control using Monte Carlo and TD, including SARSA (and Q-learning if we have time) | RL book, Sec. 5.3, 5.4, 6.4, 6.5, 7.1; Assignment 1 due tomorrow; Slides
January 30 | More on control and convergence results | RL book, chapter 7; Slides
February 1 | More on value-based RL, function approximation | RL book, chapter 8
February 6 | More on value-based RL with function approximation | RL book, Sec. 9.1-9.4
February 8 | No lecture | RL book, chapter 9
February 13 | Planning and model-based RL | Slides; RL book, chapter 8
February 15 | Policy gradient | RL book, Chapter 10; Slides (with thanks to Hado van Hasselt)
February 20 | More on policy gradient | No in-person lecture; watch John Schulman's tutorial; Assignment 2 (due March 10); RL book, Chapter 13
February 22 | Wrap-up of policy gradient. Hierarchical Reinforcement Learning | Policy gradient slides; HRL slides
February 27 | Study break |
March 1 | Study break |
March 6 | More on hierarchical RL | Slides; Option-critic paper; Termination-critic paper
March 8 | No class |
March 13 | More on hierarchical RL | Slides and reading to be posted; Assignment 2 due; Project description
March 15 | Offline and batch RL | Slides (with many thanks to Sergey Levine and Emma Brunskill); Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020)
March 20 | More on offline and batch RL | BCQ slides; Sergey Levine tutorial slides
March 22 | Wrap-up of offline/batch RL | Assignment 3 posted
March 27 | Special topics: Rewards | Slides (Part 1, Part 2); Reward is Enough paper; On the Expressivity of Markov Reward paper
March 29 | Special topics: Meta-RL | Video lecture by Chelsea Finn; Tutorials on meta-learning and meta-RL by Lilian Weng; MAML paper
April 3 | Never-ending / continual RL | Slides
April 5 | Special topics: TBD |
April 12 | Wrap-up: Thoughts on RL for AI | Project due |