Date Topic Materials
January 4 Introduction to reinforcement learning. Bandit algorithms RL book, chapter 1.
Intro slides
January 9 Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB. RL book, Sec. 2.1-2.7
Bandit slides
January 11 Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits
RL book, Sec. 2.8
David Silver slides on regret analysis
Gradient bandit slides
Assignment 1 posted
January 16 Wrap up of bandits: Gradient-based bandits, Thompson sampling. Markov Decision Processes. Policies and Returns RL book, chapter 3
Slides (with thanks to David Silver and Qing Wang for some slides)
Tutorial on Thompson Sampling, Sec 3.
January 18 More on MDPs. Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration RL book, chapter 4.
Slides
January 23 More on dynamic programming: policy iteration, value iteration, contractions. Policy evaluation using Monte-Carlo Methods and Temporal-Difference learning RL book, Chapter 4, Sec 5.1, 6.1, 6.2, 6.3
Slides from last time, Zoom whiteboard (link to lecture to be added in MyCourses), Slides
January 25 More on TD. Control using Monte Carlo and TD (including SARSA and Q-learning, if we have time) RL book, Sec. 5.3, 5.4, 6.4, 6.5, 7.1
Assignment 1 due tomorrow
Slides
January 30 More on control and convergence results RL book, chapter 7
Slides
February 1 More on value-based RL, function approximation RL book, chapter 8
February 6 More on value-based RL with function approximation RL book Sec. 9.1-9.4
February 8 No lecture RL book chapter 9
 
February 13 Planning and model-based RL Slides RL book chapter 8
February 15 Policy gradient RL book Chapter 10
Slides with thanks to Hado van Hasselt
February 20 More on policy gradient No in-person lecture, watch John Schulman's tutorial
Assignment 2 (due March 10) RL book Chapter 13
February 22 Wrap-up of policy gradient. Hierarchical Reinforcement Learning Policy gradient slides, HRL slides
February 27 Study break
March 1 Study break
March 6 More on hierarchical RL Slides
Option-critic paper
Termination-critic paper
March 8 No class
March 13 More on hierarchical RL Slides and reading to be posted
Assignment 2 due
Project description
March 15 Offline and Batch RL Slides (with many thanks to Sergey Levine and Emma Brunskill)
Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020)
March 20 More on offline and batch RL BCQ slides; Sergey Levine tutorial slides
March 22 Wrap-up of offline/batch RL Assignment 3 posted
March 27 Special topics: Rewards Slides Part 1, Part 2
Reward is enough paper
On the expressivity of Markov reward paper
March 29 Special topics: Meta RL Video lecture by Chelsea Finn
Tutorials on meta-learning and meta-RL by Lilian Weng
MAML paper
April 3 Never-ending / continual RL Slides
April 5 Special topics: TBD
April 12 Wrap-up: Thoughts on RL for AI Project due