COMP-579 : Reinforcement Learning

Date	Topic	Materials
January 4	Introduction to reinforcement learning. Bandit algorithms	RL book, chapter 1. Intro slides
January 9	Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB.	RL book, Sec. 2.1-2.7 Bandit slides
January 11	Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits	RL book, Sec. 2.8 David Silver slides on regret analysis Gradient bandit slides Assignment 1 posted
January 16	Wrap-up of bandits: gradient-based bandits, Thompson sampling. Markov Decision Processes. Policies and returns	RL book, chapter 3 Slides (with thanks to David Silver and Qing Wang for some slides) Tutorial on Thompson Sampling, Sec. 3
January 18	More on MDPs. Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration	RL book, chapter 4. Slides
January 23	More on dynamic programming: policy iteration, value iteration, contractions. Policy evaluation using Monte Carlo methods and temporal-difference learning	RL book, Chapter 4, Sec. 5.1, 6.1, 6.2, 6.3 Slides from last time, Zoom whiteboard (link to lecture to be added in MyCourses), Slides
January 25	More on TD. Control using Monte Carlo and TD, including SARSA (and Q-learning, if we have time)	RL book, Sec. 5.3, 5.4, 6.4, 6.5, 7.1 Assignment 1 due tomorrow; Slides
January 30	More on control and convergence results	RL book, chapter 7 Slides
February 1	More on value-based RL, function approximation	RL book, chapter 8
February 6	More on value-based RL with function approximation	RL book Sec. 9.1-9.4
February 8	No lecture	RL book chapter 9
February 13	Planning and model-based RL	Slides RL book chapter 8
February 15	Policy gradient	RL book Chapter 10 Slides with thanks to Hado van Hasselt
February 20	More on policy gradient	No in-person lecture; watch John Schulman's tutorial Assignment 2 (due March 10) RL book Chapter 13
February 22	Wrap-up of policy gradient. Hierarchical Reinforcement Learning	Policy gradient slides, HRL slides
February 27	Study break
March 1	Study break
March 6	More on hierarchical RL	Slides Option-critic paper Termination-critic paper
March 8	No class
March 13	More on hierarchical RL	Slides and reading to be posted Assignment 2 due Project description
March 15	Offline and batch RL	Slides (with many thanks to Sergey Levine and Emma Brunskill) Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020)
March 20	More on offline and batch RL	BCQ slides; Sergey Levine tutorial slides
March 22	Wrap-up of offline/batch RL	Assignment 3 posted
March 27	Special topics: Rewards	Slides Part 1, Part 2 Reward is enough paper On the expressivity of Markov reward paper
March 29	Special topics: Meta RL	Video lecture by Chelsea Finn Tutorials on meta-learning and meta-RL by Lilian Weng MAML paper
April 3	Never-ending / continual RL	Slides
April 5	Special topics: TBD
April 12	Wrap-up: Thoughts on RL for AI	Project due
