COMP-579 : Reinforcement Learning


Date	Topic	Materials
January 4	Introduction to reinforcement learning. Bandit algorithms	RL book, chapter 1. Intro slides
January 9	Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB.	RL book, Sec. 2.1-2.7 Bandit slides
January 11	Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits	RL book, chapter 2 Assignment 1 posted
January 16	Wrap up of bandits: Gradient-based bandits, Thompson sampling. Markov Decision Processes (MDPs)	RL book, chapter 2 and 3.1
January 18	Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration	RL book, chapter 3.2-4.1 Slides
January 23	More on dynamic programming: policy iteration, value iteration, contractions.	RL book, Chapter 4.1-4.8 Slides
January 25	Policy evaluation using Monte Carlo methods and temporal-difference learning	RL book, Sec. 5.1 5.2 6.1 6.2 Slides
January 30	Learning Control using Monte Carlo and TD, including SARSA	RL book, Sec. 5.3 5.4 5.6 5.7 6.3 6.4 7.1 7.2 7.3 7.5 Assignment 1 due Assignment 2 posted Slides
February 1	Q-learning	RL book, Sec. 6.5-6.7 9.1-9.3 Q-Learning slides
February 6	Value Function Approximation, DQN, Eligibility Traces	RL book, Sec. 10.2 10.5 16.5 12.1 12.2 12.4 12.5 Slides David Silver's lecture on RL with function approximation
February 8	More on Eligibility Traces and TD(λ)	RL book chapter 12 Slides
February 13	Planning and model-based RL	Slides RL book chapter 8
February 15	Deep model-based RL and planning	RL book end of chapter 8, PlaNet Paper, Dreamer Paper, MuZero Paper Slides Assignment 2 due Project information posted
February 20	Policy-gradient methods: Policy Gradient Theorem and REINFORCE	RL book chapter 13.7 13.1-13.3 Slides
February 22	Policy-gradient methods: Actor-critic	RL book chapter 13.4 13.5 13.6 Slides Assignment 3 posted
February 27	Policy-gradient methods: Deterministic Policy Gradient, DDPG, TRPO, PPO	DPG paper, DDPG paper, TRPO paper, PPO paper Slides
February 29	Review	Slides
March 5	Study break
March 7	Study break
March 12	Hierarchical RL	Slides Options paper Option-critic architecture
March 14	More on hierarchical RL
March 19	Wrap-up of HRL. Off-policy RL	Assignment 3 due Slides on HRL Slides on off-policy learning
March 21	Offline and batch RL	Slides
March 26	Where do rewards come from? Inverse RL. Learning from Preferences	Inverse RL slides (with thanks to Pieter Abbeel) Preference-based learning. RL from human feedback
March 28	Where do rewards come from? Learning from preferences and human feedback	Slides (more info to be posted)
April 2	RL from Human Feedback in LLMs	Slides
April 4	Never-ending / continual RL	Slides Continual RL survey
April 9	Slides
