Date | Topic | Materials
January 6 | Introduction to reinforcement learning. Bandit algorithms | RL book, chapter 1. Intro slides
January 8 | Bandits: definition of multi-armed bandits, epsilon-greedy exploration, optimism, UCB | RL book, Sec. 2.1-2.7. Bandit slides
January 13 | Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits | RL book, chapter 2. Bandit slides. Assignment 1 posted
January 15 | Wrap-up of bandits: gradient-based bandits, Thompson sampling. Markov Decision Processes (MDPs) | RL book, chapter 2 and Sec. 3.1. Bandit slides
January 20 | Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration | RL book, Sec. 3.2-4.1. Slides
January 22 | More on dynamic programming: policy iteration, value iteration, contractions | RL book, Sec. 4.1-4.8. Slides
January 27 | Policy evaluation using Monte Carlo and temporal-difference methods | RL book, Sec. 5.1, 5.2, 6.1, 6.2. Slides
January 29 | Learning control using Monte Carlo and TD, including SARSA | RL book, Sec. 5.3, 5.4, 5.6, 5.7, 6.3, 6.4, 7.1-7.3, 7.5. Slides. Assignment 1 due; Assignment 2 posted
February 3 | Q-learning | RL book, Sec. 6.5-6.7, 9.1-9.3. Q-learning slides
February 5 | Value function approximation, DQN, eligibility traces | RL book, Sec. 10.2, 10.5, 16.5, 12.1, 12.2, 12.4, 12.5. Slides. David Silver's lecture on RL with function approximation
February 10 | More on eligibility traces and TD(λ) | RL book, chapter 12. Slides
February 12 | Planning and model-based RL | RL book, chapter 8. Slides
February 17 | Deep model-based RL and planning | RL book, end of chapter 8. PlaNet paper, Dreamer paper, MuZero paper. Slides. Assignment 2 due; project information posted
February 19 | Policy-gradient methods: Policy Gradient Theorem and REINFORCE | RL book, Sec. 13.7, 13.1-13.3. Slides
February 24 | Policy-gradient methods: actor-critic | RL book, Sec. 13.4-13.6. Slides. Assignment 3 posted
February 26 | Policy-gradient methods: Deterministic Policy Gradient, DDPG | DPG paper, DDPG paper. Slides
March 3 | Study break | |
March 5 | Study break | |
March 10 | TRPO, PPO, review | TRPO paper, PPO paper. Slides
March 12 | Hierarchical RL | Options paper, Option-Critic architecture paper. Slides. Project proposal due
March 17 | Wrap-up of HRL. Off-policy RL | Slides on HRL, slides on off-policy learning. Assignment 3 due
March 19 | Offline and batch RL | Slides. Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020)
March 24 | Where do rewards come from? Inverse RL. Learning from preferences | Slides on inverse RL and preference-based learning. Inverse RL survey
March 26 | Large Language Models (LLMs) and Reinforcement Learning from Human Feedback (RLHF) | Slides (more info to be posted) |
March 30 | More on RL for LLMs | Slides (more info to be posted) |
April 2 | Alignment for RLHF. Continual RL | Slides |
April 7 | Never-ending / continual RL | Slides. Continual RL survey
April 9 | RL applications | Slides. If we have time: "Reward is enough" paper