-->
Date Topic Materials
January 4 Introduction to reinforcement learning. Bandit algorithms RL book, chapter 1.
Intro slides
January 9 Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB. RL book, Sec. 2.1-2.7
Bandit slides
January 11 Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits
RL book, chapter 2
Assignment 1 posted
January 16 Wrap up of bandits: Gradient-based bandits, Thompson sampling.
Markov Decision Processes (MDPs)
RL book, chapter 2 and 3.1
January 18 Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration RL book, chapter 3.2-4.1
Slides
January 23 More on dynamic programming: policy iteration, value iteration, contractions. RL book, Chapter 4.1-4.8
Slides
January 25 Policy evaluation using Monte-Carlo Methods and Temporal-Difference Learning RL book, Sec. 5.1 5.2 6.1 6.2

Slides
January 30 Learning Control using Monte Carlo and TD, including SARSA RL book, Sec. 5.3 5.4 5.6 5.7 6.3 6.4 7.1 7.2 7.3 7.5
Assignment 1 due
Assignment 2 posted
Slides
February 1 Q-learning RL book, Sec. 6.5-6.7 9.1-9.3
Q-Learning slides
February 6 Value Function Approximation, DQN, Eligibility Traces RL book Sec. 10.2 10.5 16.5 12.1 12.2 12.4 12.5
Slides
David Silver's lecture on RL with function approximation
February 8 More on Eligibility Traces and TD(λ) RL book chapter 12
Slides
February 13 Planning and model-based RL Slides RL book chapter 8
February 15 Deep model-based RL and planning RL book end of chapter 8, PlaNet Paper, Dreamer Paper, MuZero Paper
Slides
Assignment 2 due
Project information posted
February 20 Policy-gradient methods: Policy Gradient Theorem and REINFORCE RL book chapter 13.7 13.1-13.3
Slides
February 22 Policy-gradient methods: Actor-critic RL book chapter 13.4 13.5 13.6
Slides
Assignment 3 posted
February 27 Policy-gradient methods: Deterministic Policy Gradient, DDPG, TRPO, PPO DPG paper, DDPG paper, TRPO paper, PPO paper
Slides
February 29 Review Slides
March 5 Study break
March 7 Study break
March 12 Hierarchical RL Slides
Options paper
Option-critic architecture

March 14 More on hierarchical RL
March 19 Wrap-up of HRL. Off-policy RL Assignment 3 due
Slides on HRL
Slides on off-policy learning
March 21 Offline and batch RL Slides
March 26 Where do rewards come from? Inverse RL. Learning from Preferences Inverse RL slides (with thanks to Pieter Abbeel)
Preferences-based learning. RL from human feedback
March 28 Where do rewards come from? Learning from preferences and human feedback Slides (more info to be posted)
April 2 RL from Human Feedback in LLMs Slides
April 4 Never-ending / continual RL Slides
Continual RL survey
April 9 Slides