-->
Date Topic Materials
January 6 Introduction to reinforcement learning. Bandit algorithms RL book, chapter 1.
Intro slides
January 8 Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB. RL book, Sec. 2.1-2.7
Bandit slides
January 13 Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits
RL book, chapter 2
Assignment 1 posted
Bandit slides
January 15 Wrap up of bandits: Gradient-based bandits, Thompson sampling.
Markov Decision Processes (MDPs)
RL book, chapter 2 and 3.1
Bandit slides
January 20 Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration RL book, chapter 3.2-4.1
Slides
January 22 More on dynamic programming: policy iteration, value iteration, contractions. RL book, Chapter 4.1-4.8
Slides
January 27 Policy evaluation using Monte Carlo methods and temporal-difference learning RL book, Sec. 5.1 5.2 6.1 6.2

Slides
January 29 Learning Control using Monte Carlo and TD, including SARSA RL book, Sec. 5.3 5.4 5.6 5.7 6.3 6.4 7.1 7.2 7.3 7.5
Assignment 1 due
Assignment 2 posted
Slides
February 3 Q-learning RL book, Sec. 6.5-6.7 9.1-9.3
Q-Learning slides
February 5 Value Function Approximation, DQN, Eligibility Traces RL book, Sec. 10.2 10.5 16.5 12.1 12.2 12.4 12.5
Slides
David Silver's lecture on RL with function approximation
February 10 More on Eligibility Traces and TD(λ) RL book chapter 12
Slides
February 12 Planning and model-based RL Slides RL book chapter 8
February 17 Deep model-based RL and planning RL book end of chapter 8, PlaNet Paper, Dreamer Paper, MuZero Paper
Slides
Assignment 2 due
Project information posted
February 19 Policy-gradient methods: Policy Gradient Theorem and REINFORCE RL book chapter 13.7 13.1-13.3
Slides
February 24 Policy-gradient methods: Actor-critic RL book chapter 13.4 13.5 13.6
Slides
Assignment 3 posted
February 26 Policy-gradient methods: Deterministic Policy Gradient, DDPG DPG paper, DDPG paper
Slides
March 3 Study break
March 5 Study break
March 10 TRPO, PPO, Review TRPO paper, PPO paper
Slides
March 12 Hierarchical RL Slides
Options paper
Option-critic architecture

Project proposal due
March 17 Wrap-up of HRL. Off-policy RL Assignment 3 due
Slides on HRL
Slides on off-policy learning
March 19 Offline and batch RL Slides
Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020)
March 24 Where do rewards come from? Inverse RL. Learning from Preferences Inverse RL and Preference-based learning
Inverse RL survey
March 26 Large Language Models (LLMs) and Reinforcement Learning from Human Feedback (RLHF) Slides (more info to be posted)
March 30 More on RL for LLMs Slides (more info to be posted)
April 2 Alignment for RLHF. Continual RL Slides
April 7 Never-ending / continual RL Slides
Continual RL survey
April 9 RL applications Slides. If we have time: Reward is enough