Date Topic Materials
January 9 Introduction to reinforcement learning. Bandit algorithms RL book, chapters 1 and 2; Intro slides; Bandit slides
January 11 More on bandits and exploration RL book, chapter 2; Bandit slides
January 16 More on bandits RL book, chapter 2; also see Csaba's book and blog
January 18 Finite MDPs, Bellman equations, policy evaluation RL book, chapter 4
January 23 Control, optimality equations, value iteration, policy iteration RL book, chapter 4
January 25 Monte Carlo methods, temporal-difference learning RL book, chapters 5 and 6
January 30 More on TD learning, multi-step bootstrapping RL book, chapters 6 and 7
February 1 More on multi-step bootstrapping RL book, chapter 7
February 6 Planning and learning with tabular methods RL book, chapter 8
February 8 More on planning and learning with tabular methods RL book, chapter 8
February 13 Sarsa, Q-learning, and model-free control TBD
February 15 Temporal abstraction Options paper
February 20 On-policy control with function approximation RL book, chapter 10
February 22 Off-policy learning with function approximation RL book, chapter 11
February 27 More on off-policy learning. Eligibility traces. RL book, chapters 11, 12
March 1 Eligibility traces. RL book, chapter 12
March 6 Study break
March 8 Study break
March 13 LSTD, LSPI, Fitted-Q
March 15 In-class midterm exam
March 20 Policy gradient methods RL book, chapter 13
March 22 More on gradient-based methods TBD
March 27 Frontiers: learning options using gradient-based methods TBD
March 29 Frontiers: Meta-learning TBD
April 3 Frontiers: Intrinsic motivation and reward origins TBD
April 5 Frontiers: Generalized value functions TBD
April 10 Frontiers: TBD TBD
April 12 Wrap-up TBD

January 11