Course Schedule
| Date | Lecture | Notes | Links | Reading material |
|---|---|---|---|---|
| 1/26 | Introduction to Deep Decision Making | Part 1: What is this class about? Part 2: Why is decision making hard? | 1. A framework for behavioural cloning, Bain and Sammut, 1999. 2. A reduction of imitation learning and structured prediction to no-regret online learning, Ross et al., 2011. | |
| 2/2 | Supervised Learning for Decision Making | Part 1: Training Neural Networks Part 2: Variants of Behavior Cloning policies | 1. Behavior Cloning (ALVINN) 2. Variational Autoencoder 3. Generative Adversarial Networks 4. Case study papers: VINN, RT-1, Dobb-E, Implicit BC, BeT, C-BeT, Diffusion Policy | |
| 2/9 | [Tutorial] Supervised Learning for Decision Making | Setting up decision making environments and model training | ||
| 2/16 | Case studies of supervised decision making | Examples of supervised learning working in the real world | ||
| 2/23 | Decision making without expert data | Part 1: Formalism for Bandit problem Part 2: Algorithms for Bandit problems | ||
| 3/1 | Sequential Decision making | Part 1: Motivation and formalism Part 2: Core concepts of value and policy iteration | ||
| 3/8 | Q-learning: from tables to Atari | Part 1: Why Q function? Part 2: Deep Q functions: What goes wrong and how to make them work? Part 3: Variants of DQN | ||
| 3/15 | Policy Optimization | Part 1: MC-based optimization (CEM) Part 2: Differentiable versions (REINFORCE) Part 3: Trust region / proximal policy optimization | ||
| 3/22 | SPRING BREAK | |||
| 3/29 | [Tutorial] Visual and Temporal Policy Learning | |||
| 4/5 | Guest Lecture - Mahi Shafiullah | |||
| 4/12 | Decision making with world models | Part 1: Classical approaches (LQR / iLQR / DDP) Part 2: Model-based RL Part 3: Case study: DreamerV3 | ||
| 4/19 | Decision making with Tree Search | MCTS (AlphaGo, AlphaZero) | ||
| 4/26 | Revisiting Decision making with expert data | Inverse RL and offline RL | ||
| 5/3 | Course Project Presentations | |||