Course Schedule
Date | Lecture | Notes | Links | Reading material |
---|---|---|---|---|
1/26 | Introduction to Deep Decision Making | Part 1: What is this class about? Part 2: Why is decision making hard? | 1. A framework for behavioural cloning, Bain and Sammut, 1999. 2. A reduction of imitation learning and structured prediction to no-regret online learning, Ross et al., 2011. | |
2/2 | Supervised Learning for Decision Making | Part 1: Training Neural Networks Part 2: Variants of Behavior Cloning policies | 1. Behavior Cloning (ALVINN) 2. Variational Autoencoder 3. Generative Adversarial Networks 4. Case study papers: VINN, RT-1, Dobb-E, Implicit BC, BeT, C-BeT, Diffusion Policy | |
2/9 | [Tutorial] Supervised Learning for Decision Making | Setting up decision making environments and model training | ||
2/16 | Case studies of supervised decision making | Examples of supervised learning working in the real world | ||
2/23 | Decision making without expert data | Part 1: Formalism for Bandit problem Part 2: Algorithms for Bandit problems | ||
3/1 | Sequential Decision making | Part 1: Motivation and formalism Part 2: Core concepts of value and policy iteration | ||
3/8 | Q-learning: from tables to Atari | Part 1: Why Q function? Part 2: Deep Q functions: What goes wrong and how to make them work? Part 3: Variants of DQN | ||
3/15 | Policy Optimization | Part 1: MC-based optimization (CEM) Part 2: Differentiable versions (REINFORCE) Part 3: Trust region / proximal policy optimization | ||
3/22 | SPRING BREAK | |||
3/29 | [Tutorial] Visual and Temporal Policy Learning | |||
4/5 | Guest Lecture - Mahi Shafiullah | |||
4/12 | Decision making with world models | Part 1: Classical approaches (LQR / iLQR / DDP) Part 2: Model-based RL Part 3: Case study: DreamerV3 | ||
4/19 | Decision making with Tree Search | MCTS (AlphaGo, AlphaZero) | ||
4/26 | Revisiting Decision making with expert data | Inverse RL and offline RL | ||
5/3 | Course Project Presentations | |||