9/1 | HW-0 Release | Campuswire | Fundamentals of Linear Algebra and Deep Learning |
9/3 | Part 1: Course overview. Part 2: Supervised learning for control. | | 1. A framework for behavioural cloning, Bain and Sammut, 1999. 2. A reduction of imitation learning and structured prediction to no-regret online learning, Ross et al., 2011. |
9/10 | Part 1: Introduction to RL. Part 2: Value functions. | | Part 1: Chapter 1 of RL book. Part 2: Chapter 3 of RL book. |
9/10 | HW-0 Due | | |
9/10 | HW-1 Release | | Imitation via Supervision. |
9/17 | Deep Q Learning | | 1. Human-level control through deep reinforcement learning, Mnih et al., 2015. 2. Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al., 2017. 3. Agent57, DeepMind blog post. |
9/24 | HW-1 Due | | |
9/24 | HW-2 Release | | Deep Q Learning++. |
9/24 | Policy gradients | | 1. REINFORCE, Williams, 1992. 2. A Natural Policy Gradient, Kakade, 2002. 3. Proximal Policy Optimization Algorithms, Schulman et al., 2017. |
10/1 | Part 1: Actor-critic methods. Part 2: Distributed RL. | | 1. Deterministic Policy Gradient Algorithms, Silver et al., 2014. 2. Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., 2016. 3. Massively Parallel Methods for Deep Reinforcement Learning, Nair et al., 2015. |
10/8 | HW-2 Due | | |
10/8 | HW-3 Release | | RL with Policy Gradients. |
10/8 | Exploration in RL | | 1. Exploration strategies in deep RL, blog by Lilian Weng, 2020. 2. Intrinsic Motivation Systems for Autonomous Mental Development, Oudeyer et al., 2007. 3. Curiosity-driven Exploration by Self-supervised Prediction, Pathak et al., 2017. |
10/9 | Project Proposals Due | | |
10/15 | Generalization in RL | | 1. Quantifying Generalization in RL, blog by OpenAI, 2018. 2. Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours, Pinto and Gupta, 2016. 3. Visual Imitation Made Easy, Young et al., 2020. |
10/22 | HW-3 Due | | |
10/22 | HW-4 Release | | Exploration with Bandits. |
10/22 | Imitation Learning | | 1. Apprenticeship Learning via Inverse Reinforcement Learning, Abbeel and Ng, 2004. 2. Generative Adversarial Imitation Learning, Ho and Ermon, 2016. 3. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, Rajeswaran et al., 2018. |
10/29 | HW-4 Due | | |
10/29 | Control and planning | | 1. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems, Li and Todorov, 2004. 2. Benchmarking Model-Based Reinforcement Learning, Wang et al., 2019. |
11/5 | Latent State Discovery -- John Langford | | Guest Lecture |
11/12 | Multi-Agent Learning -- Noam Brown | | Guest Lecture |
11/19 | Unsupervised Reinforcement Learning -- Denis Yarats | | Guest Lecture |
12/3 | Current frontiers | | |
12/10 | Final Project Presentations and Writeups | | |