Prerequisites
[Must have] Good knowledge of linear algebra and probability. [Optional] Supervised and unsupervised learning, neural networks.
Course description
This course covers the theory and practice of Reinforcement Learning (RL), focusing on sequential decision-making under uncertainty. Topics include Markov Decision Processes (MDPs), value functions, Bellman operators, dynamic programming, Monte Carlo methods, temporal-difference learning, function approximation, deep RL, policy gradients, actor–critic algorithms, exploration strategies, and safe/model-based RL for robotics and control applications.
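For a taste of the material, the sketch below shows tabular Q-learning (a temporal-difference method) on a hypothetical five-state chain MDP. The environment, constants, and hyperparameters are illustrative assumptions and not part of the course materials.

```python
import random

N_STATES = 5                 # states 0..4; reaching state 4 ends the episode with reward +1
ACTIONS = [0, 1]             # 0 = move left, 1 = move right
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.3

def step(state, action):
    """One transition of the toy chain MDP (hypothetical environment)."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

Q = [[0.0, 0.0] for _ in range(N_STATES)]     # tabular action-value estimates

for episode in range(300):
    s, done = 0, False
    for _ in range(500):                      # cap episode length
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning (TD) update toward the Bellman optimality target
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break

print(Q)   # Q[s][1] should dominate Q[s][0]: moving right is optimal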
Course objectives
Equip students with the ability to model, analyze, and implement RL algorithms
Develop theoretical understanding of convergence, stability, and bias–variance trade-offs
Enable application of RL methods in simulation and control contexts
Provide hands-on experience implementing classic and modern RL algorithms
Learning outcomes
Formulate control and decision problems as MDPs/POMDPs
Derive and implement dynamic programming, Monte Carlo, and temporal-difference algorithms
Analyze convergence properties and stability under function approximation
Design and implement policy gradient and actor–critic algorithms (a minimal policy-gradient sketch follows this list)
Integrate model-based and safe RL components (e.g., MPC) for real-world tasks
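As referenced in the outcome above, the following is a minimal sketch of REINFORCE, the simplest Monte Carlo policy-gradient method, using a tabular softmax policy on the same kind of hypothetical chain MDP; all names and hyperparameters are illustrative assumptions, not a prescribed course implementation.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2        # toy chain: state 4 is terminal with reward +1
GAMMA, LR = 0.99, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """One transition of the toy chain MDP (action 1 = right, 0 = left)."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

theta = np.zeros((N_STATES, N_ACTIONS))      # policy logits, one row per state

def policy(s):
    """Softmax policy pi(.|s) over the two actions."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

for episode in range(1000):
    s, done, trajectory = 0, False, []
    for _ in range(500):                     # roll out one episode (capped length)
        a = rng.choice(N_ACTIONS, p=policy(s))
        s2, r, done = step(s, a)
        trajectory.append((s, a, r))
        s = s2
        if done:
            break
    G = 0.0
    for s, a, r in reversed(trajectory):     # Monte Carlo returns, then a gradient step
        G = r + GAMMA * G
        grad_log_pi = -policy(s)             # gradient of log pi(a|s) w.r.t. theta[s]
        grad_log_pi[a] += 1.0
        theta[s] += LR * G * grad_log_pi
```

The update pushes up the log-probability of each action taken, weighted by the return that followed it; adding a baseline or a learned critic, as covered later in the course, reduces the variance of this estimator.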
Exam
Week 8, in class, closed book; one A4 cheat sheet allowed.
Coverage: topics from Weeks 1–7 (MDPs, Bellman equations, dynamic programming, Monte Carlo methods, temporal-difference learning, function approximation, DQN).
Projects
Course Project: Students select a domain (preferably robotics or control), implement one or more RL algorithms, and write a 6–8 page report including methodology, experiments, and analysis.
Project Proposal: Week 11
Checkpoint: Week 14
Final Presentation: Week 15
Topics may include safe RL, model-based RL, multi-agent RL, exploration strategies, or novel applications.