Prerequisites
[Must have] Good knowledge of linear algebra and probability. [Optional] Supervised and unsupervised learning, neural networks.
Course description
This course covers the theory and practice of Reinforcement Learning (RL), focusing on sequential decision-making under uncertainty. Topics include Markov Decision Processes (MDPs), value functions, Bellman operators, dynamic programming, Monte Carlo methods, temporal-difference learning, function approximation, deep RL, policy gradients, actor–critic algorithms, exploration strategies, and safe/model-based RL for robotics and control applications.
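For a taste of the material, the sketch below shows tabular Q-learning (a temporal-difference method) on a hypothetical five-state chain MDP. The environment, constants, and hyperparameters are illustrative assumptions and not part of the course materials.

```python
import random

N_STATES = 5                 # states 0..4; reaching state 4 ends the episode with reward +1
ACTIONS = [0, 1]             # 0 = move left, 1 = move right
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.3

def step(state, action):
    """One transition of the toy chain MDP (hypothetical environment)."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

Q = [[0.0, 0.0] for _ in range(N_STATES)]     # tabular action-value estimates

for episode in range(300):
    s, done = 0, False
    for _ in range(500):                      # cap episode length
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning (TD) update toward the Bellman optimality target
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break

print(Q)   # Q[s][1] should dominate Q[s][0]: moving right is optimal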
Course objectives
Equip students with the ability to model, analyze, and implement RL algorithms
Develop theoretical understanding of convergence, stability, and bias–variance trade-offs
Enable application of RL methods in simulation and control contexts
Provide hands-on experience implementing classic and modern RL algorithms
Learning outcomes
Formulate control and decision problems as MDPs/POMDPs
Derive and implement dynamic programming, Monte Carlo, and temporal-difference algorithms
Analyze convergence properties and stability under function approximation
Design and implement policy gradient and actor–critic algorithms (a minimal policy-gradient sketch follows this list)
Integrate model-based and safe RL components (e.g., MPC) for real-world tasks
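As referenced in the outcome above, the following is a minimal sketch of REINFORCE, the simplest Monte Carlo policy-gradient method, using a tabular softmax policy on the same kind of hypothetical chain MDP; all names and hyperparameters are illustrative assumptions, not a prescribed course implementation.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2        # toy chain: state 4 is terminal with reward +1
GAMMA, LR = 0.99, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """One transition of the toy chain MDP (action 1 = right, 0 = left)."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

theta = np.zeros((N_STATES, N_ACTIONS))      # policy logits, one row per state

def policy(s):
    """Softmax policy pi(.|s) over the two actions."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

for episode in range(1000):
    s, done, trajectory = 0, False, []
    for _ in range(500):                     # roll out one episode (capped length)
        a = rng.choice(N_ACTIONS, p=policy(s))
        s2, r, done = step(s, a)
        trajectory.append((s, a, r))
        s = s2
        if done:
            break
    G = 0.0
    for s, a, r in reversed(trajectory):     # Monte Carlo returns, then a gradient step
        G = r + GAMMA * G
        grad_log_pi = -policy(s)             # gradient of log pi(a|s) w.r.t. theta[s]
        grad_log_pi[a] += 1.0
        theta[s] += LR * G * grad_log_pi
```

The update pushes up the log-probability of each action taken, weighted by the return that followed it; adding a baseline or a learned critic, as covered later in the course, reduces the variance of this estimator.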
Exam
Week 8, in class, closed book; one A4 cheat sheet allowed.
Coverage: topics from Weeks 1–7 (MDPs, Bellman equations, dynamic programming, Monte Carlo methods, temporal-difference learning, function approximation, DQN).
Projects
Course Project: Students select a domain (preferably robotics or control), implement one or more RL algorithms, and write a 6–8 page report including methodology, experiments, and analysis.
Project Proposal: Week 11
Checkpoint: Week 14
Final Presentation: Week 15
Topics may include safe RL, model-based RL, multi-agent RL, exploration strategies, or novel applications.