Reinforcement Learning

[Figure: drones (agents), a goal, and the environment/world]

Prerequisites

  • Required: good knowledge of linear algebra and probability.
  • Optional: supervised and unsupervised learning, neural networks.

Course description

  • This course covers the theory and practice of Reinforcement Learning (RL), focusing on sequential decision-making under uncertainty. Topics include Markov Decision Processes (MDPs), value functions, Bellman operators, dynamic programming, Monte Carlo methods, temporal-difference learning, function approximation, deep RL, policy gradients, actor–critic algorithms, exploration strategies, and safe/model-based RL for robotics and control applications.
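As a rough illustration of the Bellman operator and dynamic programming topics listed above, the following minimal sketch runs value iteration on a tiny MDP. The two-state MDP, its transition probabilities, its rewards, and the names `P`, `q_value`, `value_iteration`, and `greedy_policy` are all hypothetical, chosen only for illustration; they are not taken from the course materials.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # discount factor (assumed)


def q_value(V, s, a, gamma=GAMMA):
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(gamma=GAMMA, tol=1e-8):
    """Apply the Bellman optimality operator until the largest change falls below tol."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a, gamma) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def greedy_policy(V, gamma=GAMMA):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a, gamma)) for s in P}


V_star = value_iteration()
pi_star = greedy_policy(V_star)
print(V_star, pi_star)
```

In this toy problem the fixed point can be checked by hand: staying in state 1 with action 1 gives V(1) = 2 / (1 - 0.9) = 20, and the greedy policy selects action 1 in both states.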

Course objectives

  • Equip students with the ability to model, analyze, and implement RL algorithms
  • Develop theoretical understanding of convergence, stability, and bias–variance trade-offs
  • Enable application of RL methods in simulation and control contexts
  • Provide hands-on experience implementing classic and modern RL algorithms

Learning outcomes

  • Formulate control and decision problems as MDPs/POMDPs
  • Derive and implement dynamic programming, Monte Carlo, and temporal-difference algorithms
  • Analyze convergence properties and stability under function approximation
  • Design and implement policy gradient and actor–critic algorithms
  • Integrate model-based and safe RL components (e.g., MPC) for real-world tasks
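To make the temporal-difference outcome above more concrete, here is a minimal sketch of tabular Q-learning, one TD control method, on a hypothetical five-state chain. The environment, the hyperparameters, the uniform random behavior policy, and the names `step` and `q_learning` are assumptions made for illustration, not part of the course materials.

```python
import random

N_STATES = 5            # states 0..4; state 4 is terminal (hypothetical toy chain)
LEFT, RIGHT = 0, 1
GAMMA, ALPHA = 0.9, 0.5  # discount factor and step size (assumed values)


def step(state, action):
    """Deterministic chain dynamics: reward 1.0 only on entering the last state."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, seed=0):
    """Off-policy TD control: act randomly, bootstrap from the greedy next-state value."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice((LEFT, RIGHT))  # uniform random behavior policy
            s2, r, done = step(s, a)
            # TD target: no bootstrapping from a terminal state.
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
# Greedy action in each non-terminal state.
greedy = [max((LEFT, RIGHT), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

Because Q-learning is off-policy, the random behavior policy is enough here for the greedy policy to move right in every non-terminal state; an epsilon-greedy behavior policy would be the more common choice in practice.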

Course Schedule (15 Weeks) and Materials

Week  Topic                                             Slides
1     Introduction to RL, Agent–Environment Interface   Lecture 1
2     Markov Decision Processes (MDPs)                  Lecture 2
3     Value Functions, Bellman Equations                Lecture 3
4     Dynamic Programming                               Lecture 4
5     Monte Carlo Methods                               Lecture 5
6     Temporal Difference Learning                      Lecture 6
7     Midterm Review & Exam
8     Policy Gradient Methods                           Lecture 7
9     Actor–Critic Methods                              Lecture 8
10    Deep Reinforcement Learning (I)                   Lecture 9
11    Deep Reinforcement Learning (II)                  Lecture 10
12    Applications in Robotics, Games, NLP              Lecture 11
13    Project Work & Advanced Topics (Safe RL, MARL)    Lecture 12
14    Presentations & Integration                       Lecture 13
15    Final Review & Exam

Textbooks

  • Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed.)
  • Puterman, Markov Decision Processes
  • Bertsekas, Dynamic Programming and Optimal Control
  • Selected papers and online resources (e.g., Gymnasium, Stable Baselines3, OpenAI Spinning Up)

Midterm exam

  • Week 8 — in class, closed book. One A4 cheat sheet allowed.
  • Coverage: topics from Weeks 1–7 (MDPs, Bellman equations, DP, MC, TD, function approximation, DQN)

Projects

  • Course Project: Students select a domain (preferably robotics or control), implement one or more RL algorithms, and write a 6–8 page report including methodology, experiments, and analysis.
  • Project Proposal: Week 11
  • Checkpoint: Week 14
  • Final Presentation: Week 15
  • Topics may include safe RL, model-based RL, multi-agent RL, exploration strategies, or novel applications.