Reinforcement Learning

Year:
2nd year
Semester:
S1
Programme main editor:
I2CAT
Onsite in:
AU, UBB
Remote:
ECTS range:
5-7 ECTS

Professors

Francesco De Pellegrini (AU)
Laura Dioşan (UBB)
Naresh Modina (CNAM)

Prerequisites:

Students are required to have taken an introductory machine learning course.

Good knowledge of probability and statistics is expected.

Basic knowledge of Markov chains is recommended, but it is not a prerequisite.

Pedagogical objectives:

This course provides an overview of reinforcement learning (RL) methods. Both theoretical and programming aspects are explored in depth, so that students acquire solid expertise in both. By the end of the course, students should:

  • Understand the notion of stochastic approximations and their relation with RL;
  • Understand the basis of Markov decision theory;
  • Apply Dynamic Programming methods to solve the Bellman equations (a minimal sketch follows this list);
  • Master the basic techniques of Reinforcement Learning: Monte Carlo, Temporal-Difference, and Policy Gradient;
  • Study a proof of convergence for RL algorithms;
  • Master more advanced techniques such as actor-critic methods and deep RL.
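
The third objective, solving the Bellman equations by Dynamic Programming, can be illustrated with a minimal value-iteration sketch on a generic finite MDP. The array layout (P as per-action transition kernels, R as per-action expected rewards) and the toy usage at the end are assumptions for illustration, not course material.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Dynamic Programming solution of the discounted Bellman optimality
    equation  V(s) = max_a [ R(a,s) + gamma * sum_s' P(a,s,s') V(s') ].

    P: (A, S, S) transition kernels, R: (A, S) expected rewards
    (hypothetical layout, assumed for illustration).
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)        # one Bellman backup per action, shape (A, S)
        V_new = Q.max(axis=0)          # maximize over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # optimal values and a greedy policy
        V = V_new

# Toy usage on a random 2-action, 3-state MDP (hypothetical):
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # each row P[a, s, :] sums to 1
R = rng.standard_normal((2, 3))
V, pi = value_iteration(P, R)
print(V, pi)
```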

Evaluation modalities:

Final exam, lab and research project reports.

All students in the class will also conduct a research project in the field of reinforcement learning and write a short 5-page paper. Subjects, related to Constrained RL and Delayed RL, will be provided during the first class session.

Description:

This course introduces machine learning techniques based on stochastic approximations and Markov decision process (MDP) models, e.g., SARSA, Q-learning, and policy gradient. Two homework assignments focus on implementing these techniques, so that students master them through direct implementation. A project in teams of 2-3 students will address more advanced techniques and problems in RL and, more generally, the application of Markov theory to modeling and optimization.
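
To make this concrete, below is a minimal sketch of tabular Q-learning with an ε-greedy behaviour policy, one of the techniques named above. The environment interface (reset() returning a state index, step(a) returning (next_state, reward, done)) is a gym-style assumption for illustration.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    env is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), gym-style.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # off-policy TD target: bootstrap on the greedy value of s2
            target = r + (0.0 if done else gamma * np.max(Q[s2]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```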

Lectures:

  • Course Overview. Introduction to Markov decision theory, stochastic approximations, and reinforcement learning;
  • Stochastic approximations: the Robbins-Monro algorithm (see the sketch after this list);
  • Criteria for convergence;
  • Application to admission control problems;
  • Markov decision processes: definitions, average cost and discounted cost;
  • Bellman equations. Solutions based on Dynamic Programming;
  • Monte Carlo methods for Reinforcement Learning;
  • Temporal-Difference methods: SARSA and Q-Learning;
  • Proof of convergence of Q-Learning;
  • Policy gradient: REINFORCE;
  • Actor-critic methods;
  • Multi-armed bandits;
  • Deep Reinforcement Learning.
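
For the Robbins-Monro lecture referenced above, here is a minimal sketch of the stochastic-approximation iteration θ_{n+1} = θ_n − a_n Y_n, where Y_n is a noisy observation of h(θ_n) and the steps a_n = 1/(n+1) satisfy Σ a_n = ∞ and Σ a_n² < ∞. The root-finding target in the usage line is a hypothetical toy example.

```python
import numpy as np

def robbins_monro(noisy_h, theta0=0.0, n_iter=10_000, seed=0):
    """Robbins-Monro iteration theta <- theta - a_n * Y_n, where Y_n is a
    noisy observation of h(theta). With steps a_n = 1/(n+1), the iterates
    converge almost surely to a root of h under standard conditions."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for n in range(n_iter):
        a_n = 1.0 / (n + 1)
        theta -= a_n * noisy_h(theta, rng)
    return theta

# Toy example (hypothetical): find the root of h(theta) = theta - 2
# from observations corrupted by zero-mean Gaussian noise.
print(robbins_monro(lambda t, rng: (t - 2.0) + rng.normal(0.0, 1.0)))
```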

Lab assignments:

  • Practice of stochastic approximation on a traffic admission problem;
  • Practice of Monte Carlo, Q-learning, and SARSA on a gridworld (discounted cost); see the SARSA sketch after this list;
  • Practice of buffer management with admission control (average cost).
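
In the spirit of the second lab, here is a minimal sketch of on-policy SARSA on a small gridworld with discounted cost. The 4x4 grid, the per-step reward of -1, and the goal cell are assumptions for illustration, not the actual lab specification.

```python
import numpy as np

class GridWorld:
    """Hypothetical 4x4 gridworld: start at (0,0), goal at (3,3),
    reward -1 per step, episode ends at the goal."""
    N = 4
    def reset(self):
        self.pos = (0, 0)
        return 0
    def step(self, a):
        moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
        dr, dc = moves[a]
        r = min(max(self.pos[0] + dr, 0), self.N - 1)
        c = min(max(self.pos[1] + dc, 0), self.N - 1)
        self.pos = (r, c)
        done = (r, c) == (self.N - 1, self.N - 1)
        return r * self.N + c, -1.0, done

def sarsa(env, n_states=16, n_actions=4, episodes=500,
          alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    def policy(s):  # epsilon-greedy w.r.t. the current Q
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))
    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2)
            # on-policy target: bootstrap on the action actually selected next
            target = r + (0.0 if done else gamma * Q[s2, a2])
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q

Q = sarsa(GridWorld())
print(np.argmax(Q, axis=1).reshape(4, 4))   # greedy action per cell
```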

Required teaching material:

Bibliography:

  • Artificial Intelligence: A Modern Approach, S. Russell and P. Norvig, Prentice Hall, 3rd edition, 2010.
  • Reinforcement Learning: An Introduction, R. S. Sutton and A. G. Barto, MIT Press, 1998.

Teaching volume:
Lessons:
28-42 hours
Exercises:
Supervised lab:
0-28 hours
Project:
0-3 hours

Devices:

  • Laboratory-Based Course Structure
  • Open-Source Software Requirements