Matteo PIROTTA (Facebook AI) – “Introduction to exploration-exploitation in reinforcement learning”
March 8, 2:00 pm - 3:15 pm
The Statistical Seminar: Every Monday at 2:00 pm.
Time: 2:00 pm – 3:15 pm
Date: 8th of March 2021
Abstract: One of the major challenges in online reinforcement learning (RL) is the trade-off between exploring the environment to gather information and exploiting the samples observed so far to execute a near-optimal policy. While the exploration-exploitation trade-off has been widely studied in the multi-armed bandit literature, the RL setting poses specific challenges due to the dynamical nature of the environment.
In this seminar, we will review basic notions of RL (i.e., Markov decision process, value function, value iteration) and introduce the regret minimization problem in the finite-horizon setting, in environments with finite state and action spaces. We will then study how algorithmic principles, such as optimism in the face of uncertainty, are instantiated in RL and what theoretical guarantees they provide. Finally, we will briefly discuss the most recent results on the topic and the remaining open questions.
Cristina BUTUCEA (CREST), Alexandre TSYBAKOV (CREST), Karim LOUNICI (CMAP), Zoltan SZABO (CMAP)