BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CREST - ECPv5.1.3//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:CREST
X-ORIGINAL-URL:https://crest.science
X-WR-CALDESC:Events for CREST
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20210328T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20211031T030000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Europe/Paris:20210215T170000
DTEND;TZID=Europe/Paris:20210215T181500
DTSTAMP:20210928T154601Z
CREATED:20210125T141851Z
LAST-MODIFIED:20210125T141851Z
UID:12517-1613408400-1613412900@crest.science
SUMMARY:Jiantao JIAO (UC Berkeley) - "Sharp Minimax Rates for Imitation Learning"
DESCRIPTION:\nThe Statistical Seminar: Every Monday at 2:00 pm.\nTime: 5:00 pm – 6:15 pm (exceptionally)\nDate: 15th of February 2021\nPlace: Visio\nJiantao JIAO (UC Berkeley) – “Sharp Minimax Rates for Imitation Learning”\nAbstract: We establish sharp minimax bounds on Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space S. We focus on the known-transition setting\, where the learner is provided a dataset of N length-H trajectories from a deterministic expert policy and knows the MDP transition. We show that the minimax rate is Theta(|S|H^{3/2}/N)\, while the unknown-transition setting suffers from a larger sharp rate Theta(|S|H^2/N) (Rajaraman et al.\, 2020). Our upper bound is established using the Mimic-MD algorithm of Rajaraman et al. (2020)\, which we prove to be computationally efficient\, and the lower bound is established by proving a two-way reduction between IL and the value estimation problem of the unknown expert policy under any given reward function\, as well as linear functional estimation with subsampled observations. We further show that\, under the additional assumption that the expert is optimal for the true reward function\, there exists an efficient algorithm\, which we term Mimic-Mixture\, that provably achieves suboptimality O(1/N) for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast\, no algorithm can achieve suboptimality O(sqrt(H)/N) with high probability if the expert is not constrained to be optimal. Our work formally establishes the benefit of the expert-optimality assumption in the known-transition setting\, while Rajaraman et al. (2020) showed that it does not help when transitions are unknown.\nOrganizers:\nCristina BUTUCEA (CREST)\, Alexandre TSYBAKOV (CREST)\, Karim LOUNICI (CMAP)\, Zoltan SZABO (CMAP)\nSponsors:\nCREST-CMAP\n
URL:https://crest.science/event/jiantao-jiao/
CATEGORIES:Statistics
END:VEVENT
END:VCALENDAR