2 code implementations • 22 Sep 2022 • Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, Volkan Cevher
Thanks to the proximal point method (PPM), we avoid the nested policy evaluation and cost updates that appear in prior work on online imitation learning (IL).
no code implementations • 31 Dec 2021 • Angeliki Kamoutsi, Goran Banjac, John Lygeros
We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations.
no code implementations • 28 Dec 2021 • Angeliki Kamoutsi, Goran Banjac, John Lygeros
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.