18 Jul 2017 • Eric Mazumdar, Roy Dong, Vicenç Rúbies Royo, Claire Tomlin, S. Shankar Sastry
We formulate a multi-armed bandit (MAB) approach to choosing expert policies online in Markov decision processes (MDPs).
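As a rough illustration of this setup, the sketch below treats each expert policy as a bandit arm and selects which expert controls the MDP each episode using standard UCB1, with the episode return fed back as the bandit reward. The toy random-walk MDP, the example experts, and the choice of plain UCB1 are assumptions made for illustration; the paper's actual algorithm and analysis may differ.

```python
# Minimal sketch (assumptions noted above): each expert policy is a bandit
# arm, each pull runs that expert for one episode in the MDP, and UCB1
# balances exploring experts against exploiting the best one found so far.
import math
import random


def run_episode(policy, horizon=20):
    """Roll out one episode of a toy 1-D random-walk MDP under `policy`
    and return the cumulative reward (a stand-in for any real MDP)."""
    state, total = 0, 0.0
    for _ in range(horizon):
        action = policy(state)                    # expert picks -1, 0, or +1
        state += action + random.choice((-1, 1))  # noisy transition
        total += -abs(state)                      # reward for staying near 0
    return total


def ucb1_expert_selection(experts, num_episodes=500):
    """Choose among expert policies online with UCB1."""
    k = len(experts)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, num_episodes + 1):
        if t <= k:            # initialization: try each expert once
            arm = t - 1
        else:                 # UCB1 index: empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = run_episode(experts[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


if __name__ == "__main__":
    experts = [
        lambda s: -1 if s > 0 else 1,      # regulates toward the origin (good)
        lambda s: 1,                       # always drifts right (bad)
        lambda s: random.choice((-1, 1)),  # acts at random
    ]
    counts, means = ucb1_expert_selection(experts)
    print("episodes per expert:", counts)
    print("estimated returns:  ", [round(m, 1) for m in means])
```

Run for enough episodes, the selection concentrates on the expert with the highest expected episode return while still occasionally sampling the others.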