no code implementations • 19 Jul 2022 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor
For the actor-critic (AC) based approach, we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the natural actor-critic (NAC) case.
no code implementations • 24 May 2021 • Avinash Mohan, Arpan Chattopadhyay, Shivam Vinayak Vatsa, Anurag Kumar
Limiting the policy to this class reduces the problem to finding a queue-switching policy that acts only at the instants when a queue becomes empty.
no code implementations • 16 Feb 2021 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.
no code implementations • 5 Nov 2019 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan
We give a new algorithm for best arm identification in linearly parameterised bandits in the fixed confidence setting.