no code implementations • NeurIPS 2021 • Yi Wan, Abhishek Naik, Richard S. Sutton
We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs.
no code implementations • 17 Apr 2021 • Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton
Our algorithms and experiments are the first to treat model-based reinforcement learning (MBRL) with expectation models in a general setting.
no code implementations • 2 Oct 2020 • Anirban Santara, Sohan Rudra, Sree Aditya Buridi, Meha Kaushik, Abhishek Naik, Bharat Kaul, Balaraman Ravindran
In this work, we present MADRaS, an open-source multi-agent driving simulator for use in the design and evaluation of motion planning algorithms for autonomous driving.
1 code implementation • 29 Jun 2020 • Yi Wan, Abhishek Naik, Richard S. Sutton
We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent off-policy model-free prediction algorithm, and 3) the first off-policy learning algorithm that converges to the actual value function rather than to the value function plus an offset.
no code implementations • 4 Oct 2019 • Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks.
1 code implementation • 20 Jul 2017 • Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul
Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories.