no code implementations • 6 Jan 2022 • Yueyang Liu, Adithya M. Devraj, Benjamin Van Roy, Kuang Xu
We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function when applied instead to a Bernoulli bandit.
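The setting above — an agent built for a Gaussian model but deployed on Bernoulli rewards — can be illustrated with a minimal sketch. The code below is not from the paper; it shows a hypothetical Thompson-sampling agent that maintains a Gaussian posterior (Gaussian prior, Gaussian likelihood) while the arms actually pay out Bernoulli rewards, which is the kind of model misspecification studied here. All function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_ts_on_bernoulli(true_means, horizon=1000, prior_var=1.0, noise_var=1.0):
    """Thompson sampling under a Gaussian model (Gaussian prior and likelihood),
    applied to arms whose rewards are actually Bernoulli (misspecified model)."""
    k = len(true_means)
    # Gaussian posterior parameters for each arm, updated conjugately.
    post_mean = np.zeros(k)
    post_var = np.full(k, prior_var)
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a mean estimate per arm from the Gaussian posterior, play the argmax.
        sampled = rng.normal(post_mean, np.sqrt(post_var))
        arm = int(np.argmax(sampled))
        reward = float(rng.random() < true_means[arm])  # Bernoulli payout
        total_reward += reward
        # Conjugate Gaussian update, treating the 0/1 reward as Gaussian data.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm] + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision
    return total_reward
```

Despite the wrong likelihood, the Gaussian posterior still concentrates on the better arm, which is the intuition behind asking whether a bounded information ratio in the Gaussian setting transfers to the Bernoulli one.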
no code implementations • 18 Feb 2021 • Adithya M. Devraj, Benjamin Van Roy, Kuang Xu
The information ratio offers an approach to assessing the efficacy with which an agent balances between exploration and exploitation.
no code implementations • 24 Feb 2020 • Adithya M. Devraj, Sean P. Meyn
Sample complexity bounds are a common performance metric in the Reinforcement Learning literature.
no code implementations • 7 Feb 2020 • Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean Meyn
This is motivation for the focus on mean square error bounds for parameter estimates.
no code implementations • NeurIPS 2020 • Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn
Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.
1 code implementation • NeurIPS 2019 • Adithya M. Devraj, Jianshu Chen
We consider a generic empirical composition optimization problem, where there are empirical averages present both outside and inside nonlinear loss functions.
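The structure described — empirical averages both outside and inside a nonlinear loss — can be written down concretely. The toy instance below is illustrative only (the matrices, loss, and names are assumptions, not the paper's setup): it minimizes F(x) = (1/n) Σᵢ fᵢ((1/m) Σⱼ gⱼ(x)) with linear inner maps and quadratic outer losses, using the chain rule for the full gradient.

```python
import numpy as np

# Hypothetical toy instance of empirical composition optimization:
#   minimize F(x) = (1/n) * sum_i f_i( (1/m) * sum_j g_j(x) )
# with inner maps g_j(x) = A_j x + b_j and outer losses f_i(y) = 0.5*||y - c_i||^2.
rng = np.random.default_rng(1)
n, m, d = 5, 7, 3
A = rng.normal(size=(m, d, d))
b = rng.normal(size=(m, d))
c = rng.normal(size=(n, d))

def inner(x):
    # Empirical average *inside* the loss: (1/m) sum_j (A_j x + b_j)
    return np.mean(A @ x + b, axis=0)

def objective(x):
    y = inner(x)
    # Empirical average *outside* the loss: (1/n) sum_i 0.5*||y - c_i||^2
    return 0.5 * np.mean(np.sum((y - c) ** 2, axis=1))

def gradient(x):
    y = inner(x)
    A_bar = np.mean(A, axis=0)            # Jacobian of the inner average
    outer_grad = np.mean(y - c, axis=0)   # gradient of the outer average at y
    return A_bar.T @ outer_grad           # chain rule through the composition

# Plain gradient descent on the composed objective.
x = np.zeros(d)
for _ in range(200):
    x -= 0.1 * gradient(x)
```

The difficulty the paper addresses arises when the inner average is too expensive to evaluate exactly and must itself be estimated from samples, which makes naive stochastic gradients biased.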
no code implementations • 25 Apr 2019 • Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn
The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of $\mathbb{R}^n$.
no code implementations • 28 Dec 2018 • Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn
Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques.
no code implementations • 17 Sep 2018 • Adithya M. Devraj, Ana Bušić, Sean Meyn
Two well-known SA techniques are known to achieve optimal asymptotic variance: the Ruppert-Polyak averaging technique and stochastic Newton-Raphson (SNR).
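As a minimal illustration of the first of these techniques (not code from the paper), the sketch below runs a scalar stochastic approximation recursion with a slowly decaying gain and reports the Polyak-Ruppert average of the iterates; the averaged estimate attains the optimal asymptotic variance that the raw iterates with this gain do not. The target and step-size schedule are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch: Ruppert-Polyak averaging for a scalar SA problem,
# estimating the root of f(theta) = theta - mu from noisy observations mu + noise.
rng = np.random.default_rng(0)
mu = 2.0
T = 50_000

theta = 0.0
running_sum = 0.0
for t in range(1, T + 1):
    observation = mu + rng.normal()        # noisy sample of the target
    step = t ** -0.7                       # decaying gain with exponent in (1/2, 1)
    theta += step * (observation - theta)  # basic SA recursion
    running_sum += theta

theta_avg = running_sum / T                # Polyak-Ruppert average of the iterates
```

Averaging trades the larger per-step gain (fast transient) for a final estimate whose variance matches the optimal 1/T rate, which is the benchmark against which SNR-type schemes such as the one proposed here are compared.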
no code implementations • NeurIPS 2017 • Adithya M. Devraj, Sean Meyn
The Zap Q-learning algorithm introduced in this paper improves upon Watkins' original algorithm and recent competitors in several respects.
no code implementations • 12 Jul 2017 • Adithya M. Devraj, Sean P. Meyn
The Zap Q-learning algorithm introduced in this paper improves upon Watkins' original algorithm and recent competitors in several respects.
no code implementations • 6 Apr 2016 • Adithya M. Devraj, Sean P. Meyn
The algorithm introduced in this paper is intended to resolve two well-known problems with this approach: in the discounted-cost setting, the variance of the algorithm diverges as the discount factor approaches unity.