Search Results for author: Adithya M. Devraj

Found 12 papers, 1 paper with code

Gaussian Imagination in Bandit Learning

no code implementations • 6 Jan 2022 • Yueyang Liu, Adithya M. Devraj, Benjamin Van Roy, Kuang Xu

We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function when applied instead to a Bernoulli bandit.
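
For context, a minimal sketch of the misspecified setting described above: a Thompson-sampling agent that maintains a Gaussian posterior (Gaussian prior, assumed Gaussian likelihood) while the rewards are in fact Bernoulli. The arm means, prior parameters, noise variance, and horizon below are hypothetical, not taken from the paper.

import numpy as np

# Hypothetical Bernoulli bandit; the agent's internal model is Gaussian throughout.
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])      # Bernoulli success probabilities (made up)
K, T = len(true_means), 2000
mu, var = np.zeros(K), np.ones(K)           # Gaussian prior N(0, 1) for each arm
noise_var = 0.25                            # assumed Gaussian likelihood variance

for t in range(T):
    a = int(np.argmax(rng.normal(mu, np.sqrt(var))))   # Thompson sample per arm
    r = float(rng.random() < true_means[a])            # reward is actually Bernoulli
    # Gaussian conjugate update, applied despite the Bernoulli rewards
    precision = 1.0 / var[a] + 1.0 / noise_var
    mu[a] = (mu[a] / var[a] + r / noise_var) / precision
    var[a] = 1.0 / precision

print("posterior means:", np.round(mu, 3))

The paper studies how much performance such an agent sacrifices under this kind of model mismatch.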

A Bit Better? Quantifying Information for Bandit Learning

no code implementations • 18 Feb 2021 • Adithya M. Devraj, Benjamin Van Roy, Kuang Xu

The information ratio offers an approach to assessing the efficacy with which an agent balances between exploration and exploitation.
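
For reference, the information ratio in the sense of Russo and Van Roy (a standard formulation; the paper's exact definition and its variants may differ) is

$$\Gamma_t \;=\; \frac{\big(\mathbb{E}_t\big[R_{t,A^*} - R_{t,A_t}\big]\big)^2}{I_t\big(A^*;\,(A_t, Y_{t,A_t})\big)},$$

the squared expected per-period regret divided by the information gained about the optimal action $A^*$ from that period's action and observation; a uniformly bounded ratio converts accumulated information into a regret bound.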

Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

no code implementations • 24 Feb 2020 • Adithya M. Devraj, Sean P. Meyn

Sample complexity bounds are a common performance metric in the Reinforcement Learning literature.

Q-Learning
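
As background for the variance issue in the title, here is the classical Watkins Q-learning recursion with the 1/n step size, whose asymptotic variance grows without bound as the discount factor approaches one. The environment interface and all constants are hypothetical, and the paper's modified algorithm is not reproduced here.

import numpy as np

def q_learning(env, n_states, n_actions, gamma=0.99, episodes=500, eps=0.1):
    """Tabular Watkins Q-learning with the classical 1/n step-size rule."""
    Q = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False            # hypothetical env interface
        while not done:
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env.step(a)
            counts[s, a] += 1
            alpha = 1.0 / counts[s, a]          # 1/n step size
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q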

Zap Q-Learning With Nonlinear Function Approximation

no code implementations • NeurIPS 2020 • Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn

Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.

OpenAI Gym Q-Learning

Stochastic Variance Reduced Primal Dual Algorithms for Empirical Composition Optimization

1 code implementation • NeurIPS 2019 • Adithya M. Devraj, Jianshu Chen

We consider a generic empirical composition optimization problem, where there are empirical averages present both outside and inside nonlinear loss functions.
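
One common way to write such a problem, in assumed notation (the paper's generic formulation may be broader), has an empirical average inside the losses through the maps $g_j$ and outside through the average of the $f_i$:

$$\min_{x \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} f_i\Big(\frac{1}{m}\sum_{j=1}^{m} g_j(x)\Big).$$

The inner average makes cheap unbiased stochastic gradients unavailable, which is the difficulty that variance-reduced primal-dual methods are designed to circumvent.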

Zap Q-Learning for Optimal Stopping Time Problems

no code implementations • 25 Apr 2019 • Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn

The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of $\mathbb{R}^n$.

Q-Learning
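
For reference, the dynamic-programming equation for discounted-cost optimal stopping, in assumed notation with per-step cost c, stopping cost c_s, and discount factor $\beta \in (0,1)$:

$$h^*(x) \;=\; \min\Big\{\, c_s(x),\; c(x) + \beta\, \mathbb{E}\big[h^*(X_{n+1}) \mid X_n = x\big] \Big\}.$$

Stopping is optimal at states where the stopping cost is no larger than the continuation value.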

Differential Temporal Difference Learning

no code implementations • 28 Dec 2018 • Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques.

General Reinforcement Learning
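
In the average-cost setting that motivates differential methods, the relative (differential) value function h solves Poisson's equation; with average cost $\eta$ and assumed notation,

$$h(x) \;=\; c(x) - \eta + \mathbb{E}\big[h(X_{t+1}) \mid X_t = x\big],$$

and it is h, rather than a discounted value function, that differential TD-type algorithms target.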

Optimal Matrix Momentum Stochastic Approximation and Applications to Q-learning

no code implementations • 17 Sep 2018 • Adithya M. Devraj, Ana Bušić, Sean Meyn

Two well-known SA techniques are known to achieve optimal asymptotic variance: the Ruppert-Polyak averaging technique and stochastic Newton-Raphson (SNR).

Q-Learning Stochastic Optimization
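
A minimal sketch of the first of those two techniques, Ruppert-Polyak averaging, on a toy root-finding problem (the problem and all constants are made up): the raw SA iterate uses a step size decaying slower than 1/n, and it is the running average of the iterates that attains the optimal asymptotic variance.

import numpy as np

rng = np.random.default_rng(0)
theta, theta_bar = 0.0, 0.0
target = 2.0                               # root of f(theta) = target - theta
for n in range(1, 10001):
    obs = target + rng.normal()            # noisy observation of the target
    alpha = n ** -0.7                      # step size n^(-rho), rho in (1/2, 1)
    theta += alpha * (obs - theta)         # basic SA iterate
    theta_bar += (theta - theta_bar) / n   # running average of the iterates

print(f"last iterate: {theta:.3f}, averaged iterate: {theta_bar:.3f}")

Stochastic Newton-Raphson, the second technique named, instead uses a matrix gain; a sketch of that idea appears under the Zap Q-Learning entry below.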

Zap Q-Learning

no code implementations • NeurIPS 2017 • Adithya M. Devraj, Sean Meyn

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects.

Q-Learning
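
A hedged sketch of the matrix-gain (stochastic Newton-Raphson-like) idea behind Zap Q-learning, in a tabular setting: a matrix estimate tracking the linearization of the mean dynamics is updated on a faster timescale, and its inverse serves as the gain for the parameter update. The environment interface, exploration policy, and step-size exponents are hypothetical; this does not reproduce the paper's algorithm in detail.

import numpy as np

def zap_q(env, n_states, n_actions, gamma=0.99, steps=50000):
    d = n_states * n_actions
    theta = np.zeros(d)                    # tabular Q-function, flattened
    A_hat = -np.eye(d)                     # matrix-gain estimate
    rng = np.random.default_rng(0)

    def psi(s, a):                         # one-hot basis vector
        e = np.zeros(d)
        e[s * n_actions + a] = 1.0
        return e

    s = env.reset()                        # hypothetical env interface
    for n in range(1, steps + 1):
        a = int(rng.integers(n_actions))   # purely exploratory policy
        s2, r, _ = env.step(a)
        Q = theta.reshape(n_states, n_actions)
        a2 = int(Q[s2].argmax())
        td = r + gamma * Q[s2, a2] - Q[s, a]                       # TD error
        A_n = np.outer(psi(s, a), gamma * psi(s2, a2) - psi(s, a))
        A_hat += n ** -0.85 * (A_n - A_hat)   # faster timescale for the gain
        theta -= (1.0 / n) * np.linalg.solve(A_hat, psi(s, a) * td)
        s = s2
    return theta.reshape(n_states, n_actions)

With A_hat tracking the mean linearization, each parameter update behaves like a stochastic Newton-Raphson step, which underlies the fast convergence these papers report.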

Fastest Convergence for Q-learning

no code implementations • 12 Jul 2017 • Adithya M. Devraj, Sean P. Meyn

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects.

Q-Learning reinforcement-learning +1

Differential TD Learning for Value Function Approximation

no code implementations • 6 Apr 2016 • Adithya M. Devraj, Sean P. Meyn

The algorithm introduced in this paper is intended to resolve two well-known problems with this approach: in the discounted-cost setting, the variance of the algorithm diverges as the discount factor approaches unity.
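
The divergence mentioned here reflects the behavior of the discounted value function as the discount factor $\beta$ approaches one; for a suitably ergodic chain with average cost $\eta$ and relative value function h (assumed notation),

$$V_\beta(x) \;=\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \beta^{t} c(X_t) \,\Big|\, X_0 = x\Big] \;=\; \frac{\eta}{1-\beta} + h(x) + o(1), \qquad \beta \uparrow 1,$$

so the $\eta/(1-\beta)$ term dominates, and algorithms that estimate $V_\beta$ directly inherit variance that blows up with the discount factor; the differential approach targets h instead.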

