no code implementations • 22 Feb 2022 • Nithia Vijayan, Prashanth L. A
We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings.
no code implementations • 9 Jul 2021 • Nithia Vijayan, Prashanth L. A
We propose policy gradient algorithms that learn risk-sensitive policies in a reinforcement learning (RL) framework.
no code implementations • 6 Jan 2021 • Nithia Vijayan, Prashanth L. A
From these results, we infer that the first algorithm converges at a rate that is comparable to the well-known REINFORCE algorithm in an off-policy RL context, while the second algorithm exhibits an improved rate of convergence.
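To make the off-policy REINFORCE comparison concrete, here is a minimal sketch of an importance-sampling-weighted REINFORCE gradient estimate; the softmax policy, the one-step setting, and all function names are illustrative assumptions, not the paper's construction:

```python
import numpy as np

# Illustrative sketch: REINFORCE in an off-policy setting. Samples are
# drawn from a behavior policy mu; the gradient for the target policy
# pi_theta is reweighted by rho = pi_theta(a) / mu(a).

def softmax_probs(theta):
    """Softmax policy over a small discrete action set."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_softmax(theta, a):
    """Gradient of log pi_theta(a) for a softmax policy."""
    p = softmax_probs(theta)
    g = -p
    g[a] += 1.0
    return g

def off_policy_reinforce_grad(theta, mu_probs, actions, returns):
    """Average of rho * G * grad log pi_theta(a) over samples."""
    p = softmax_probs(theta)
    grads = []
    for a, G in zip(actions, returns):
        rho = p[a] / mu_probs[a]  # importance weight
        grads.append(rho * G * grad_log_softmax(theta, a))
    return np.mean(grads, axis=0)

# Usage: two actions, uniform behavior policy, action 1 yields return 1.
rng = np.random.default_rng(0)
theta = np.zeros(2)
mu = np.array([0.5, 0.5])
actions = rng.integers(0, 2, size=1000)
returns = np.where(actions == 1, 1.0, 0.0)
g = off_policy_reinforce_grad(theta, mu, actions, returns)
# the estimate pushes probability mass toward the better action 1
```

The importance weight corrects for the mismatch between behavior and target policies, which is the mechanism whose variance drives the convergence-rate comparison above.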
no code implementations • 26 Feb 2020 • Nirav Bhavsar, Prashanth L. A
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter.
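A minimal sketch of such an oracle, under assumed details: each function measurement is a batch average of noisy evaluations, so the residual estimation error in the resulting finite-difference gradient shrinks roughly like O(1/sqrt(m)) in the batch size m. The noise model and parameter choices are illustrative:

```python
import numpy as np

# Hypothetical biased gradient oracle: a central-difference gradient
# built from noisy function measurements, where a batch of m samples
# is averaged per measurement to control the estimation error.

def noisy_measurement(f, x, m, rng, sigma=1.0):
    """Average of m noisy evaluations of f at x; error is O(1/sqrt(m))."""
    return f(x) + sigma * rng.standard_normal(m).mean()

def biased_grad_oracle(f, x, m, rng, delta=0.1):
    """Central-difference gradient from batched measurements; the bias
    depends on both the spacing delta and the residual batch noise."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = delta
        fp = noisy_measurement(f, x + e, m, rng)
        fm = noisy_measurement(f, x - e, m, rng)
        grad[i] = (fp - fm) / (2 * delta)
    return grad

# Usage: f(x) = ||x||^2, so the true gradient at x0 is 2 * x0.
rng = np.random.default_rng(1)
f = lambda x: float(x @ x)
x0 = np.array([1.0, -2.0])
g_small = biased_grad_oracle(f, x0, m=10, rng=rng)
g_large = biased_grad_oracle(f, x0, m=100_000, rng=rng)
# with a large batch, the estimate concentrates around [2, -4]
```

Increasing m trades extra measurements for a tighter gradient estimate, which is the knob the abstract refers to.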
no code implementations • 8 Feb 2019 • Vinay Praneeth Boda, Prashanth L. A
Motivated by such applications, we formulate the correlated bandit problem, where the objective is to find the arm with the lowest mean-squared error (MSE) in estimating all the arms.
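As a toy illustration of the selection criterion (not the paper's formulation), assume observing arm i yields unbiased estimates of every arm j's mean with known per-round variance V[i, j]; sampling arm i then incurs a summed MSE proportional to the row sum of V, so the best arm minimizes that row sum:

```python
import numpy as np

# Illustrative assumption: V[i, j] is the per-round variance of the
# estimate of arm j's mean obtained by pulling arm i. Sampling arm i
# for n rounds gives total MSE sum_j V[i, j] / n across all arms.

def best_arm_for_estimation(V):
    """Return the arm whose pulls minimize the summed MSE of
    estimating all arm means."""
    total_mse = V.sum(axis=1)  # per-round MSE summed over arms
    return int(np.argmin(total_mse))

V = np.array([[1.0, 4.0, 4.0],
              [2.0, 1.0, 2.0],
              [3.0, 3.0, 1.0]])
best = best_arm_for_estimation(V)  # arm 1: smallest row sum (5.0)
```

The bandit problem in the paper is harder because these variances are unknown and must be learned while sampling; this sketch only shows the objective being optimized.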
1 code implementation • 8 Aug 2018 • Prashanth L. A, Shalabh Bhatnagar, Nirav Bhavsar, Michael Fu, Steven I. Marcus
We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms.
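A hedged sketch of the idea: an RDSA-style two-measurement gradient estimate where the perturbation directions come from a deterministic cycle (here, rows of a Hadamard matrix, whose +/-1 entries average out like random signs). The specific matrix choice and scaling are illustrative assumptions:

```python
import numpy as np

# RDSA-style gradient estimate with a deterministic perturbation cycle.
# For the full set of Hadamard rows, (1/n) * H^T H = I, which plays the
# role of E[d d^T] = I for random directions.

def hadamard(n):
    """Sylvester construction; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def rdsa_deterministic_grad(f, x, delta=1e-3):
    """One pass over the perturbation cycle: average of
    d * (f(x + delta*d) - f(x - delta*d)) / (2*delta)."""
    H = hadamard(len(x))  # assumes len(x) is a power of two
    ests = [d * (f(x + delta * d) - f(x - delta * d)) / (2 * delta)
            for d in H]
    return np.mean(ests, axis=0)

f = lambda x: float(x @ x)       # true gradient is 2 * x
x0 = np.array([0.5, -1.0])
g = rdsa_deterministic_grad(x=x0, f=f)
# for this quadratic, the cycle-averaged estimate recovers 2 * x0 exactly
```

Replacing random directions with a deterministic cycle removes perturbation randomness from the estimate, which is the motivation for the schemes proposed in the paper.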
no code implementations • 12 May 2014 • Prashanth L. A
We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR).
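The risk measure in question can be written via the Rockafellar-Uryasev representation, CVaR_alpha(X) = min_v { v + E[(X - v)^+] / (1 - alpha) }, whose minimizer is the alpha-quantile (VaR). A minimal empirical version, with illustrative names:

```python
import numpy as np

# Empirical CVaR of a loss sample: the average of the worst
# (1 - alpha) fraction of losses, computed via the
# Rockafellar-Uryasev formula with v set to the empirical VaR.

def empirical_cvar(losses, alpha):
    """CVaR_alpha estimate from a sample of losses."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)  # empirical VaR (alpha-quantile)
    return var + np.mean(np.maximum(losses - var, 0.0)) / (1.0 - alpha)

losses = np.arange(1.0, 11.0)        # losses 1, 2, ..., 10
c = empirical_cvar(losses, alpha=0.8)
# average of the worst 20% of losses, i.e. (9 + 10) / 2 = 9.5
```

In the risk-constrained SSP, a quantity like this is bounded as a constraint while the expected cost is minimized; this sketch only shows how the constraint value itself is evaluated.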