Search Results for author: Prashanth L. A

Found 7 papers, 1 paper with code

A policy gradient approach for optimization of smooth risk measures

no code implementations · 22 Feb 2022 · Nithia Vijayan, Prashanth L. A

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings.
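The on-policy case can be illustrated with a REINFORCE-style likelihood-ratio gradient for a smooth risk measure. Everything below is an illustrative assumption, not the paper's setting: a one-step Gaussian-policy bandit, the mean-variance objective J(theta) = E[R] - lam * Var[R], and the step sizes are all hypothetical choices made only to show the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_batch(theta, n=64):
    # Hypothetical one-step Gaussian-policy bandit: action a ~ N(theta, 1),
    # return R = -(a - 2)^2, so the risk-neutral optimum is theta = 2.
    a = theta + rng.standard_normal(n)
    return a, -(a - 2.0) ** 2

def risk_gradient(theta, lam=0.1, n=64):
    # Likelihood-ratio (REINFORCE-style) gradient estimate of the smooth
    # risk measure J(theta) = E[R] - lam * Var[R], with
    # Var[R] = E[R^2] - E[R]^2 differentiated term by term via the score.
    a, r = episode_batch(theta, n)
    score = a - theta                      # d/dtheta log N(a; theta, 1)
    grad_mean = np.mean(score * r)
    grad_second = np.mean(score * r ** 2)
    grad_var = grad_second - 2.0 * r.mean() * grad_mean
    return grad_mean - lam * grad_var

theta = 0.0
for _ in range(200):
    theta += 0.05 * risk_gradient(theta)   # gradient ascent on J
```

Because both E[R] and Var[R] are smooth in theta here, a single score-function estimator covers the whole risk objective; the iterate drifts toward the optimum at theta = 2.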

reinforcement-learning, Reinforcement Learning (RL)

Policy Gradient Methods for Distortion Risk Measures

no code implementations · 9 Jul 2021 · Nithia Vijayan, Prashanth L. A

We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework.
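A distortion risk measure (DRM) reweights a loss distribution through a distortion function g before averaging. The empirical estimator below, the uniform loss distribution, and the choice g(p) = sqrt(p) are all illustrative assumptions, not the paper's algorithm; the sketch only shows the risk functional that such policy gradient methods would optimize.

```python
import numpy as np

def distortion_risk(samples, g):
    # Empirical distortion risk measure of a nonnegative loss X:
    #   rho(X) = integral_0^inf g(P(X > t)) dt,
    # computed by weighting the descending order statistics with the
    # increments g(i/n) - g((i-1)/n).
    x = np.sort(samples)[::-1]            # descending order statistics
    n = len(x)
    i = np.arange(1, n + 1)
    return float(np.dot(g(i / n) - g((i - 1) / n), x))

rng = np.random.default_rng(3)
losses = rng.uniform(0.0, 1.0, size=100_000)
# g(p) = sqrt(p) over-weights the tail (risk-averse); g(p) = p recovers the mean.
rho = distortion_risk(losses, np.sqrt)
```

For Uniform(0, 1) losses the closed form is rho = integral of sqrt(1 - t) dt = 2/3, which the empirical estimate approaches as the sample grows.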

Policy Gradient Methods, reinforcement-learning, +1

Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint

no code implementations · 6 Jan 2021 · Nithia Vijayan, Prashanth L. A

From these results, we infer that the first algorithm converges at a rate that is comparable to the well-known REINFORCE algorithm in an off-policy RL context, while the second algorithm exhibits an improved rate of convergence.
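The core device in this line of work, a smoothed functional gradient estimate, perturbs the point with Gaussian noise and uses only function evaluations. The quadratic test objective, the smoothing width beta, and the batch size below are illustrative assumptions, not the paper's off-policy RL construction:

```python
import numpy as np

rng = np.random.default_rng(4)

def sf_gradient(f, x, beta=0.05, n=1024):
    # Smoothed-functional gradient estimate via Gaussian smoothing:
    #   grad f_beta(x) = E_u[ u * (f(x + beta*u) - f(x)) / beta ],  u ~ N(0, I).
    # Only zeroth-order (function value) information is needed.
    u = rng.standard_normal((n, x.size))
    fx = f(x)
    diffs = np.array([f(x + beta * ui) for ui in u]) - fx
    return (u * (diffs / beta)[:, None]).mean(axis=0)

# Hypothetical smooth objective; the true gradient at (1, -1) is (2, -2).
f = lambda x: float(np.sum(x ** 2))
g = sf_gradient(f, np.array([1.0, -1.0]))
```

The estimate is unbiased for the smoothed surrogate f_beta and approaches the true gradient as beta shrinks, which is what makes non-asymptotic rate analysis of the resulting stochastic gradient iterates possible.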

Off-policy evaluation

Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles

no code implementations · 26 Feb 2020 · Nirav Bhavsar, Prashanth L. A

We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter.
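The batch-size mechanism can be sketched with a toy oracle whose bias and noise both shrink as the batch grows. The specific decay rates (bias ~ 1/m, noise ~ 1/sqrt(m)), the objective f(x) = x^2, and the step sizes are illustrative assumptions, not the paper's oracle model:

```python
import numpy as np

rng = np.random.default_rng(1)

def biased_oracle(x, m):
    # Hypothetical biased gradient oracle for f(x) = x^2: both the bias and
    # the noise standard deviation shrink as the batch size m grows.
    return 2.0 * x + 1.0 / m + rng.standard_normal() / np.sqrt(m)

def sgd(m, steps=500, step=0.05):
    x = 5.0
    for _ in range(steps):
        x -= step * biased_oracle(x, m)
    return x

# A larger batch drives the iterate closer to the true minimizer x = 0.
x_small, x_large = sgd(m=1), sgd(m=100)
```

With m = 1 the iterates settle near the biased fixed point x = -1/(2m) = -0.5, while m = 100 leaves only a small residual offset; this trade-off between per-call cost and bias is what the non-asymptotic bounds quantify.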

Stochastic Optimization

Correlated bandits or: How to minimize mean-squared error online

no code implementations · 8 Feb 2019 · Vinay Praneeth Boda, Prashanth L. A

Motivated by such applications, we formulate the correlated bandit problem, where the objective is to find the arm with the lowest mean-squared error (MSE) in estimating all the arms.

Random directions stochastic approximation with deterministic perturbations

1 code implementation · 8 Aug 2018 · Prashanth L. A, Shalabh Bhatnagar, Nirav Bhavsar, Michael Fu, Steven I. Marcus

We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms.
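The idea can be sketched for a two-dimensional problem with a deterministic cycle of perturbation directions whose outer products d d^T average to the identity, which keeps the two-point RDSA estimate consistent for smooth f. The particular cycle, the objective, and the step sizes below are illustrative choices, not the paper's specific schemes:

```python
import numpy as np

# Illustrative deterministic perturbation cycle for dimension 2: the average
# of d d^T over one cycle is the identity matrix.
PERTS = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]

def f(x):
    # Hypothetical smooth objective with minimizer (1, -2).
    return float(np.sum((x - np.array([1.0, -2.0])) ** 2))

def rdsa_step(x, k, delta=1e-3, step=0.1):
    d = PERTS[k % len(PERTS)]
    # Two-point RDSA gradient estimate along the deterministic direction d.
    g = d * (f(x + delta * d) - f(x - delta * d)) / (2 * delta)
    return x - step * g

x = np.zeros(2)
for k in range(200):
    x = rdsa_step(x, k)
```

Replacing random directions with such a deterministic cycle removes the direction-sampling noise while preserving the averaging property the gradient estimate relies on; here the iterates contract toward the minimizer every full cycle.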

Policy Gradients for CVaR-Constrained MDPs

no code implementations · 12 May 2014 · Prashanth L. A

We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR).
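CVaR at level alpha is the expected cost in the worst (1 - alpha) tail of the cost distribution. A minimal empirical estimator, applied to an illustrative Exp(1) cost distribution (not the paper's SSP setting):

```python
import numpy as np

def cvar(samples, alpha=0.95):
    # Empirical Conditional Value-at-Risk: the average of the worst
    # (1 - alpha) fraction of cost samples; VaR is the alpha-quantile.
    var = np.quantile(samples, alpha)
    return samples[samples >= var].mean()

# Illustrative cost distribution Exp(1): VaR_0.95 = ln 20 ~ 3.0 and, by
# memorylessness, CVaR_0.95 = VaR_0.95 + 1 ~ 4.0.
rng = np.random.default_rng(2)
costs = rng.exponential(scale=1.0, size=100_000)
c95 = cvar(costs, alpha=0.95)
```

Because CVaR always sits at or above VaR, constraining it penalizes the whole upper tail of the cumulative SSP cost rather than a single quantile.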
