Search Results for author: Navdeep Kumar

Found 8 papers, 1 paper with code

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

We present the first finite-time global convergence analysis of policy gradient in the context of infinite-horizon average-reward Markov decision processes (MDPs).
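Concretely, the object under analysis can be sketched in a few lines: exact softmax policy gradient on the average-reward (gain) objective for a tabular MDP. This is a minimal illustration assuming a unichain MDP (so the stationary distribution exists) and hypothetical random dynamics, not the authors' algorithm or analysis.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gain_and_grad(theta, P, R):
    """Exact average reward (gain) and its policy gradient for a
    tabular MDP with softmax policy pi(a|s) = softmax(theta[s])."""
    S, A = theta.shape
    pi = softmax(theta)
    P_pi = np.einsum('sa,sat->st', pi, P)        # S x S transition matrix
    r_pi = np.einsum('sa,sa->s', pi, R)          # expected reward per state
    # stationary distribution: left eigenvector of P_pi for eigenvalue 1
    w, v = np.linalg.eig(P_pi.T)
    d = np.real(v[:, np.argmin(np.abs(w - 1))])
    d = d / d.sum()
    rho = d @ r_pi                               # average reward (gain)
    # differential (bias) values: solve (I - P_pi) h = r_pi - rho
    h = np.linalg.pinv(np.eye(S) - P_pi) @ (r_pi - rho)
    Q = R - rho + np.einsum('sat,t->sa', P, h)   # differential Q-values
    adv = Q - (pi * Q).sum(axis=1, keepdims=True)
    grad = d[:, None] * pi * adv                 # d(s) * pi(a|s) * A(s,a)
    return rho, grad

# hypothetical random unichain MDP and a few gradient-ascent steps
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a] is a distribution
R = rng.random((S, A))
theta = np.zeros((S, A))
for _ in range(200):
    rho, grad = gain_and_grad(theta, P, R)
    theta += 1.0 * grad                          # vanilla gradient ascent
print(f"gain after training: {rho:.3f}")
```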

Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

no code implementations • 3 Sep 2023 • Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor

In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set.
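For the special case of a norm-ball reward uncertainty set, the reduction to frequency (occupancy) regularization is a one-line duality argument: if rewards are r0 + delta with ||delta||_2 <= alpha (a non-rectangular set, since it couples all state-action pairs), the worst-case return is the nominal return minus alpha times the norm of the occupancy measure. A minimal sketch under that assumption, with a hypothetical tabular MDP; the paper treats more general non-rectangular sets.

```python
import numpy as np

def occupancy(P, pi, mu, gamma):
    """Normalized discounted state-action occupancy
    d(s, a) = (1 - gamma) * sum_t gamma^t Pr(s_t = s, a_t = a)."""
    S, A = pi.shape
    P_pi = np.einsum('sa,sat->st', pi, P)
    d_s = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu)
    return d_s[:, None] * pi                     # d(s, a) = d(s) * pi(a|s)

def robust_return(P, pi, mu, gamma, r0, alpha):
    """Worst-case return over rewards {r0 + delta : ||delta||_2 <= alpha}."""
    d = occupancy(P, pi, mu, gamma)
    nominal = (d * r0).sum() / (1 - gamma)
    # duality: min over the ball of <d, delta> equals -alpha * ||d||_2,
    # so the robust return is the nominal return penalized by the
    # frequency norm
    return nominal - alpha * np.linalg.norm(d) / (1 - gamma)
```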

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

no code implementations • 9 Jun 2023 • Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel.
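The "bring your own algorithm" idea is easy to illustrate in the sa-rectangular L1-ball special case, where the kernel that is worst for a given value estimate V has a simple greedy form: shift up to beta/2 probability mass from the highest-value successors to the lowest-value one, then hand the resulting kernel to any non-robust solver. A hedged sketch under those assumptions; the paper's estimation scheme is more general.

```python
import numpy as np

def worst_kernel_row(p0, V, beta):
    """Minimize p @ V over {p in the simplex : ||p - p0||_1 <= beta}.
    Optimal: move up to beta/2 mass from the highest-value successors
    to the single lowest-value successor."""
    p, budget = p0.copy(), beta / 2
    lo = np.argmin(V)
    for s in np.argsort(V)[::-1]:                # highest V first
        if budget <= 0 or s == lo:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[lo] += move
        budget -= move
    return p

def worst_kernel(P0, V, beta):
    """Apply row-wise to every (s, a) pair of a nominal kernel P0."""
    S, A, _ = P0.shape
    P = np.empty_like(P0)
    for s in range(S):
        for a in range(A):
            P[s, a] = worst_kernel_row(P0[s, a], V, beta)
    return P

# plug worst_kernel(P0, V, beta) into any standard (non-robust) solver
```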

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

no code implementations • 31 Jan 2023 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

We present an efficient robust value iteration for \texttt{s}-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to that of standard (non-robust) MDPs, significantly faster than any existing method.
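To see why such an operator can match non-robust complexity, consider the simpler sa-rectangular case with an L2 ball of radius beta around the nominal kernel (ignoring the simplex boundary, which is harmless for small beta). The support function has the closed form min {delta . V : ||delta||_2 <= beta, delta . 1 = 0} = -beta ||V - mean(V) 1||_2, so each robust backup is a standard backup minus a cheap penalty on V. A sketch under those assumptions; the paper extends this regularized-operator viewpoint to s-rectangular sets.

```python
import numpy as np

def robust_value_iteration(P0, R, gamma, beta, iters=500):
    """Robust VI for an sa-rectangular L2 ball of radius beta around P0.
    Each backup costs a standard Bellman backup plus O(S) for the penalty."""
    S, A, _ = P0.shape
    V = np.zeros(S)
    for _ in range(iters):
        # closed-form support function of the L2 uncertainty set
        penalty = beta * np.linalg.norm(V - V.mean())
        Q = R + gamma * (P0 @ V - penalty)       # P0 @ V has shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new
    return V
```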

Policy Gradient for Reinforcement Learning with General Utilities

no code implementations • 3 Oct 2022 • Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor

The policy gradient theorem is a cornerstone of linear RL owing to its elegance and ease of implementation.
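For reference, the (discounted) policy gradient theorem the snippet alludes to, for a parameterized policy $\pi_\theta$ with discounted state occupancy $d^{\pi_\theta}$:

$$\nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\right]$$

General utilities replace the linear objective $\langle d^\pi, r \rangle$ with a nonlinear function of the occupancy measure, which is why this standard form no longer applies directly.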

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

1 code implementation • 28 May 2022 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

However, it is not clear how to exploit this equivalence in order to perform policy improvement steps and obtain the optimal value function or policy.
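One concrete reading of "policy improvement via this equivalence" is ordinary policy iteration with a regularized robust evaluation step: evaluate the current policy under a regularized Bellman operator, then act greedily on the resulting robust Q-values. The sketch below is hypothetical; it reuses the sa-rectangular L2 penalty from the value-iteration example above rather than the paper's own operator, and assumes beta is small enough for the evaluation step to contract.

```python
import numpy as np

def robust_policy_iteration(P0, R, gamma, beta, iters=100):
    """Policy iteration with a regularized robust evaluation step
    (sa-rectangular L2 ball of radius beta around nominal kernel P0)."""
    S, A, _ = P0.shape
    pi = np.zeros(S, dtype=int)                  # deterministic policy
    for _ in range(iters):
        # robust policy evaluation by fixed-point iteration
        V = np.zeros(S)
        for _ in range(500):
            penalty = beta * np.linalg.norm(V - V.mean())
            V = R[np.arange(S), pi] + gamma * (P0[np.arange(S), pi] @ V - penalty)
        # greedy improvement against the robust Q-values
        Q = R + gamma * (P0 @ V - beta * np.linalg.norm(V - V.mean()))
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            break
        pi = new_pi
    return pi, V
```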

The Geometry of Robust Value Functions

no code implementations • 30 Jan 2022 • Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor

The key to this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces.
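To unpack the snippet: every policy's value satisfies $V^\pi = (I - \gamma P^\pi)^{-1} r^\pi$, and holding the policy fixed at all states except one sweeps out a low-dimensional family in value space, so for any fixed state $s$ the value space decomposes as

$$\mathcal{V} = \{ V^\pi : \pi \in \Pi \} = \bigcup_{\pi_{-s}} \left\{ V^{(\pi_s,\, \pi_{-s})} : \pi_s \in \Delta(A) \right\}.$$

In the non-robust case each such one-state family is known to be a line segment (Dadashi et al., 2019); the hypersurfaces here are, roughly, the robust analogue of those pieces.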
