Search Results for author: Dongsheng Ding

Found 9 papers, 0 papers with code

Resilient Constrained Reinforcement Learning

no code implementations 28 Dec 2023 Dongsheng Ding, Zhengyan Huan, Alejandro Ribeiro

It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward maximization objective and the constraint satisfaction, which is ubiquitous in constrained decision-making.

Decision Making, reinforcement-learning

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

no code implementations NeurIPS 2023 Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro

To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.

Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning

no code implementations 31 May 2023 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities.

Multi-agent Reinforcement Learning, reinforcement-learning

Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

no code implementations 6 Jun 2022 Dongsheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Başar, Mihailo R. Jovanović

We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility.

Decision Making

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

no code implementations 8 Feb 2022 Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Mihailo R. Jovanović

When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity which does not explicitly depend on the state space size.

Multi-agent Reinforcement Learning, Policy Gradient Methods

Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

no code implementations NeurIPS 2020 Dongsheng Ding, Kaiqing Zhang, Tamer Başar, Mihailo R. Jovanović

To the best of our knowledge, our work is the first to establish non-asymptotic convergence guarantees of policy-based primal-dual methods for solving infinite-horizon discounted CMDPs.

Decision Making

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization

no code implementations 1 Mar 2020 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

To this end, we present an Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration.

Safe Exploration, Safe Reinforcement Learning

Global exponential stability of primal-dual gradient flow dynamics based on the proximal augmented Lagrangian: A Lyapunov-based approach

no code implementations 2 Oct 2019 Dongsheng Ding, Mihailo R. Jovanović

For a class of nonsmooth composite optimization problems with linear equality constraints, we utilize a Lyapunov-based approach to establish the global exponential stability of the primal-dual gradient flow dynamics based on the proximal augmented Lagrangian.

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization

no code implementations 7 Aug 2019 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network.

Multi-agent Reinforcement Learning, Stochastic Optimization
