no code implementations • 28 Dec 2023 • Dongsheng Ding, Zhengyan Huan, Alejandro Ribeiro
It is challenging to identify appropriate constraint specifications because the trade-off between reward maximization and constraint satisfaction, which is ubiquitous in constrained decision-making, is left undefined.
no code implementations • NeurIPS 2023 • Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro
To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
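As a rough sketch of this formulation (the symbols are assumptions for illustration, not taken from this listing): let $V_r^{\pi}(\rho)$ and $V_g^{\pi}(\rho)$ denote the expected total reward and expected total utility of a policy $\pi$ from an initial state distribution $\rho$, and let $b$ be the constraint threshold. The constrained MDP and a Lagrangian saddle-point form of it could then be written as:

$$
\max_{\pi} \; V_r^{\pi}(\rho) \;\; \text{s.t.} \;\; V_g^{\pi}(\rho) \ge b
\qquad \longrightarrow \qquad
\max_{\pi} \, \min_{\lambda \ge 0} \; V_r^{\pi}(\rho) + \lambda \bigl( V_g^{\pi}(\rho) - b \bigr)
$$

In this sketch the policy $\pi$ plays the role of the primal (max) variable and the multiplier $\lambda$ the dual (min) variable; the precise constraint set for $\lambda$ used in the paper is not stated here.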
no code implementations • 31 May 2023 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities.
no code implementations • 6 Jun 2022 • Dongsheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Başar, Mihailo R. Jovanović
We study sequential decision-making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility.
no code implementations • 8 Feb 2022 • Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Mihailo R. Jovanović
When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity, which does not explicitly depend on the state space size.
no code implementations • NeurIPS 2020 • Dongsheng Ding, Kaiqing Zhang, Tamer Başar, Mihailo Jovanović
To the best of our knowledge, our work is the first to establish non-asymptotic convergence guarantees of policy-based primal-dual methods for solving infinite-horizon discounted CMDPs.
no code implementations • 1 Mar 2020 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
To this end, we present an Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration.
no code implementations • 2 Oct 2019 • Dongsheng Ding, Mihailo R. Jovanović
For a class of nonsmooth composite optimization problems with linear equality constraints, we utilize a Lyapunov-based approach to establish the global exponential stability of the primal-dual gradient flow dynamics based on the proximal augmented Lagrangian.
no code implementations • 7 Aug 2019 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network.