no code implementations • 15 Feb 2023 • Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei
The objective is to find a localized policy that maximizes the average of the team's local utility functions without requiring full observability of each agent in the team.
Multi-agent Reinforcement Learning, Reinforcement Learning
no code implementations • 22 May 2022 • Donghao Ying, Mengzi Amy Guo, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen
We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action occupancy measure.
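As a rough sketch of what this problem class looks like, the convex CMDP can be written as an optimization over occupancy measures. The notation below is an assumption for illustration and is not taken from the paper: $\lambda^{\pi}$ is the discounted state-action occupancy measure of policy $\pi$, $\rho$ is the initial state distribution, $f$ is the concave objective, and $g$ is the convex constraint function.

```latex
\begin{aligned}
\lambda^{\pi}(s,a) &= \sum_{t=0}^{\infty} \gamma^{t}\,
  \Pr\bigl(s_t = s,\ a_t = a \,\big|\, \pi,\ s_0 \sim \rho\bigr), \\
\max_{\pi}\quad & f\bigl(\lambda^{\pi}\bigr)
  \qquad \text{($f$ concave in $\lambda$)} \\
\text{s.t.}\quad & g\bigl(\lambda^{\pi}\bigr) \le 0
  \qquad \text{($g$ convex in $\lambda$)}.
\end{aligned}
```

A standard CMDP is the special case in which both $f$ and $g$ are linear in $\lambda$, for example $f(\lambda) = \langle r, \lambda \rangle$ for a reward vector $r$.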
no code implementations • 17 Oct 2021 • Donghao Ying, Yuhao Ding, Javad Lavaei
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility.
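To make the setting concrete, here is a minimal Python sketch of a soft-max policy and the two quantities the problem compares: the entropy-regularized value and the expected total utility. Everything in it is an illustrative assumption rather than the paper's method: the two-state MDP, the names `softmax_policy` and `regularized_value`, the regularization weight `tau`, the threshold `b`, and the direction of the constraint (utility at least `b`). It only evaluates a fixed policy; it does not solve the constrained problem.

```python
import math

def softmax_policy(theta):
    """pi[s][a] = exp(theta[s][a]) / sum_b exp(theta[s][b])."""
    policy = []
    for logits in theta:
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        policy.append([e / z for e in exps])
    return policy

def regularized_value(policy, P, reward, gamma, tau, iters=500):
    """Fixed-point evaluation of
    V(s) = E[ sum_t gamma^t (reward(s_t, a_t) - tau * log pi(a_t | s_t)) ].
    With tau = 0 this is the ordinary discounted value."""
    n = len(policy)
    V = [0.0] * n
    for _ in range(iters):
        V = [
            sum(
                policy[s][a] * (
                    reward[s][a]
                    - tau * math.log(policy[s][a])
                    + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n))
                )
                for a in range(len(policy[s]))
            )
            for s in range(n)
        ]
    return V

# Hypothetical toy MDP: 2 states, 2 actions (action 0 stays, action 1 switches).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # transitions from state 1
]
reward = [[1.0, 0.0], [0.0, 1.0]]   # objective reward r(s, a)
utility = [[0.0, 1.0], [1.0, 0.0]]  # constraint utility u(s, a)
gamma, tau, b = 0.9, 0.1, 4.0       # discount, entropy weight, assumed threshold

theta = [[0.0, 0.0], [0.0, 0.0]]    # zero logits give the uniform policy
pi = softmax_policy(theta)

V_reg = regularized_value(pi, P, reward, gamma, tau)   # entropy-regularized objective
V_util = regularized_value(pi, P, utility, gamma, 0.0) # expected total utility
feasible = all(v >= b for v in V_util)                 # assumed constraint: utility >= b
```

For the uniform policy the per-step entropy is log 2 in every state, so the regularized value exceeds the unregularized one by `tau * log(2) / (1 - gamma)`; a primal-dual or similar method would then adjust `theta` to trade this objective off against the utility constraint.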