FSV: Learning to Factorize Soft Value Function for Cooperative Multi-Agent Reinforcement Learning

1 Jan 2021  ·  Yueheng Li, Tianhao Zhang, Chen Wang, Jinan Sun, Shikun Zhang, Guangming Xie

We explore energy-based solutions for cooperative multi-agent reinforcement learning (MARL) using the idea of function factorization in centralized training with decentralized execution (CTDE). Existing CTDE-based factorization methods are susceptible to relative overgeneralization, a well-known game-theoretic pathology in which agents converge to a suboptimal Nash equilibrium. To resolve this issue, we propose a novel factorization method for cooperative MARL, named FSV, which learns to factorize the joint soft value function into individual ones for decentralized execution. Theoretical analysis shows that FSV solves a rich class of factorization tasks. Our experiments on the well-known Max of Two Quadratics game show that FSV converges to the global optimum in this continuous task through local search in the joint action space. We further evaluate FSV on a challenging set of StarCraft II micromanagement tasks and show that it significantly outperforms existing factorization-based multi-agent reinforcement learning methods.
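For context, a minimal sketch of the quantities involved, using the standard maximum-entropy RL definitions; the abstract does not spell out FSV's exact factorization, so the mixing function $f$ below is a generic placeholder rather than the paper's construction. The joint soft value function is

$$V_{\text{soft}}(s) \;=\; \alpha \log \int_{\mathcal{A}} \exp\!\Big(\tfrac{1}{\alpha}\, Q_{\text{soft}}(s, \boldsymbol{a})\Big)\, \mathrm{d}\boldsymbol{a},$$

and a CTDE factorization method learns per-agent utilities $Q_i(\tau_i, a_i)$ whose combination approximates the joint value,

$$Q_{\text{tot}}(\boldsymbol{\tau}, \boldsymbol{a}) \;\approx\; f\big(Q_1(\tau_1, a_1), \ldots, Q_n(\tau_n, a_n)\big),$$

so that during decentralized execution each agent $i$ can act from its own energy-based policy $\pi_i(a_i \mid \tau_i) \propto \exp\!\big(\tfrac{1}{\alpha} Q_i(\tau_i, a_i)\big)$ using only its local observation history $\tau_i$.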
