A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning

4 Jun 2020  ·  Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung

In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) that enables multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between their actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable lower bound on the MMI-regularized objective. Applying policy iteration to maximize this lower bound, we obtain a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows the centralized training with decentralized execution (CTDE) paradigm. We evaluate VM3-AC on several games requiring coordination, and numerical results show that it outperforms MADDPG and other MARL algorithms on such tasks.
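To make the shape of the MMI-regularized objective concrete, the sketch below computes a plug-in estimate of the mutual information between two agents' discrete action samples and adds it to the mean return, weighted by a coefficient `alpha`. This is only an illustrative assumption: the paper's VM3-AC instead maximizes a variational lower bound on the mutual information using a latent variable, not this empirical plug-in estimator, and the names `empirical_mutual_information` and `mmi_objective` are hypothetical.

```python
import numpy as np

def empirical_mutual_information(a1, a2, n_actions):
    """Plug-in estimate of I(a1; a2) in nats from paired discrete action samples.

    NOTE: illustrative only; VM3-AC uses a variational bound, not this estimator.
    """
    joint = np.zeros((n_actions, n_actions))
    for x, y in zip(a1, a2):
        joint[x, y] += 1.0
    joint /= joint.sum()
    p1 = joint.sum(axis=1, keepdims=True)   # marginal of agent 1's actions
    p2 = joint.sum(axis=0, keepdims=True)   # marginal of agent 2's actions
    mask = joint > 0                        # avoid log(0) on empty cells
    return float((joint[mask] * np.log(joint[mask] / (p1 @ p2)[mask])).sum())

def mmi_objective(returns, a1, a2, n_actions, alpha=0.1):
    """Accumulated return regularized by the action mutual information."""
    return float(np.mean(returns)) + alpha * empirical_mutual_information(
        a1, a2, n_actions
    )
```

Perfectly correlated actions drive the regularizer toward log(n_actions), rewarding coordination, while independent actions contribute zero, recovering the plain return objective.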


Categories: Multiagent Systems
