Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
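The adaptation the abstract describes is commonly summarized as centralized training with decentralized execution: during training, each agent's critic is conditioned on the observations and actions of all agents, while each actor acts only on its own local observation. The PyTorch sketch below is illustrative only; the class names, layer sizes, and activations are assumptions rather than details taken from the paper or its reference implementation.

```python
import torch
import torch.nn as nn


class CentralizedCritic(nn.Module):
    """Per-agent critic Q_i(x, a_1, ..., a_N) used only during training.

    It takes the concatenated observations and actions of all agents,
    which removes the non-stationarity seen by an independent learner.
    Hidden sizes here are illustrative, not taken from the paper.
    """

    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, joint_obs: torch.Tensor, joint_actions: torch.Tensor) -> torch.Tensor:
        # joint_obs: (batch, sum of all agents' observation dims)
        # joint_actions: (batch, sum of all agents' action dims)
        return self.net(torch.cat([joint_obs, joint_actions], dim=-1))


class DecentralizedActor(nn.Module):
    """Per-agent policy pi_i(o_i) that sees only its own observation,
    so execution stays fully decentralized."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, act_dim),
            nn.Tanh(),  # assumes continuous actions scaled to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)
```

Because the extra information (other agents' actions) is only needed to train the critic, it can be discarded at test time, and each agent then runs its own lightweight actor. The ensemble training regimen mentioned above would, under the same reading, maintain several such actors per agent and sample one per episode to avoid overfitting to the behavior of particular opponents.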

NeurIPS 2017

Results from the Paper


| Task  | Dataset                    | Model  | Metric Name         | Metric Value | Global Rank |
|-------|----------------------------|--------|---------------------|--------------|-------------|
| SMAC+ | Def_Armored_sequential     | MADDPG | Median Win Rate (%) | 90.6         | #4          |
| SMAC+ | Def_Infantry_sequential    | MADDPG | Median Win Rate (%) | 100          | #1          |
| SMAC+ | Def_Outnumbered_sequential | MADDPG | Median Win Rate (%) | 81.3         | #2          |
| SMAC+ | Off_Complicated_sequential | MADDPG | Median Win Rate (%) | 0.0          | #3          |
| SMAC+ | Off_Distant_sequential     | MADDPG | Median Win Rate (%) | 0.0          | #3          |
| SMAC+ | Off_Hard_sequential        | MADDPG | Median Win Rate (%) | 0.0          | #3          |
| SMAC+ | Off_Near_sequential        | MADDPG | Median Win Rate (%) | 75.0         | #3          |
| SMAC+ | Off_Superhard_sequential   | MADDPG | Median Win Rate (%) | 0.0          | #2          |

Methods