TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
SMAC+	Def_Armored_parallel	COMA	Median Win Rate	0.0	# 6
SMAC+	Def_Armored_sequential	COMA	Median Win Rate	0.0	# 9
SMAC+	Def_Infantry_parallel	COMA	Median Win Rate	50.0	# 6
SMAC+	Def_Infantry_sequential	COMA	Median Win Rate	28.1	# 11
SMAC+	Def_Outnumbered_parallel	COMA	Median Win Rate	0.0	# 4
SMAC+	Def_Outnumbered_sequential	COMA	Median Win Rate	0.0	# 5
SMAC+	Off_Complicated_parallel	COMA	Median Win Rate	0.0	# 4
SMAC+	Off_Complicated_sequential	COMA	Median Win Rate	0.0	# 3
SMAC+	Off_Distant_parallel	COMA	Median Win Rate	0.0	# 3
SMAC+	Off_Distant_sequential	COMA	Median Win Rate	0.0	# 3
SMAC+	Off_Hard_parallel	COMA	Median Win Rate	0.0	# 3
SMAC+	Off_Hard_sequential	COMA	Median Win Rate	0.0	# 3
SMAC+	Off_Near_parallel	COMA	Median Win Rate	20.0	# 4
SMAC+	Off_Near_sequential	COMA	Median Win Rate	0.0	# 4
SMAC+	Off_Superhard_parallel	COMA	Median Win Rate	0.0	# 1
SMAC+	Off_Superhard_sequential	COMA	Median Win Rate	0.0	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-superhard-parallel)](https://paperswithcode.com/sota/smac-on-smac-off-superhard-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-superhard-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-superhard-sequential?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-complicated-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-complicated-sequential?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-distant-parallel)](https://paperswithcode.com/sota/smac-on-smac-off-distant-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-distant-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-distant-sequential?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-hard-parallel)](https://paperswithcode.com/sota/smac-on-smac-off-hard-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-hard-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-hard-sequential?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-def-outnumbered-parallel)](https://paperswithcode.com/sota/smac-on-smac-def-outnumbered-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-complicated-parallel)](https://paperswithcode.com/sota/smac-on-smac-off-complicated-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-near-parallel)](https://paperswithcode.com/sota/smac-on-smac-off-near-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-off-near-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-near-sequential?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-def-outnumbered-sequential)](https://paperswithcode.com/sota/smac-on-smac-def-outnumbered-sequential?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-def-armored-parallel)](https://paperswithcode.com/sota/smac-on-smac-def-armored-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-def-infantry-parallel)](https://paperswithcode.com/sota/smac-on-smac-def-infantry-parallel?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-def-armored-sequential)](https://paperswithcode.com/sota/smac-on-smac-def-armored-sequential?p=counterfactual-multi-agent-policy-gradients)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/counterfactual-multi-agent-policy-gradients/smac-on-smac-def-infantry-sequential)](https://paperswithcode.com/sota/smac-on-smac-def-infantry-sequential?p=counterfactual-multi-agent-policy-gradients)`

Counterfactual Multi-Agent Policy Gradients

24 May 2017 · Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
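
The counterfactual baseline described in the abstract can be illustrated with a small numerical sketch. For one agent, the critic is assumed to return a Q-value for each of that agent's possible actions while the other agents' actions stay fixed; the baseline is the policy-weighted average of those values, and the advantage is the chosen action's Q-value minus that baseline. The function name `counterfactual_advantage` and the toy numbers are hypothetical, chosen only for illustration; this is not code from the paper or from any of the listed implementations.

```python
import numpy as np

def counterfactual_advantage(q_values, policy_probs, action):
    """Counterfactual advantage for a single agent (illustrative sketch).

    q_values[k]     : critic estimate of Q(s, (u^-a, k)), i.e. the joint
                      Q-value if this agent took action k while all other
                      agents' actions are held fixed
    policy_probs[k] : this agent's policy probability for action k
    action          : index of the action the agent actually took
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_probs = np.asarray(policy_probs, dtype=float)
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = float(np.dot(policy_probs, q_values))
    return float(q_values[action]) - baseline

# Toy example with three actions (made-up numbers).
q = [1.0, 3.0, 2.0]
pi = [0.2, 0.5, 0.3]
adv = counterfactual_advantage(q, pi, action=1)

# The policy-weighted mean of the advantages over all actions is zero,
# which is what makes the baseline a valid variance-reduction term.
mean_adv = sum(p * counterfactual_advantage(q, pi, k) for k, p in enumerate(pi))
```

Because all of `q_values` come from one critic evaluation in this sketch, the baseline needs no additional critic calls, which mirrors the single-forward-pass property the abstract mentions.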

Code

opendilab/DI-engine (2,551 stars)

TonghanWang/NDQ

matteokarldonati/Counterfactual-Mul…

puyuan1996/MARL

hanhanAnderson/LSF-SAC

See all 6 implementations

Tasks

Autonomous Vehicles

counterfactual

SMAC+

Starcraft

Datasets

SMAC-Exp

Def_Infantry_sequential

Def_Outnumbered_sequential

Def_Infantry_parallel

Def_Armored_sequential

Def_Armored_parallel

Off_Hard_parallel

Off_Superhard_parallel

Def_Outnumbered_parallel

Off_Near_parallel

Off_Distant_parallel

Off_Complicated_parallel

Off_Superhard_sequential

Off_Hard_sequential

Off_Complicated_sequential

Off_Distant_sequential

Off_Near_sequential

Results from the Paper

Ranked #1 on SMAC+ on Off_Superhard_parallel

