Concurrent Meta Reinforcement Learning

State-of-the-art meta reinforcement learning algorithms typically assume the setting of a single agent interacting with its environment in a sequential manner. A negative side-effect of this sequential execution paradigm is that, as the environment becomes more challenging and thus requires more interaction episodes from the meta-learner, the agent must reason over longer and longer time-scales. To combat the difficulty of long time-scale credit assignment, we propose an alternative parallel framework, which we name "Concurrent Meta-Reinforcement Learning" (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one. In this multi-agent setting, a set of parallel agents is executed in the same environment, and each of these "rollout" agents is given the means to communicate with the others. The goal of the communication is to coordinate, in a collaborative manner, the most efficient exploration of the shared task the agents are currently assigned. This coordination therefore represents the meta-learning aspect of the framework, as each agent can be assigned, or assign itself, a particular section of the current task's state space. This framework contrasts with standard RL methods, which assume that each parallel rollout occurs independently and can therefore waste computation when many rollouts end up sampling the same part of the state space. Furthermore, the parallel setting enables us to define several reward-sharing functions and auxiliary losses that are non-trivial to apply in the sequential setting. We demonstrate the effectiveness of our proposed CMRL at improving over sequential methods on a variety of challenging tasks.
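
The abstract only describes the concurrent rollout idea at a high level. As a rough illustration, the sketch below shows what parallel rollouts with inter-agent communication and a shared exploration bonus could look like; the environment (ToyEnv), agent class (CommAgent), mean-pooled messaging, and bonus function are hypothetical stand-ins for exposition, not the authors' implementation.

```python
# Hypothetical sketch of concurrent rollouts with inter-agent communication.
# Names and structure are illustrative assumptions, not the paper's code.
import numpy as np

class ToyEnv:
    """Stand-in task: agents should collectively visit distinct states of a 1-D ring."""
    def __init__(self, n_states=10):
        self.n_states = n_states
    def reset(self):
        return np.random.randint(self.n_states)
    def step(self, state, action):
        next_state = (state + action) % self.n_states
        reward = 0.0  # task reward omitted; only the shared exploration bonus is used below
        return next_state, reward

class CommAgent:
    """One parallel 'rollout' agent; its policy conditions on messages pooled from peers."""
    def __init__(self, n_states, msg_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.w_msg = rng.normal(size=(n_states, msg_dim))    # produces the outgoing message
        self.w_act = rng.normal(size=(n_states + msg_dim,))  # scores an action from state + pooled messages
    def message(self, state_onehot):
        return state_onehot @ self.w_msg
    def act(self, state_onehot, pooled_msg):
        score = np.concatenate([state_onehot, pooled_msg]) @ self.w_act
        return 1 if score > 0 else -1  # move right or left on the ring

def shared_exploration_bonus(visited, states):
    """Reward-sharing term: an agent is rewarded for occupying a state no agent has covered yet."""
    return [0.0 if s in visited else 1.0 for s in states]

def concurrent_rollout(n_agents=4, horizon=20, n_states=10):
    env = ToyEnv(n_states)
    agents = [CommAgent(n_states, seed=i) for i in range(n_agents)]
    states = [env.reset() for _ in agents]
    visited = set()
    returns = np.zeros(n_agents)
    for _ in range(horizon):
        onehots = [np.eye(n_states)[s] for s in states]
        # Broadcast: pool every agent's message so each policy sees what the others are doing.
        msgs = [a.message(o) for a, o in zip(agents, onehots)]
        pooled = np.mean(msgs, axis=0)
        actions = [a.act(o, pooled) for a, o in zip(agents, onehots)]
        bonuses = shared_exploration_bonus(visited, states)
        for i, (s, a, b) in enumerate(zip(states, actions, bonuses)):
            next_s, r = env.step(s, a)
            returns[i] += r + b
            visited.add(s)
            states[i] = next_s
    return returns, len(visited)

if __name__ == "__main__":
    rets, coverage = concurrent_rollout()
    print("per-agent returns:", rets, "| distinct states covered:", coverage)
```

In this sketch the policies are fixed random projections; in the framework described above, the communication and reward-sharing terms would instead be trained so that agents partition the state space rather than sampling the same regions redundantly.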


Datasets


Task                          Dataset         Model                Metric             Value     Global Rank
Meta Reinforcement Learning   3-Color-Choice  StDev-Until-Exploit  AUC                11436.83  #1
Meta Reinforcement Learning   3-Color-Choice  StDev-Until-Exploit  Final Performance  43.9%     #1
Meta Reinforcement Learning   3-Reacher       StDev-Until-Exploit  AUC                9626.29   #1
Meta Reinforcement Learning   3-Reacher       StDev-Until-Exploit  Final Performance  58.9%     #1
