Meta-Inverse Reinforcement Learning with Probabilistic Context Variables

Providing a suitable reward function to reinforcement learning can be difficult in many real-world applications. While inverse reinforcement learning (IRL) holds promise for automatically learning reward functions from demonstrations, several major challenges remain. First, existing IRL methods learn reward functions from scratch, requiring large numbers of demonstrations to correctly infer the reward for each task the agent needs to perform. Second, existing methods typically assume homogeneous demonstrations of a single behavior or task, while in practice it is often easier to collect datasets of heterogeneous but related behaviors. To address these challenges, we propose a deep latent variable model that learns rewards from demonstrations of distinct but related tasks in an unsupervised way. Critically, our model can infer rewards for new, structurally similar tasks from a single demonstration. Experiments on multiple continuous control tasks demonstrate the effectiveness of our approach compared to state-of-the-art imitation learning and inverse reinforcement learning methods.
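
The high-level architecture implied by the abstract can be illustrated in code: an inference network encodes a single demonstration into a probabilistic context variable, and a reward network conditions on that context to score state-action pairs for the new task. The following is a minimal PyTorch sketch, not the authors' PEMIRL implementation; all module and variable names (DemoEncoder, ContextualReward, sa_dim, context_dim) are illustrative assumptions, and the paper's actual training objective is omitted.

# Minimal sketch of the idea in the abstract, NOT the authors' code:
# encode one demonstration into a probabilistic context variable m,
# then condition a reward network on m. Names here are hypothetical.
import torch
import torch.nn as nn

class DemoEncoder(nn.Module):
    """Maps a demonstration (T x (state+action) features) to a Gaussian context."""
    def __init__(self, sa_dim: int, context_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sa_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, context_dim)
        self.log_std = nn.Linear(hidden, context_dim)

    def forward(self, demo: torch.Tensor):
        h = self.net(demo).mean(dim=0)  # pool over time steps
        return self.mu(h), self.log_std(h).exp()

class ContextualReward(nn.Module):
    """Scores (state, action) pairs conditioned on the inferred context."""
    def __init__(self, sa_dim: int, context_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sa_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, sa: torch.Tensor, m: torch.Tensor):
        m = m.expand(sa.shape[0], -1)  # broadcast context to the batch
        return self.net(torch.cat([sa, m], dim=-1)).squeeze(-1)

# Usage: infer a context from one demonstration of a new task,
# then evaluate rewards for candidate state-action pairs.
sa_dim, context_dim = 10, 4
encoder = DemoEncoder(sa_dim, context_dim)
reward_fn = ContextualReward(sa_dim, context_dim)
demo = torch.randn(50, sa_dim)                # one demonstration
mu, std = encoder(demo)
m = mu + std * torch.randn_like(std)          # reparameterized context sample
rewards = reward_fn(torch.randn(32, sa_dim), m)

Mean-pooling over time steps is one simple way to keep the encoder agnostic to demonstration length; the probabilistic (Gaussian) context allows uncertainty over the task to be represented rather than committing to a point estimate.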



Results from the Paper

Task          Dataset        Model   Metric          Metric Value  Global Rank
MuJoCo Games  Ant            PEMIRL  Average Return  846.18        #2
MuJoCo Games  Point Maze     PEMIRL  Average Return  -7.37         #1
MuJoCo Games  Sawyer Pusher  PEMIRL  Average Return  -27.16        #1
MuJoCo Games  Sweeper        PEMIRL  Average Return  -74.17        #1
