Reinforcement Learning with Latent Flow

NeurIPS 2021 · Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin

Temporal information is essential to learning effective policies with Reinforcement Learning (RL). However, current state-of-the-art RL algorithms either assume that such information is given as part of the state space or, when learning from pixels, use the simple heuristic of frame-stacking to implicitly capture temporal information present in the image observations. This heuristic contrasts with the current paradigm in video classification architectures, which explicitly encode temporal information through methods such as optical flow and two-stream architectures to achieve state-of-the-art performance. Inspired by leading video classification architectures, we introduce the Flow of Latents for Reinforcement Learning (Flare), a network architecture for RL that explicitly encodes temporal information through latent vector differences. We show that Flare (i) recovers optimal performance in state-based RL without explicit access to the state velocity, solely with positional state information, (ii) achieves state-of-the-art performance on challenging pixel-based continuous control tasks within the DeepMind control benchmark suite, namely quadruped walk, hopper hop, finger turn hard, pendulum swing, and walker run, (iii) is the most sample-efficient model-free pixel-based RL algorithm, outperforming the prior model-free state of the art by 1.9x and 1.5x on the 500k and 1M step benchmarks, respectively, and (iv), when augmented over Rainbow DQN, outperforms this state-of-the-art baseline on 5 of 8 challenging Atari games at the 100M time step benchmark.
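
The idea of encoding temporal information through latent vector differences can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear `encode` function, the shapes, and the choice to concatenate latents with their differences are assumptions made for illustration only. The abstract states only that Flare uses latent vector differences. The sketch encodes each frame separately, subtracts consecutive latents, and concatenates the result into one feature vector.

```python
import numpy as np

def encode(frames, weights):
    """Hypothetical per-frame encoder: flatten each frame, apply a linear map and ReLU."""
    flat = frames.reshape(frames.shape[0], -1)   # (T, H*W*C)
    return np.maximum(flat @ weights, 0.0)       # (T, D)

def latent_flow_features(latents):
    """Concatenate per-frame latents with differences of consecutive latents.

    The concatenation is an assumed fusion step chosen for this sketch.
    """
    diffs = latents[1:] - latents[:-1]           # (T-1, D), the "latent flow"
    return np.concatenate([latents.reshape(-1), diffs.reshape(-1)])

# Illustrative sizes: T frames of H x W x C pixels, D-dimensional latents.
rng = np.random.default_rng(0)
T, H, W, C, D = 4, 8, 8, 3, 16
frames = rng.random((T, H, W, C))
weights = rng.standard_normal((H * W * C, D)) * 0.1

latents = encode(frames, weights)
features = latent_flow_features(latents)
print(features.shape)  # (112,) = (2*T - 1) * D
```

By contrast, frame-stacking would concatenate the raw frames before encoding and leave it to the network to discover temporal structure implicitly. In this sketch, a static scene (identical frames) yields all-zero difference features, which makes the motion signal explicit.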

Code

WendyShang/flare official

WendyShang/dqn_zoo official

Tasks

Atari Games

Continuous Control

Montezuma's Revenge

Optical Flow Estimation

reinforcement-learning

Reinforcement Learning (RL)

Video Classification

Datasets

Arcade Learning Environment

Results from the Paper

Ranked #1 on Montezuma's Revenge on Atari 2600 Montezuma's Revenge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Montezuma's Revenge	Atari 2600 Montezuma's Revenge	Rainbow (tuned)	Average Return (NoOp)	900	#2
Montezuma's Revenge	Atari 2600 Montezuma's Revenge	Flare	Average Return (NoOp)	1668	#1

Methods

Convolution • Dense Connections • DQN • Q-Learning
