Montezuma's Revenge
28 papers with code • 1 benchmark • 1 dataset
Montezuma's Revenge is an Atari 2600 benchmark game known to be difficult for reinforcement learning algorithms because its rewards are sparse. Solutions typically employ algorithms that incentivise exploration of the environment in different ways.
For the state-of-the art tables, please consult the parent Atari Games task.
(Image credit: Q-map)
Most implemented papers
Uncertainty-sensitive Learning and Planning with Ensembles
The former manifests itself through the use of a value function, while the latter is powered by a tree search planner.
Learning Abstract Models for Strategic Exploration and Fast Reward Transfer
Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions.
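As a minimal illustration of the decoupling point (not the paper's actual algorithm), the sketch below runs value iteration over a stand-in for a learned abstract model; because the transition model is separate from the reward, swapping in a new reward function replans for a new task without relearning dynamics. The toy chain environment and all names here are invented for illustration.

```python
def plan(model, reward_fn, states, actions, gamma=0.95, iters=100):
    """Value iteration over a deterministic abstract model.

    `model(s, a)` returns the next abstract state; `reward_fn(s, a)`
    is the task reward. Only `reward_fn` changes between tasks.
    """
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(reward_fn(s, a) + gamma * V[model(s, a)]
                    for a in actions)
             for s in states}
    return V

# Toy chain: 4 abstract states, actions move left/right (illustrative).
states, actions = [0, 1, 2, 3], [-1, 1]
model = lambda s, a: min(max(s + a, 0), 3)  # stand-in for learned dynamics
V_goal_right = plan(model, lambda s, a: float(s == 3), states, actions)
V_goal_left = plan(model, lambda s, a: float(s == 0), states, actions)
```

With the same learned dynamics, the two calls produce value functions for two different reward functions, which is the fast-transfer property the abstract refers to.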
NovelD: A Simple yet Effective Exploration Criterion
We analyze NovelD thoroughly in MiniGrid and find that, empirically, it helps the agent explore the environment more uniformly, with a focus on exploring beyond the boundary.
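The NovelD bonus itself can be sketched in a few lines. This is an illustrative reconstruction that assumes a novelty estimator (for example, an RND prediction error) and a per-episode state visit count are computed elsewhere; `alpha` is the scaling hyperparameter from the paper.

```python
def noveld_bonus(novelty_next, novelty_curr, episodic_count_next, alpha=0.5):
    """NovelD-style intrinsic reward: the clipped positive difference in
    novelty between successive states, granted only on the first visit
    to the new state within the current episode."""
    regulated = max(novelty_next - alpha * novelty_curr, 0.0)
    first_visit = 1.0 if episodic_count_next == 1 else 0.0
    return regulated * first_visit
```

Crossing from a well-explored state into a novel one yields a bonus, while wandering inside already-novel regions does not, which is what drives the "explore beyond the boundary" behaviour the excerpt describes.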
Open-Ended Reinforcement Learning with Neural Reward Functions
Inspired by the great success of unsupervised learning in Computer Vision and Natural Language Processing, the Reinforcement Learning community has recently started to focus more on unsupervised discovery of skills.
Cell-Free Latent Go-Explore
In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL).
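For readers unfamiliar with the paradigm, here is a minimal sketch of the Go-Explore select–return–explore loop that LGE builds on, using an invented toy environment. Real implementations downscale pixel observations (or, in LGE, a learned latent representation) into cells; the integer-line environment and uniform cell selection below are simplifications for illustration.

```python
import random

class LineEnv:
    """Toy deterministic environment: a walk on the integer line."""
    def __init__(self):
        self.pos = 0
    def restore(self, state):
        self.pos = state
    def step(self, action):
        self.pos += action
        return self.pos, self.pos   # (observation, restorable state)

def go_explore(env, downscale, iterations=50, n_explore=10):
    archive = {downscale(0): 0}               # cell -> restorable state
    for _ in range(iterations):
        cell = random.choice(list(archive))   # 1. select an archived cell
        env.restore(archive[cell])            # 2. return to it exactly
        for _ in range(n_explore):            # 3. explore from there
            obs, state = env.step(random.choice([-1, 1]))
            c = downscale(obs)
            if c not in archive:              # 4. archive first-time cells
                archive[c] = state
    return archive
```

The archive guarantees the agent can always return to the frontier of what it has seen before exploring further, rather than re-discovering it from scratch each episode.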
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
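One simple way to realize this setting is to draw each training minibatch from both data sources. The sketch below is an illustrative assumption, not the paper's prescription; in particular the 50/50 mixing ratio is invented.

```python
import random

def hybrid_batch(offline_data, online_buffer, batch_size=8, offline_frac=0.5):
    """Draw a minibatch mixing offline transitions with fresh online ones.

    `offline_frac` controls the split (0.5 here is an arbitrary choice).
    """
    n_off = int(batch_size * offline_frac)
    batch = random.choices(offline_data, k=n_off)
    batch += random.choices(online_buffer, k=batch_size - n_off)
    random.shuffle(batch)
    return batch
```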
Redeeming Intrinsic Rewards via Constrained Optimization
However, on easy exploration tasks, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available.
Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning
We propose a new method for count-based exploration in high-dimensional state spaces.
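For context, the classic tabular bonus that pseudocount methods generalize can be sketched as below. In high-dimensional state spaces exact counts like these are unavailable, which is why approaches such as this paper learn an estimate of the count-derived quantity instead; `beta` is an illustrative scaling hyperparameter.

```python
import math
from collections import defaultdict

class CountBonus:
    """Tabular count-based exploration bonus: beta / sqrt(N(s)).

    Rarely visited states earn large bonuses that decay as visits
    accumulate, steering the agent toward under-explored regions.
    """
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, state):
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])
```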