First return then explore

27 Apr 2020 · Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune

The promise of reinforcement learning is to solve complex sequential decision problems by specifying a high-level reward function only. However, RL algorithms struggle when, as is often the case, simple and intuitive rewards provide sparse and deceptive feedback...

Results from the Paper


TASK         DATASET                         MODEL       METRIC NAME  METRIC VALUE  GLOBAL RANK
Atari Games  Atari 2600 Berzerk              Go-Explore  Score        197376        #1
Atari Games  Atari 2600 Bowling              Go-Explore  Score        260           #2
Atari Games  Atari 2600 Centipede            Go-Explore  Score        1422628       #1
Atari Games  Atari 2600 Freeway              Go-Explore  Score        34            #1
Atari Games  Atari 2600 Gravitar             Go-Explore  Score        7588          #2
Atari Games  Atari 2600 Montezuma's Revenge  Go-Explore  Score        43791         #1
Atari Games  Atari 2600 Pitfall!             Go-Explore  Score        6954          #3
Atari Games  Atari 2600 Private Eye          Go-Explore  Score        95756         #1
Atari Games  Atari 2600 Skiing               Go-Explore  Score        -3660         #2
Atari Games  Atari 2600 Solaris              Go-Explore  Score        19671         #2
Atari Games  Atari 2600 Venture              Go-Explore  Score        2281          #1
