POPGym: Benchmarking Partially Observable Reinforcement Learning

3 Mar 2023 · Steven Morad, Ryan Kortvelesy, Matteo Bettini, Stephan Liwicki, Amanda Prorok

Real world applications of Reinforcement Learning (RL) are often partially observable, thus requiring memory. Despite this, partial observability is still largely ignored by contemporary RL benchmarks and libraries. We introduce Partially Observable Process Gym (POPGym), a two-part library containing (1) a diverse collection of 15 partially observable environments, each with multiple difficulties and (2) implementations of 13 memory model baselines -- the most in a single RL library. Existing partially observable benchmarks tend to fixate on 3D visual navigation, which is computationally expensive and only one type of POMDP. In contrast, POPGym environments are diverse, produce smaller observations, use less memory, and often converge within two hours of training on a consumer-grade GPU. We implement our high-level memory API and memory baselines on top of the popular RLlib framework, providing plug-and-play compatibility with various training algorithms, exploration strategies, and distributed training paradigms. Using POPGym, we execute the largest comparison across RL memory models to date. POPGym is available at https://github.com/proroklab/popgym.

Task: Partially Observable Reinforcement Learning
Dataset: POPGym
Metric: MMER

Model                                     MMER     Global Rank
Gated Recurrent Unit                       0.349   #1
Differentiable Neural Computer             0.065   #9
Elman Network                              0.249   #3
Independently Recurrent Neural Network     0.259   #2
Legendre Memory Unit                       0.229   #4
Frame Stacking                             0.190   #5
Diagonal State Space Model                -0.180   #11
Fast Autoregressive Transformer            0.138   #6
Fast Weight Programmer                     0.112   #7
Positional MLP                             0.064   #10
MLP                                        0.067   #8
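
The global ranks listed above appear to follow from sorting the models by MMER in descending order. The following is a minimal Python sketch of that ordering, assuming that a higher MMER is better (which matches the listed ranks) and that there are no ties. The MMER values are copied from the results above; the helper name rank_by_mmer is hypothetical and is not part of POPGym.

```python
# MMER values copied from the results listed above.
MMER = {
    "Gated Recurrent Unit": 0.349,
    "Differentiable Neural Computer": 0.065,
    "Elman Network": 0.249,
    "Independently Recurrent Neural Network": 0.259,
    "Legendre Memory Unit": 0.229,
    "Frame Stacking": 0.190,
    "Diagonal State Space Model": -0.180,
    "Fast Autoregressive Transformer": 0.138,
    "Fast Weight Programmer": 0.112,
    "Positional MLP": 0.064,
    "MLP": 0.067,
}


def rank_by_mmer(scores):
    """Hypothetical helper: return (rank, model, score) tuples, best first.

    Assumes a higher score is better and that no two scores are equal.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, model, score)
        for rank, (model, score) in enumerate(ordered, start=1)
    ]


ranking = rank_by_mmer(MMER)
for rank, model, score in ranking:
    print(f"#{rank:<3} {model:<40} {score:+.3f}")
```

Sorting this way places the three simple recurrent models (Gated Recurrent Unit, Independently Recurrent Neural Network, Elman Network) at the top and the memoryless MLP baselines near the bottom, with only the Diagonal State Space Model scoring below zero.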

Methods