OpenAI Gym
160 papers with code • 9 benchmarks • 3 datasets
An open-source toolkit from OpenAI that implements several Reinforcement Learning benchmarks including: classic control, Atari, Robotics and MuJoCo tasks.
(Description by Evolutionary learning of interpretable decision trees)
(Image Credit: OpenAI Gym)
Libraries
Use these libraries to find OpenAI Gym models and implementations
Latest papers with no code
Bridging Dimensions: Confident Reachability for High-Dimensional Controllers
Autonomous systems are increasingly implemented using end-to-end learning-based controllers.
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks
In addition, motivated by recent developments in attention-based video-classification models using Vision Transformer, we propose an architecture based on Vision Transformer for the image-based RL domain as well.
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym
BO-based algorithms are popular in the ML community, as they are used for hyperparameter optimization and more generally for algorithm configuration.
Implicit Sensing in Traffic Optimization: Advanced Deep Reinforcement Learning Techniques
The results unequivocally demonstrate that the DQN agent trained using the $\epsilon$-greedy policy significantly outperforms the one trained with the Boltzmann policy.
gym-saturation: Gymnasium environments for saturation provers (System description)
This work describes a new version of a previously published Python package - gym-saturation: a collection of OpenAI Gym environments for guiding saturation-style provers based on the given clause algorithm with reinforcement learning.
Attention Loss Adjusted Prioritized Experience Replay
Prioritized Experience Replay (PER) is a deep reinforcement learning technique that selects the more informative experience samples in order to speed up the training of the neural network.
Distributionally Robust Statistical Verification with Imprecise Neural Networks
A particularly challenging problem in AI safety is providing guarantees on the behavior of high-dimensional autonomous systems.
Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning
Offline reinforcement learning aims to utilize datasets of previously gathered environment-action interaction records to learn a policy without access to the real environment.
On Combining Expert Demonstrations in Imitation Learning via Optimal Transport
One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance.
Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing
Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}({d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})}\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards.