1 code implementation • 15 Jan 2020 • Mario S. Holubar, Marco A. Wiering
Different versions of two actor-critic learning algorithms are tested on this environment: Sampled Policy Gradient (SPG) and Proximal Policy Optimization (PPO).
OpenAI Gym reinforcement-learning +1