Trust Region-Guided Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO relies heavily on the effectiveness of its exploratory policy search...
