Supervised Policy Update for Deep Reinforcement Learning

We propose a new sample-efficient methodology, called Supervised Policy Update (SPU), for deep reinforcement learning. Starting with data generated by the current policy, SPU formulates and solves a constrained optimization problem in the non-parameterized proximal policy space...
