Paper tables with annotated results for Proximal Policy Optimization and its Dynamic Version for Sequence Generation

Paper

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

In sequence generation tasks, many works use policy gradient for model optimization to sidestep intractable backpropagation when maximizing non-differentiable evaluation metrics or fooling the discriminator in adversarial learning. In this paper, we replace policy gradient with proximal policy optimization (PPO), a reinforcement learning algorithm that has been shown to be more efficient, and propose a dynamic approach for PPO (PPO-dynamic). We demonstrate the efficacy of PPO and PPO-dynamic on conditional sequence generation tasks, including a synthetic experiment and a chit-chat chatbot. The results show that PPO and PPO-dynamic outperform policy gradient in both stability and performance.
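
To make the contrast between the two objectives concrete, here is a minimal sketch of a policy-gradient surrogate loss next to the standard PPO clipped surrogate with a fixed clipping range. It is an illustration only, not the authors' implementation: the function names, the per-token advantage inputs, and the default `epsilon=0.2` are assumptions, and the PPO-dynamic variant is not reproduced because the abstract does not specify how its constraint is adapted.

```python
import math


def policy_gradient_loss(logp_new, advantages):
    """Policy-gradient surrogate: negative mean of log-probability times advantage."""
    total = sum(lp * adv for lp, adv in zip(logp_new, advantages))
    return -total / len(advantages)


def ppo_clip_loss(logp_new, logp_old, advantages, epsilon=0.2):
    """Standard PPO clipped surrogate, negated so that it can be minimized.

    logp_new / logp_old: log-probabilities of the sampled tokens under the
    current policy and under the policy that generated the samples.
    epsilon: fixed clipping range (hypothetical default for this sketch).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio new / old
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Pessimistic bound: take the smaller of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

When the current policy equals the sampling policy, every ratio is 1 and the PPO loss reduces to the negative mean advantage. Once a ratio leaves the interval [1 - epsilon, 1 + epsilon] in the direction that would increase the objective, the clipped term takes over and the objective stops rewarding further movement in that direction, which is the usual explanation for the more stable updates.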



