AMBER: Adaptive Multi-Batch Experience Replay for Continuous Action Control

12 Oct 2017 · Seungyul Han, Youngchul Sung

In this paper, a new adaptive multi-batch experience replay scheme is proposed for proximal policy optimization (PPO) for continuous action control. In contrast to the original PPO, the proposed scheme uses batch samples from past policies as well as from the current policy to update the next policy, where the number of past batches used is adaptively determined based on the oldness of those batches, measured by the average importance sampling (IS) weight...
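The abstract is truncated before it states the exact selection rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It computes a batch's average IS weight (mean of pi_current(a|s) / pi_behavior(a|s)) from stored log-probabilities, then keeps the most recent past batches until one falls outside an assumed band [1 - eps, 1 + eps]. The function names, the `eps` and `max_batches` parameters, and the stop-at-first-stale-batch rule are all hypothetical.

```python
import numpy as np


def average_is_weight(logp_current: np.ndarray, logp_behavior: np.ndarray) -> float:
    """Mean IS weight pi_current(a|s) / pi_behavior(a|s) over one stored batch.

    Works in log space: exp(log pi_current - log pi_behavior) for each sample.
    """
    return float(np.mean(np.exp(logp_current - logp_behavior)))


def num_past_batches(avg_weights, eps=0.2, max_batches=8):
    """Count how many of the most recent past batches to reuse (hypothetical rule).

    avg_weights: average IS weights of stored past batches, ordered newest first.
    A batch is treated as too old once its average weight leaves [1 - eps, 1 + eps];
    that batch and all older ones are dropped. At most max_batches are kept.
    """
    count = 0
    for w in avg_weights[:max_batches]:
        if abs(w - 1.0) > eps:
            break  # assume older batches are at least as stale
        count += 1
    return count


if __name__ == "__main__":
    # Average weight of a two-sample batch: (0.5/0.25 + 0.2/0.4) / 2 = 1.25
    print(average_is_weight(np.log([0.5, 0.2]), np.log([0.25, 0.4])))
    # The fourth-newest batch (1.40) exceeds the band, so three past batches are reused
    print(num_past_batches([1.02, 0.95, 1.15, 1.40, 1.05], eps=0.2))
```

Stopping at the first out-of-band batch, rather than filtering each batch independently, reflects the assumption that staleness grows with age; the batch collected by the current policy would always be used in addition to the selected past batches.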

Methods used in the Paper

Entropy Regularization
Policy Gradient Methods
Experience Replay
Replay Memory