Prioritized Experience Replay

Introduced by Schaul et al. in Prioritized Experience Replay

Prioritized Experience Replay is a type of experience replay in reinforcement learning where transitions with high expected learning progress, as measured by the magnitude of their temporal-difference (TD) error, are replayed more frequently. This prioritization can lead to a loss of diversity, which is alleviated with stochastic prioritization, and it introduces bias, which can be corrected with importance sampling.

The stochastic sampling method interpolates between pure greedy prioritization and uniform random sampling. The probability of being sampled is monotonic in a transition's priority, while a non-zero probability is guaranteed even for the lowest-priority transition. Concretely, define the probability of sampling transition $i$ as

$$P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}$$

where $p_i > 0$ is the priority of transition $i$. The exponent $\alpha$ determines how much prioritization is used, with $\alpha=0$ corresponding to the uniform case.
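For illustration, here is a minimal NumPy sketch of this sampling step; `sample_indices` and its argument names are hypothetical, chosen only for this example and not part of the paper or any particular library:

```python
import numpy as np

def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Sample transition indices with probability P(i) = p_i**alpha / sum_k p_k**alpha."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = scaled / scaled.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    return idx, probs[idx]

# Example: five stored transitions with |TD-error|-based priorities.
idx, sample_probs = sample_indices([0.1, 2.0, 0.5, 1.5, 0.01], batch_size=3)
```

With $\alpha = 0$ all scaled priorities equal 1 and the sampling reduces to the uniform case.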

Prioritized replay introduces bias because it changes the distribution of the stochastic updates in an uncontrolled fashion, and therefore changes the solution that the estimates will converge to. We can correct this bias by using importance-sampling (IS) weights:

$$ w_{i} = \left(\frac{1}{N}\cdot\frac{1}{P\left(i\right)}\right)^{\beta} $$

that fully compensate for the non-uniform probabilities $P\left(i\right)$ if $\beta = 1$. These weights can be folded into the Q-learning update by using $w_{i}\delta_{i}$ instead of $\delta_{i}$ (weighted IS rather than ordinary IS). For stability, the weights are always normalized by $1/\max_{i}w_{i}$ so that they only scale the update downwards.
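A rough sketch of this correction; the helper name `importance_weights` and the example numbers are made up for illustration, not taken from the paper:

```python
import numpy as np

def importance_weights(sample_probs, n, beta=0.4):
    """w_i = (1/N * 1/P(i))**beta, normalized by max_i w_i so updates are only scaled down."""
    w = (1.0 / (n * np.asarray(sample_probs, dtype=np.float64))) ** beta
    return w / w.max()

# The weighted TD errors w_i * delta_i replace delta_i in the Q-learning update.
td_errors = np.array([0.8, -1.2, 0.05])
weights = importance_weights(sample_probs=np.array([0.4, 0.3, 0.01]), n=5, beta=0.5)
weighted_errors = weights * td_errors
```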

There are two variants of prioritization: proportional, where $p_{i} = |\delta_{i}| + \epsilon$ with $\epsilon$ a small positive constant that keeps transitions with zero TD error from never being revisited, and rank-based, where $p_{i} = \frac{1}{\text{rank}\left(i\right)}$ and $\text{rank}\left(i\right)$ is the rank of transition $i$ when the replay memory is sorted according to $|\delta_{i}|$. In the paper, the rank-based variant used $\alpha = 0.7$, $\beta_{0} = 0.5$, while the proportional variant used $\alpha = 0.6$, $\beta_{0} = 0.4$.
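A minimal NumPy sketch of the two priority definitions; the function names and example values are illustrative only (for efficiency the paper implements the rank-based variant with an array-based binary heap and the proportional variant with a sum-tree, rather than re-sorting):

```python
import numpy as np

def proportional_priorities(td_errors, eps=1e-6):
    """Proportional variant: p_i = |delta_i| + eps."""
    return np.abs(td_errors) + eps

def rank_based_priorities(td_errors):
    """Rank-based variant: p_i = 1 / rank(i), ranking by decreasing |delta_i|."""
    order = np.argsort(-np.abs(td_errors))       # indices sorted by decreasing |delta_i|
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(order) + 1)  # rank 1 = largest TD error
    return 1.0 / ranks

td = np.array([0.8, -1.2, 0.05, 0.0])
p_prop = proportional_priorities(td)   # approx [0.8, 1.2, 0.05, 1e-6]
p_rank = rank_based_priorities(td)     # [1/2, 1/1, 1/3, 1/4]
```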

Source: Prioritized Experience Replay

Latest Papers

PAPER DATE
Learning to Sample with Local and Global Contexts in Experience Replay Buffer
Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
2020-07-14
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
2020-07-09
Double Prioritized State Recycled Experience Replay
Fanchen Bu, Dong Eui Chang
2020-07-08
The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
Harm van Seijen, Hadi Nekoei, Evan Racah, Sarath Chandar
2020-07-07
Distributed Uplink Beamforming in Cell-Free Networks Using Deep Reinforcement Learning
Firas Fredj, Yasser Al-Eryani, Setareh Maghsudi, Mohamed Akrout, Ekram Hossain
2020-06-26
Continuous Control for Searching and Planning with a Learned Model
Xuxi Yang, Werner Duvaud, Peng Wei
2020-06-12
Balancing a CartPole System with Reinforcement Learning -- A Tutorial
Swagat Kumar
2020-06-08
Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration
Dennis J. N. J. Soemers, Éric Piette, Matthew Stephenson, Cameron Browne
2020-05-30
STDPG: A Spatio-Temporal Deterministic Policy Gradient Agent for Dynamic Routing in SDN
Juan Chen, Zhiwen Xiao, Huanlai Xing, Penglin Dai, Shouxi Luo, Muhammad Azhar Iqbal
2020-04-21
Dynamic Experience Replay
Jieliang Luo, Hui Li
2020-03-04
Fast Reinforcement Learning for Anti-jamming Communications
Pei-Gen Ye, Yuan-Gen Wang, Jin Li, Liang Xiao
2020-02-13
Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks
Feibo Jiang, Kezhi Wang, Li Dong, Cunhua Pan, Kun Yang
2020-01-24
Sample-based Distributional Policy Gradient
Rahul Singh, Keuntaek Lee, Yongxin Chen
2020-01-08
Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using Deep Reinforcement Learning
Zining Liu, Chong Long, Xiaolu Lu, Zehong Hu, Jie Zhang, Yafang Wang
2019-11-24
Placement Optimization of Aerial Base Stations with Deep Reinforcement Learning
Jin Qiu, Jiangbin Lyu, Liqun Fu
2019-11-19
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
2019-11-19
Deep Reinforcement Learning Based Dynamic Trajectory Control for UAV-assisted Mobile Edge Computing
Liang Wang, Kezhi Wang, Cunhua Pan, Wei Xu, Nauman Aslam, Arumugam Nallanathan
2019-11-10
Task-Oriented Language Grounding for Language Input with Multiple Sub-Goals of Non-Linear Order
Vladislav Kurenkov, Bulat Maksudov, Adil Khan
2019-10-27
Google Research Football: A Novel Reinforcement Learning Environment
Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly
2019-07-25
Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration
Qisheng Wang, Qichao Wang
2019-07-18
Prioritized Sequence Experience Replay
Marc Brittain, Josh Bertram, Xuxi Yang, Peng Wei
2019-05-25
Generative Adversarial Imagination for Sample Efficient Deep Reinforcement Learning
Kacper Kielak
2019-04-30
TF-Replicator: Distributed Machine Learning for Researchers
Peter Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, Frederic Besse, Andy Brock, Aidan Clark, Sergio Gómez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov
2019-02-01
Macro action selection with deep reinforcement learning in StarCraft
Sijia Xu, Hongyu Kuang, Zhi Zhuang, Renjie Hu, Yang Liu, Huyang Sun
2018-12-02
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution
Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski
2018-07-09
Deep Curiosity Search: Intra-Life Exploration Can Improve Performance on Challenging Deep Reinforcement Learning Problems
Christopher Stanton, Jeff Clune
2018-06-01
Advances in Experience Replay
Tracy Wan, Neil Xu
2018-05-15
Distributed Distributional Deterministic Policy Gradients
Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap
2018-04-23
Distributed Prioritized Experience Replay
Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver
2018-03-02
ScreenerNet: Learning Self-Paced Curriculum for Deep Neural Networks
Tae-Hoon Kim, Jonghyun Choi
2018-01-03
ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling
Christopher Schulze, Marcus Schulze
2018-01-03
Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
2017-10-06
A novel DDPG method with prioritized experience replay
Yuenan Hou, Lifeng Liu, Qing Wei, Xudong Xu, Chunlin Chen
2017-10-01
Prioritized Experience Replay
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
2015-11-18

Tasks

TASK PAPERS SHARE
Atari Games 5 18.52%
Continuous Control 4 14.81%
Decision Making 2 7.41%
Game of Go 2 7.41%
Efficient Exploration 1 3.70%
Board Games 1 3.70%
Distributional Reinforcement Learning 1 3.70%
Chatbot 1 3.70%
Game of Chess 1 3.70%
