V-trace

V-trace is an off-policy actor-critic reinforcement learning algorithm that corrects for the lag between when actions are generated by the actors and when the learner estimates the gradient. Consider a trajectory $\left(x_{t}, a_{t}, r_{t}\right)_{t=s}^{t=s+n}$ generated by an actor following some policy $\mu$. The $n$-step V-trace target for $V\left(x_{s}\right)$, our value approximation at state $x_{s}$, is defined as:

$$ v_{s} = V\left(x_{s}\right) + \sum^{s+n-1}_{t=s}\gamma^{t-s}\left(\prod^{t-1}_{i=s}c_{i}\right)\delta_{t}V $$

where $\delta_{t}V = \rho_{t}\left(r_{t} + \gamma{V}\left(x_{t+1}\right) - V\left(x_{t}\right)\right)$ is a temporal difference for $V$, and $\rho_{t} = \min\left(\bar{\rho}, \frac{\pi\left(a_{t}\mid{x_{t}}\right)}{\mu\left(a_{t}\mid{x_{t}}\right)}\right)$ and $c_{i} = \min\left(\bar{c}, \frac{\pi\left(a_{i}\mid{x_{i}}\right)}{\mu\left(a_{i}\mid{x_{i}}\right)}\right)$ are truncated importance sampling weights. We assume that the truncation levels satisfy $\bar{\rho} \geq \bar{c}$.
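To make the target concrete, here is a minimal NumPy sketch of the $n$-step computation for a single trajectory, using the backward recursion implied by the definition above. The function name `vtrace_targets` and its array interface are illustrative assumptions, not the IMPALA reference implementation:

```python
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute n-step V-trace targets v_s for one trajectory (illustrative sketch).

    behaviour_logp: log mu(a_t | x_t), shape [n]
    target_logp:    log pi(a_t | x_t), shape [n]
    rewards:        r_t, shape [n]
    values:         V(x_t), shape [n]
    bootstrap_value: V(x_{s+n}), scalar used for the final bootstrap
    """
    n = len(rewards)
    rhos = np.exp(target_logp - behaviour_logp)         # importance ratios pi/mu
    clipped_rhos = np.minimum(rho_bar, rhos)            # rho_t = min(rho_bar, pi/mu)
    cs = np.minimum(c_bar, rhos)                        # c_t  = min(c_bar, pi/mu)

    values_tp1 = np.append(values[1:], bootstrap_value) # V(x_{t+1})
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)  # delta_t V

    # Accumulate backwards: acc_t = delta_t V + gamma * c_t * acc_{t+1},
    # which expands to sum_t gamma^{t-s} (prod_{i=s}^{t-1} c_i) delta_t V.
    acc = 0.0
    corrections = np.zeros(n)
    for t in reversed(range(n)):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc

    return values + corrections                         # v_s = V(x_s) + corrections
```

Note that in the on-policy case ($\pi = \mu$, so all weights equal 1 when $\bar{\rho} \geq \bar{c} \geq 1$), $v_s$ reduces to the standard $n$-step Bellman target.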

Source: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Latest Papers

An Introduction of mini-AlphaStar
Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu
2021-04-14

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri
2021-02-18

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants
Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
2021-02-02

TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game
Lei Han, Jiechao Xiong, Peng Sun, Xinghai Sun, Meng Fang, Qingwei Guo, Qiaobo Chen, Tengfei Shi, Hongsheng Yu, Zhengyou Zhang
2020-11-27

Adaptive Discretization for Continuous Control using Particle Filtering Policy Network
Pei Xu, Ioannis Karamouzas
2020-03-16

A Self-Tuning Actor-Critic Algorithm
Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
2020-02-28

Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes
Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
2020-02-03

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica
2019-11-30

TorchBeast: A PyTorch Platform for Distributed RL
Heinrich Küttler, Nantas Nardelli, Thibaut Lavril, Marco Selvatici, Viswanath Sivakumar, Tim Rocktäschel, Edward Grefenstette
2019-10-08

Off-Policy Actor-Critic with Shared Experience Replay
Simon Schmitt, Matteo Hessel, Karen Simonyan
2019-09-25

Importance Resampling for Off-policy Prediction
Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White
2019-06-11

Towards Combining On-Off-Policy Methods for Real-World Applications
Kai-Chun Hu, Chen-Huan Pi, Ting Han Wei, I-Chen Wu, Stone Cheng, Yi-Wei Dai, Wei-Yuan Ye
2019-04-24

AlphaStar: An Evolutionary Computation Perspective
Kai Arulkumaran, Antoine Cully, Julian Togelius
2019-02-05

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu
2018-02-05

Tasks

TASK                 PAPERS   SHARE
Starcraft II         3        23.08%
Atari Games          3        23.08%
Starcraft            2        15.38%
Continuous Control   2        15.38%
OpenAI Gym           2        15.38%
Imitation Learning   1        7.69%
