V-trace is an off-policy actor-critic reinforcement learning algorithm that helps tackle the lag between when actions are generated by the actors and when the learner estimates the gradient. Consider a trajectory $\left(x_{t}, a_{t}, r_{t}\right)^{t=s+n}_{t=s}$ generated by the actor following some policy $\mu$. We can define the $n$-step V-trace target for $V\left(x_{s}\right)$, our value approximation at state $x_{s}$, as:
$$ v_{s} = V\left(x_{s}\right) + \sum^{s+n-1}_{t=s}\gamma^{t-s}\left(\prod^{t-1}_{i=s}c_{i}\right)\delta_{t}V $$
where $\delta_{t}V = \rho_{t}\left(r_{t} + \gamma{V}\left(x_{t+1}\right) - V\left(x_{t}\right)\right)$ is a temporal difference for $V$, and $\rho_{t} = \text{min}\left(\bar{\rho}, \frac{\pi\left(a_{t}\mid{x_{t}}\right)}{\mu\left(a_{t}\mid{x_{t}}\right)}\right)$ and $c_{i} = \text{min}\left(\bar{c}, \frac{\pi\left(a_{i}\mid{x_{i}}\right)}{\mu\left(a_{i}\mid{x_{i}}\right)}\right)$ are truncated importance sampling weights. Here $\pi$ is the target policy being learned, while $\mu$ is the behaviour policy that generated the trajectory. We assume that the truncation levels are such that $\bar{\rho} \geq \bar{c}$.
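As a rough illustration of the definition above, the following sketch computes the V-trace targets for one trajectory in plain Python. It relies on the backward recursion $v_{s} - V\left(x_{s}\right) = \delta_{s}V + \gamma c_{s}\left(v_{s+1} - V\left(x_{s+1}\right)\right)$, which is equivalent to the sum when the target at the end of the trajectory is taken to be $V\left(x_{s+n}\right)$. The function name `vtrace_targets`, its parameter names, and the use of Python lists are assumptions made for this example and are not taken from the source.

```python
def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of n-step V-trace targets for a single trajectory.

    rewards[t]      : r_t for t = s, ..., s+n-1
    values[t]       : V(x_t) for t = s, ..., s+n-1
    bootstrap_value : V(x_{s+n}), the value estimate after the last step
    ratios[t]       : pi(a_t | x_t) / mu(a_t | x_t), untruncated
    All names here are hypothetical, chosen for illustration only.
    """
    values = list(values)
    n = len(rewards)
    # V(x_{t+1}) for every step, using the bootstrap value at the end.
    next_values = values[1:] + [bootstrap_value]

    targets = [0.0] * n
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}); zero beyond the trajectory
    for t in reversed(range(n)):
        rho_t = min(rho_bar, ratios[t])  # truncated weight in delta_t V
        c_t = min(c_bar, ratios[t])      # truncated "trace" weight
        delta_t = rho_t * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta_t + gamma * c_t * acc
        targets[t] = values[t] + acc
    return targets


# On-policy case (all ratios equal to 1): the targets reduce to
# discounted n-step returns bootstrapped from V(x_{s+n}).
print(vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
                     [1.0, 1.0, 1.0], gamma=0.5))
```

The backward loop avoids recomputing the product of $c_{i}$ for every $t$, so the cost is linear in the trajectory length. When every ratio is zero, all corrections vanish and each target equals the current estimate $V\left(x_{t}\right)$.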
Source: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

| Task | Papers | Share |
|---|---|---|
| Starcraft II | 3 | 23.08% |
| Atari Games | 3 | 23.08% |
| Starcraft | 2 | 15.38% |
| Continuous Control | 2 | 15.38% |
| OpenAI Gym | 2 | 15.38% |
| Imitation Learning | 1 | 7.69% |