IMPALA, or the Importance Weighted Actor-Learner Architecture, is an off-policy actor-critic framework that decouples acting from learning and learns from experience trajectories using V-trace. Unlike popular A3C-based agents, in which workers communicate gradients with respect to the policy parameters to a central parameter server, IMPALA actors communicate trajectories of experience (sequences of states, actions, and rewards) to a centralised learner. Because the learner has access to full trajectories of experience, it can use a GPU to perform updates on mini-batches of trajectories while aggressively parallelising all time-independent operations.
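The actor-to-learner data flow above can be sketched with a trajectory queue and a batching step; the `make_batch` helper and the `[T, B, ...]` layout are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np
from queue import Queue

def make_batch(traj_queue, batch_size):
    """Learner side: pull whole trajectories (dicts of [T, ...] arrays
    produced by actors) and stack them along a new batch axis into
    [T, B, ...] tensors, so time-independent ops parallelise over B."""
    trajs = [traj_queue.get() for _ in range(batch_size)]
    return {key: np.stack([t[key] for t in trajs], axis=1)
            for key in trajs[0]}

# Actors put finished trajectories on the queue instead of gradients:
traj_queue = Queue(maxsize=16)
for _ in range(4):
    traj_queue.put({
        "states": np.zeros((20, 84), dtype=np.float32),   # 20 steps of features
        "actions": np.zeros(20, dtype=np.int64),
        "rewards": np.zeros(20, dtype=np.float32),
    })

batch = make_batch(traj_queue, batch_size=4)  # states: shape (20, 4, 84)
```

In a real deployment the actors run in separate processes or threads and the queue decouples their (slow, CPU-bound) environment stepping from the learner's GPU updates.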

This type of decoupled architecture can achieve very high throughput. However, because the policy used to generate a trajectory can lag behind the policy on the learner by several updates at the time of gradient calculation, learning becomes off-policy. The V-trace off-policy actor-critic algorithm is used to correct for this harmful discrepancy.
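The V-trace correction can be sketched in NumPy for a single trajectory; the function name and interface are mine, but the recursion follows the paper's definition, where importance ratios between the learner ("target") policy and the actor ("behaviour") policy are clipped at `rho_bar` and `c_bar`:

```python
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets for one trajectory of length T.

    behaviour_logp, target_logp: log pi_mu(a_t|x_t), log pi(a_t|x_t), shape [T]
    rewards, values:             r_t and V(x_t), shape [T]
    bootstrap_value:             V(x_T) for the state after the last step
    """
    T = len(rewards)
    ratios = np.exp(target_logp - behaviour_logp)
    rhos = np.minimum(rho_bar, ratios)          # clipped IS weights
    cs = np.minimum(c_bar, ratios)              # clipped "trace" weights
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_tp1 - values)

    # Backward recursion: v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

When the behaviour and target policies coincide (all ratios equal 1), the targets reduce to the ordinary n-step bootstrapped return, so V-trace only alters the update to the extent that the actor's policy has lagged behind the learner's.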

Source: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Latest Papers

Adaptive Discretization for Continuous Control using Particle Filtering Policy Network
Pei Xu, Ioannis Karamouzas
A Self-Tuning Actor-Critic Algorithm
Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica
TorchBeast: A PyTorch Platform for Distributed RL
Heinrich Küttler, Nantas Nardelli, Thibaut Lavril, Marco Selvatici, Viswanath Sivakumar, Tim Rocktäschel, Edward Grefenstette
Towards Combining On-Off-Policy Methods for Real-World Applications
Kai-Chun Hu, Chen-Huan Pi, Ting Han Wei, I-Chen Wu, Stone Cheng, Yi-Wei Dai, Wei-Yuan Ye
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu


Tasks

Continuous Control: 2 papers (33.33%)
Atari Games: 2 papers (33.33%)
OpenAI Gym: 2 papers (33.33%)