A3C, Asynchronous Advantage Actor-Critic, is a policy gradient algorithm in reinforcement learning that maintains a policy $\pi\left(a_{t}\mid s_{t}; \theta\right)$ and an estimate of the value function $V\left(s_{t}; \theta_{v}\right)$. It operates in the forward view and uses a mix of $n$-step returns to update both the policy and the value function. The policy and the value function are updated after every $t_{\text{max}}$ actions or when a terminal state is reached. The update performed by the algorithm can be seen as $\nabla_{\theta'}\log\pi\left(a_{t}\mid s_{t}; \theta'\right)A\left(s_{t}, a_{t}; \theta, \theta_{v}\right)$, where $A\left(s_{t}, a_{t}; \theta, \theta_{v}\right)$ is an estimate of the advantage function given by:
$$\sum^{k-1}_{i=0}\gamma^{i}r_{t+i} + \gamma^{k}V\left(s_{t+k}; \theta_{v}\right) - V\left(s_{t}; \theta_{v}\right)$$
where $k$ can vary from state to state and is upper-bounded by $t_{\text{max}}$.
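As a concrete illustration, the sketch below computes these advantage estimates for one rollout by working backwards from the bootstrap value, so each state in the rollout gets its own $k$. The function name and array layout are illustrative assumptions, not part of the original pseudocode.

```python
import numpy as np

def advantage_estimates(rewards, values, bootstrap_value, gamma=0.99):
    """Compute A(s_t, a_t) = sum_{i=0}^{k-1} gamma^i r_{t+i}
    + gamma^k V(s_{t+k}; theta_v) - V(s_t; theta_v) for each step
    of a rollout of up to t_max steps.

    rewards: r_t, ..., r_{t+k-1} collected by one actor
    values:  V(s_t), ..., V(s_{t+k-1}) from the critic
    bootstrap_value: V(s_{t+k}), or 0.0 if a terminal state was reached
    """
    R = bootstrap_value
    advantages = np.empty(len(rewards))
    # Work backwards: R accumulates the discounted k-step return, so k
    # grows from 1 at the last rollout step up to t_max at the first.
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        advantages[t] = R - values[t]
    return advantages
```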
The critics in A3C learn the value function while multiple actors are trained in parallel, each periodically synchronizing with the global parameters. Gradients are accumulated over each rollout before being applied, which stabilizes training; the scheme can be viewed as a parallelized form of stochastic gradient descent.
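A minimal sketch of one such asynchronous update is shown below, assuming PyTorch with a global network shared across workers and a local per-worker copy; the helper name `worker_update` and its signature are hypothetical, and a full worker would also run the $t_{\text{max}}$-step environment loop that produces the loss.

```python
import torch

def worker_update(global_net, local_net, global_opt, loss):
    """One asynchronous update from a single actor-learner: gradients are
    computed on the worker's local copy, applied to the shared global
    parameters, and the local copy is then re-synchronized."""
    local_net.zero_grad()
    loss.backward()  # accumulate gradients on the local network
    # Transfer the locally accumulated gradients to the global network.
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp.grad = lp.grad
    global_opt.step()  # e.g. a shared RMSProp step on the global parameters
    # Pull the freshly updated global parameters back into the local copy.
    local_net.load_state_dict(global_net.state_dict())
```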
Note that while the parameters $\theta$ of the policy and $\theta_{v}$ of the value function are shown as being separate for generality, we always share some of the parameters in practice. We typically use a convolutional neural network that has one softmax output for the policy $\pi\left(a_{t}\mid s_{t}; \theta\right)$ and one linear output for the value function $V\left(s_{t}; \theta_{v}\right)$, with all non-output layers shared.
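Such a two-headed network might look like the following sketch, assuming PyTorch (the page names no framework); the layer sizes follow the Atari setup of the source paper, while the class name and input shape are illustrative.

```python
import torch
import torch.nn as nn

class ActorCriticNet(nn.Module):
    """Shared convolutional trunk with two heads: a softmax over actions
    for pi(a_t | s_t; theta) and a single linear unit for V(s_t; theta_v).
    All non-output layers are shared between the two heads."""

    def __init__(self, in_channels=4, num_actions=6):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # sized for 84x84 inputs
        )
        self.policy_head = nn.Linear(256, num_actions)  # softmax output
        self.value_head = nn.Linear(256, 1)             # linear output

    def forward(self, x):
        h = self.trunk(x)
        pi = torch.softmax(self.policy_head(h), dim=-1)
        v = self.value_head(h).squeeze(-1)
        return pi, v
```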
Source: Asynchronous Methods for Deep Reinforcement Learning

| Task | Papers | Share |
|---|---|---|
| Atari Games | 9 | 34.62% |
| Decision Making | 2 | 7.69% |
| Autonomous Driving | 2 | 7.69% |
| Motion Planning | 1 | 3.85% |
| OpenAI Gym | 1 | 3.85% |
| Continuous Control | 1 | 3.85% |
| Multi-agent Reinforcement Learning | 1 | 3.85% |
| Problem Decomposition | 1 | 3.85% |
| Visual Navigation | 1 | 3.85% |
| Component | Type |
|---|---|
| Convolution | Convolutions |
| Dense Connections | Feedforward Networks |
| Entropy Regularization | Regularization |
| RMSProp | Stochastic Optimization (optional) |
| Softmax | Output Functions |