A Highway Layer provides a gated "information highway" through which the input can flow directly to the output alongside a learned transform, easing information (and gradient) flow through deep networks. It is characterised by the use of gating units that regulate this flow.
A plain feedforward neural network typically consists of $L$ layers where the $l$th layer ($l \in \{1, 2, \dots, L\}$) applies a nonlinear transform $H$ (parameterized by $\mathbf{W_{H,l}}$) on its input $\mathbf{x_{l}}$ to produce its output $\mathbf{y_{l}}$. Thus, $\mathbf{x_{1}}$ is the input to the network and $\mathbf{y_{L}}$ is the network's output. Omitting the layer index and biases for clarity,
$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right) $$
$H$ is usually an affine transform followed by a nonlinear activation function, but in general it may take other forms.
For a highway network, we additionally define two nonlinear transforms $T\left(\mathbf{x},\mathbf{W_{T}}\right)$ and $C\left(\mathbf{x},\mathbf{W_{C}}\right)$ such that:
$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right)\cdot T\left(\mathbf{x},\mathbf{W_{T}}\right) + \mathbf{x}\cdot C\left(\mathbf{x},\mathbf{W_{C}}\right)$$
We refer to $T$ as the transform gate and $C$ as the carry gate, since they express how much of the output is produced by transforming the input and carrying it, respectively. In the original paper, the authors set $C = 1 - T$, giving:
$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right)\cdot T\left(\mathbf{x},\mathbf{W_{T}}\right) + \mathbf{x}\cdot\left(1 - T\left(\mathbf{x},\mathbf{W_{T}}\right)\right)$$
The authors set:
$$ T\left(\mathbf{x}\right) = \sigma\left(\mathbf{W_{T}}^{T}\mathbf{x} + \mathbf{b_{T}}\right) $$
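The coupled formulation above can be sketched in NumPy. This is a minimal illustration, not the reference implementation: the class name, the choice of tanh as $H$'s nonlinearity, and the negative gate-bias initialization (which biases the layer toward carrying the input early in training, as suggested in the original paper) are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class HighwayLayer:
    """Minimal highway layer with coupled gates, C = 1 - T."""

    def __init__(self, dim, seed=0, gate_bias=-1.0):
        rng = np.random.default_rng(seed)
        # Plain transform H: affine map followed by tanh.
        self.W_H = rng.normal(scale=0.1, size=(dim, dim))
        self.b_H = np.zeros(dim)
        # Transform gate T: affine map followed by a sigmoid.
        # A negative bias b_T makes T(x) small at initialization,
        # so the layer mostly carries x through unchanged.
        self.W_T = rng.normal(scale=0.1, size=(dim, dim))
        self.b_T = np.full(dim, gate_bias)

    def __call__(self, x):
        h = np.tanh(self.W_H.T @ x + self.b_H)        # H(x, W_H)
        t = sigmoid(self.W_T.T @ x + self.b_T)        # T(x, W_T)
        return h * t + x * (1.0 - t)                  # y = H·T + x·(1 - T)

layer = HighwayLayer(dim=4)
x = np.ones(4)
y = layer(x)  # close to x, since the gate starts mostly "carry"
```

Because the carry gate is $1 - T$, the layer smoothly interpolates between a plain feedforward layer ($T \to 1$) and the identity ($T \to 0$), which is what allows very deep highway networks to be trained by gradient descent.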
(Image credit: Sik-Ho Tsang)
Usage by task:

TASK                         PAPERS  SHARE
Speech Synthesis             26      35.14%
Text-To-Speech Synthesis     11      14.86%
Language Modelling           5       6.76%
Speech Recognition           4       5.41%
Voice Conversion             2       2.70%
Expressive Speech Synthesis  2       2.70%
Variational Inference        1       1.35%
Speaker Verification         1       1.35%
Multi-Task Learning          1       1.35%
COMPONENT           TYPE
Sigmoid Activation  Activation Functions