Language Model Components

Neural Cache

Introduced by Grave et al. in Improving Neural Language Models with a Continuous Cache

A Neural Cache, or Continuous Cache, is a module for language modelling that stores previous hidden states in memory cells. These stored states are then used as keys to retrieve the word that followed each of them, i.e. the corresponding next word. No transformation is applied to the stored states during writing or reading.

More formally, it exploits the hidden representations $h_{t}$ to define a probability distribution over the words in the cache. The cache stores pairs $\left(h_{i}, x_{i+1}\right)$ of a hidden representation and the word that was generated from it (the vector $h_{i}$ encodes the history $x_{1}, \dots, x_{i}$). At time $t$, we then define a probability distribution over the words stored in the cache, based on the stored hidden representations and the current one $h_{t}$, as:

$$ p_{\text{cache}}\left(w \mid h_{1\dots t}, x_{1\dots t}\right) \propto \sum^{t-1}_{i=1} \mathbb{1}_{\left\{w = x_{i+1}\right\}} \exp\left(\theta\, h_{t}^{\top} h_{i}\right) $$

where the scalar $\theta$ is a parameter which controls the flatness of the distribution. When $\theta$ is equal to zero, the probability distribution over the history is uniform, and the model is equivalent to a unigram cache model.
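To make the mechanism concrete, here is a minimal NumPy sketch of a cache that stores $\left(h_{i}, x_{i+1}\right)$ pairs without transformation and computes $p_{\text{cache}}$ as above. The class name `NeuralCache`, the `max_size` eviction policy, and the numerical-stability shift are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class NeuralCache:
    """Stores (h_i, x_{i+1}) pairs and scores them against the current hidden state."""

    def __init__(self, theta=0.5, max_size=2000):
        self.theta = theta        # flatness parameter of the cache distribution
        self.max_size = max_size  # illustrative cap on the number of stored pairs
        self.hiddens = []         # hidden states h_i, stored as-is
        self.words = []           # word x_{i+1} generated from h_i (vocabulary index)

    def add(self, h_i, next_word):
        """Write the pair (h_i, x_{i+1}) to the cache without any transformation."""
        self.hiddens.append(np.asarray(h_i, dtype=np.float64))
        self.words.append(int(next_word))
        if len(self.words) > self.max_size:  # evict the oldest entry (assumed policy)
            self.hiddens.pop(0)
            self.words.pop(0)

    def cache_distribution(self, h_t, vocab_size):
        """p_cache(w) ∝ sum_i 1{w = x_{i+1}} exp(theta * h_t . h_i)."""
        probs = np.zeros(vocab_size)
        if not self.words:
            return probs
        scores = np.array([self.theta * np.dot(h_t, h_i) for h_i in self.hiddens])
        weights = np.exp(scores - scores.max())  # shift for numerical stability
        for word, w in zip(self.words, weights):
            probs[word] += w                     # accumulate weight over matching words
        return probs / probs.sum()
```

With `theta = 0` every stored pair receives the same weight, so the returned distribution is just the relative frequency of words in the cached history, i.e. a unigram cache, matching the remark above.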

Source: Improving Neural Language Models with a Continuous Cache

Tasks


Task | Papers | Share
Language Modelling | 2 | 50.00%
Quantization | 1 | 25.00%
Translation | 1 | 25.00%

