no code implementations • 23 Apr 2024 • Anej Svete, Ryan Cotterell
This provides a first step towards understanding the mechanisms that transformer LMs can use to represent probability distributions over strings.
no code implementations • 25 Mar 2024 • Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell
For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task.
no code implementations • 24 Feb 2024 • Anej Svete, Robin Shing Moon Chan, Ryan Cotterell
However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not limited to hierarchical LMs, posing the question of what other classes of LMs can be efficiently represented by RNNs.
no code implementations • 7 Nov 2023 • Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu, Li Du
Large language models have become one of the most commonly deployed NLP inventions.
1 code implementation • 19 Oct 2023 • Franz Nowak, Anej Svete, Li Du, Ryan Cotterell
We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions.
1 code implementation • 8 Oct 2023 • Anej Svete, Ryan Cotterell
These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.
no code implementations • 27 Jul 2023 • Clément Guerner, Anej Svete, Tianyu Liu, Alexander Warstadt, Ryan Cotterell
The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace.
1 code implementation • 17 Jan 2023 • Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner
The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions.
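As an illustration of the statement above, the following is a minimal sketch of the backward algorithm for the pathsum of an acyclic WFSA. It assumes the real semiring (sum over paths, product along a path), and the representation is hypothetical rather than taken from the paper: `edges` is a list of `(source, weight, target)` triples, and `initial` and `final` map states to their initial and final weights. After a topological sort, each transition is visited once, which is where the $O(|E|)$ bound comes from (plus $O(|Q|)$ for the states themselves).

```python
from collections import defaultdict


def topological_order(states, edges):
    """Return (order, outgoing) using Kahn's algorithm.

    Raises ValueError if the transition graph contains a cycle.
    """
    indegree = {q: 0 for q in states}
    outgoing = defaultdict(list)
    for src, weight, dst in edges:
        outgoing[src].append((weight, dst))
        indegree[dst] += 1

    frontier = [q for q in states if indegree[q] == 0]
    order = []
    while frontier:
        q = frontier.pop()
        order.append(q)
        for _, dst in outgoing[q]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                frontier.append(dst)

    if len(order) != len(states):
        raise ValueError("WFSA has a cycle; the backward algorithm does not apply")
    return order, outgoing


def backward_pathsum(states, edges, initial, final):
    """Sum the weights of all accepting paths in an acyclic WFSA (real semiring)."""
    order, outgoing = topological_order(states, edges)

    # beta[q] is the total weight of all paths from q to a final state,
    # including q's own final weight. Processing states in reverse
    # topological order guarantees beta[dst] is known before it is used.
    beta = {}
    for q in reversed(order):
        total = final.get(q, 0.0)
        for weight, dst in outgoing[q]:
            total += weight * beta[dst]
        beta[q] = total

    return sum(w * beta[q] for q, w in initial.items())


# Toy automaton with two accepting paths: 0 -> 1 -> 2 and 0 -> 2.
states = [0, 1, 2]
edges = [(0, 0.5, 1), (0, 0.25, 2), (1, 0.5, 2)]
print(backward_pathsum(states, edges, initial={0: 1.0}, final={2: 1.0}))  # 0.5
```

In the toy automaton, the two paths have weights $0.5 \cdot 0.5 = 0.25$ and $0.25$, so the pathsum is $0.5$. A different semiring would only change how `+` and `*` are interpreted in the inner loop.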