Search Results for author: Anej Svete

Found 8 papers, 3 papers with code

Transformers Can Represent $n$-gram Language Models

no code implementations • 23 Apr 2024 • Anej Svete, Ryan Cotterell

This provides a first step towards understanding the mechanisms that transformer LMs can use to represent probability distributions over strings.
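As a reference point for the kind of $n$-gram LM the paper shows transformers can represent, here is a minimal bigram ($n = 2$) sketch that assigns a probability to a string as a product of conditional next-symbol probabilities. The symbol inventory and probability table are invented for illustration, not taken from the paper.

```python
# A bigram (n=2) language model: p(w_1...w_T) = prod_t p(w_t | w_{t-1}).
# The probability table here is invented for illustration.
BOS, EOS = "<s>", "</s>"

bigram = {
    (BOS, "a"): 0.9, (BOS, "b"): 0.1,
    ("a", "b"): 0.5, ("a", EOS): 0.5,
    ("b", "a"): 0.4, ("b", EOS): 0.6,
}

def string_prob(symbols):
    """Probability of a string under the bigram LM (0.0 if unsupported)."""
    p = 1.0
    prev = BOS
    for s in symbols + [EOS]:
        p *= bigram.get((prev, s), 0.0)
        prev = s
    return p

print(string_prob(["a", "b"]))  # 0.9 * 0.5 * 0.6 = 0.27
```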

The Role of $n$-gram Smoothing in the Age of Neural Networks

no code implementations • 25 Mar 2024 • Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell

For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task.

Language Modelling • Machine Translation
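For context on the family of smoothing methods the paper revisits, here is a sketch of one classical member, add-$\lambda$ (Lidstone) smoothing over bigram counts. The toy corpus and the $\lambda$ value are invented for illustration.

```python
from collections import Counter

def add_lambda_prob(count_hw, count_h, vocab_size, lam=0.5):
    """Add-lambda (Lidstone) smoothed conditional probability:
    p(w | h) = (count(h, w) + lam) / (count(h) + lam * |V|)."""
    return (count_hw + lam) / (count_h + lam * vocab_size)

corpus = ["a", "b", "a", "a", "b"]
vocab = set(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])  # counts of each history symbol

# Smoothed p(b | a): seen bigrams are discounted, unseen ones get mass.
p = add_lambda_prob(bigrams[("a", "b")], contexts["a"], len(vocab))
print(round(p, 3))  # (2 + 0.5) / (3 + 0.5 * 2) = 0.625
```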

A Theoretical Result on the Inductive Bias of RNN Language Models

no code implementations • 24 Feb 2024 • Anej Svete, Robin Shing Moon Chan, Ryan Cotterell

However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not limited to hierarchical LMs, posing the question of what \emph{other classes} of LMs can be efficiently represented by RNNs.

Inductive Bias

Formal Aspects of Language Modeling

no code implementations • 7 Nov 2023 • Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu, Li Du

Large language models have become one of the most commonly deployed NLP inventions.

Language Modelling

On the Representational Capacity of Recurrent Neural Language Models

1 code implementation • 19 Oct 2023 • Franz Nowak, Anej Svete, Li Du, Ryan Cotterell

We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions.

Recurrent Neural Language Models as Probabilistic Finite-state Automata

1 code implementation • 8 Oct 2023 • Anej Svete, Ryan Cotterell

These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.
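To make the automaton side of the correspondence concrete, here is a toy probabilistic finite-state automaton that assigns a probability to a string by summing over its accepting paths (the forward algorithm). The states, transition probabilities, and final weights are invented, not taken from the paper.

```python
# A toy probabilistic FSA. At each state, the outgoing transition
# probabilities plus the final (stopping) weight sum to 1.
init = {0: 1.0}
trans = {  # (state, symbol) -> list of (next_state, prob)
    (0, "a"): [(0, 0.3), (1, 0.4)],
    (1, "b"): [(1, 0.5)],
}
final = {0: 0.3, 1: 0.5}

def string_prob(symbols):
    """Sum the probabilities of all paths labelled `symbols`."""
    alpha = dict(init)
    for s in symbols:
        nxt = {}
        for q, w in alpha.items():
            for q2, p in trans.get((q, s), []):
                nxt[q2] = nxt.get(q2, 0.0) + w * p
        alpha = nxt
    return sum(w * final.get(q, 0.0) for q, w in alpha.items())

print(string_prob(["a", "b"]))  # 1.0 * 0.4 * 0.5 * 0.5 = 0.1
```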

A Geometric Notion of Causal Probing

no code implementations • 27 Jul 2023 • Clément Guerner, Anej Svete, Tianyu Liu, Alexander Warstadt, Ryan Cotterell

The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace.

Counterfactual • Language Modelling
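The hypothesis admits a simple geometric illustration: if a concept is encoded along a direction $v$, projecting representations onto the orthogonal complement of $\mathrm{span}(v)$ removes the component carrying it. The dimensionality, direction, and hidden vector below are all hypothetical, chosen only to show the projection.

```python
import numpy as np

# Hypothetical setup: a unit "concept direction" v in an 8-dimensional
# representation space. Under the linear subspace hypothesis, projecting
# onto the orthogonal complement of span(v) erases the concept.
rng = np.random.default_rng(0)
d = 8
v = rng.normal(size=d)
v /= np.linalg.norm(v)

P = np.eye(d) - np.outer(v, v)  # projection onto span(v)'s complement

h = rng.normal(size=d)          # a stand-in hidden representation
h_erased = P @ h

print(np.dot(h_erased, v))      # ~0: no component left along v
```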

Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs

1 code implementation • 17 Jan 2023 • Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner

The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions.
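The ordinary (failure-free) backward algorithm the snippet describes is straightforward to sketch: visit states in reverse topological order and accumulate suffix pathsums, touching each transition once, hence $O(|E|)$. The automaton below and its real-valued $(+, \times)$ weights are invented for illustration.

```python
# Backward algorithm for the pathsum of an acyclic WFSA, O(|E|) given a
# topological order of the states. Real semiring (+, *) assumed.
states = [0, 1, 2, 3]           # already in topological order
edges = {                       # state -> list of (next_state, weight)
    0: [(1, 0.5), (2, 0.5)],
    1: [(3, 1.0)],
    2: [(3, 2.0)],
    3: [],
}
initial = {0: 1.0}              # initial weights
final = {3: 1.0}                # final weights

def pathsum():
    """beta[q] = total weight of all paths from q to a final state."""
    beta = {q: final.get(q, 0.0) for q in states}
    for q in reversed(states):            # successors are finalized first
        for q2, w in edges[q]:
            beta[q] += w * beta[q2]
    return sum(w * beta[q] for q, w in initial.items())

print(pathsum())  # 0.5 * 1.0 + 0.5 * 2.0 = 1.5
```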
