Search Results for author: Anej Svete

Found 8 papers, 3 papers with code

Transformers Can Represent $n$-gram Language Models

no code implementations • 23 Apr 2024 • Anej Svete, Ryan Cotterell

This provides a first step towards understanding the mechanisms that transformer LMs can use to represent probability distributions over strings.

The Role of $n$-gram Smoothing in the Age of Neural Networks

no code implementations • 25 Mar 2024 • Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell

For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task.

Language Modelling, Machine Translation

A Theoretical Result on the Inductive Bias of RNN Language Models

no code implementations • 24 Feb 2024 • Anej Svete, Robin Shing Moon Chan, Ryan Cotterell

However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not limited to hierarchical LMs, posing the question of what other classes of LMs can be efficiently represented by RNNs.

Inductive Bias

Formal Aspects of Language Modeling

no code implementations • 7 Nov 2023 • Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu, Li Du

Large language models have become one of the most commonly deployed NLP inventions.

Language Modelling

On the Representational Capacity of Recurrent Neural Language Models

1 code implementation • 19 Oct 2023 • Franz Nowak, Anej Svete, Li Du, Ryan Cotterell

We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions.

Recurrent Neural Language Models as Probabilistic Finite-state Automata

1 code implementation • 8 Oct 2023 • Anej Svete, Ryan Cotterell

These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.

A Geometric Notion of Causal Probing

no code implementations • 27 Jul 2023 • Clément Guerner, Anej Svete, Tianyu Liu, Alexander Warstadt, Ryan Cotterell

The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace.

Counterfactual, Language Modelling

Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs

1 code implementation • 17 Jan 2023 • Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner

The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions.
