1 code implementation • 31 Oct 2022 • Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur Parikh, Clara Rivera
To address this lack of data, we create Table-to-Text in African languages (TaTa), the first large multilingual table-to-text dataset with a focus on African languages.
1 code implementation • 1 Oct 2022 • Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan Garrette, Jason Riesa, Orhan Firat, Noah Constant
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation.
no code implementations • ACL 2021 • Nicholas FitzGerald, Jan A. Botha, Daniel Gillick, Daniel M. Bikel, Tom Kwiatkowski, Andrew McCallum
We present an instance-based nearest neighbor approach to entity linking.
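The core idea, instance-based (nearest-neighbor) entity linking, can be sketched as follows: instead of scoring a mention against a single vector per entity, compare it to embeddings of labeled training mentions and return the entity of the nearest exemplar. The vectors, entity IDs, and helper names here are toy assumptions, not the paper's actual model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def link(mention_vec, exemplars):
    """Return the entity whose nearest labeled mention exemplar is
    most similar to the query mention (1-nearest-neighbor linking)."""
    best_entity, best_sim = None, -1.0
    for entity, vec in exemplars:
        sim = cosine(mention_vec, vec)
        if sim > best_sim:
            best_entity, best_sim = entity, sim
    return best_entity

# Toy exemplars: (entity id, embedding of one labeled training mention).
exemplars = [
    ("Q2", [0.9, 0.1]),       # e.g. a mention of "Paris" the city
    ("Q167646", [0.1, 0.9]),  # e.g. a mention of "Paris" the person
]
print(link([0.8, 0.2], exemplars))  # -> Q2
```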
1 code implementation • EMNLP 2020 • Jan A. Botha, Zifei Shan, Daniel Gillick
We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base.
Ranked #1 on Entity Disambiguation on Mewsli-9 (using extra training data)
no code implementations • EMNLP 2020 • Julian Michael, Jan A. Botha, Ian Tenney
The success of pretrained contextual encoders, such as ELMo and BERT, has brought a great deal of interest in what these models learn: do they, without explicit supervision, learn to encode meaningful notions of linguistic structure?
1 code implementation • EMNLP 2018 • Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das
Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning.
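To make the task's input/output shape concrete, here is a deliberately naive rule-based splitter: break one conjoined sentence on " and " and re-use the subject. This only illustrates the data format; the systems studied in the paper are learned sequence-to-sequence models, and the example sentence is an assumption.

```python
def naive_split_and_rephrase(sentence, subject):
    """Split a conjoined sentence into two shorter ones that together
    convey the same meaning. Illustrative rule only, not a real system."""
    left, _, right = sentence.rstrip(".").partition(" and ")
    if not right:  # nothing to split
        return [sentence]
    return [left + ".", subject + " " + right + "."]

print(naive_split_and_rephrase(
    "Alan Bean was born in Wheeler and served as a test pilot.", "Alan Bean"))
# -> ['Alan Bean was born in Wheeler.', 'Alan Bean served as a test pilot.']
```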
1 code implementation • EMNLP 2017 • Jan A. Botha, Emily Pitler, Ji Ma, Anton Bakalov, Alex Salcianu, David Weiss, Ryan McDonald, Slav Petrov
We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models.
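A minimal sketch of the kind of model described: sum sparse feature embeddings for a token, pass them through one small ReLU hidden layer, and take the argmax over tag scores. The dimensions, weights, features, and tag set below are toy assumptions, not the paper's configuration.

```python
def tag_token(feature_ids, emb, hidden_w, hidden_b, out_w, out_b, tags):
    """One forward pass of a tiny feed-forward tagger."""
    dim = len(emb[0])
    # 1. Embed and sum the token's sparse features.
    x = [sum(emb[f][i] for f in feature_ids) for i in range(dim)]
    # 2. One shallow hidden layer with ReLU.
    h = [max(0.0, sum(w[i] * x[i] for i in range(dim)) + b)
         for w, b in zip(hidden_w, hidden_b)]
    # 3. Linear output layer; return the highest-scoring tag.
    scores = [sum(w[j] * h[j] for j in range(len(h))) + b
              for w, b in zip(out_w, out_b)]
    return tags[scores.index(max(scores))]

# Toy hand-set weights: feature 0 votes NOUN, feature 1 votes VERB.
emb = [[1.0, 0.0], [0.0, 1.0]]
hidden_w, hidden_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
out_w, out_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
tags = ["NOUN", "VERB"]
print(tag_token([0], emb, hidden_w, hidden_b, out_w, out_b, tags))  # -> NOUN
```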
no code implementations • ACL 2016 • Jan Buys, Jan A. Botha
We propose a tagging model using Wsabie, a discriminative embedding-based model with rank-based learning.
no code implementations • 18 Aug 2015 • Jan A. Botha
We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to.
1 code implementation • 16 May 2014 • Jan A. Botha, Phil Blunsom
This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model.
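The compositional idea can be sketched additively: a word's representation is built by summing the vectors of its morphemes, so morphologically related words share parameters. The segmentation and the vector values below are toy assumptions for illustration.

```python
def compose_word_vector(morphemes, morpheme_vecs):
    """Compose a word representation as the sum of its morpheme vectors,
    in the spirit of an additive compositional morphology model."""
    dim = len(next(iter(morpheme_vecs.values())))
    vec = [0.0] * dim
    for m in morphemes:
        for i, v in enumerate(morpheme_vecs[m]):
            vec[i] += v
    return vec

# Toy 2-d morpheme embeddings for "un" + "happi" + "ness".
morpheme_vecs = {"un": [0.1, -0.2], "happi": [0.5, 0.4], "ness": [0.0, 0.3]}
print(compose_word_vector(["un", "happi", "ness"], morpheme_vecs))
```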