Search Results for author: Jan A. Botha

Found 13 papers, 6 papers with code

TaTa: A Multilingual Table-to-Text Dataset for African Languages

1 code implementation 31 Oct 2022 Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur Parikh, Clara Rivera

To address this lack of data, we create Table-to-Text in African languages (TaTa), the first large multilingual table-to-text dataset with a focus on African languages.

Data-to-Text Generation

FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

1 code implementation 1 Oct 2022 Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan Garrette, Jason Riesa, Orhan Firat, Noah Constant

We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation.

Machine Translation • Translation

Entity Linking in 100 Languages

1 code implementation EMNLP 2020 Jan A. Botha, Zifei Shan, Daniel Gillick

We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base.

 Ranked #1 on Entity Disambiguation on Mewsli-9 (using extra training data)

Entity Disambiguation • Entity Linking • +2
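As a rough illustration of the language-agnostic formulation in "Entity Linking in 100 Languages" above, the sketch below resolves a mention to an entity ID by nearest-neighbour search in a shared embedding space. The toy encoder, entity IDs, and dimensions are stand-ins, not the paper's trained model or knowledge base.

```python
# Minimal sketch: mentions from any language and candidate entities share one
# embedding space, and a mention resolves to its nearest entity by cosine
# similarity. The embed_mention() stand-in and entity IDs are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy language-agnostic knowledge base: entity ID -> embedding.
entity_ids = ["Q90", "Q64", "Q84"]            # Wikidata-style IDs, for illustration
entity_vecs = rng.normal(size=(len(entity_ids), 8))

def embed_mention(mention: str) -> np.ndarray:
    """Stand-in for a trained mention encoder (here: a hash-seeded random vector)."""
    seed = abs(hash(mention)) % (2**32)
    return np.random.default_rng(seed).normal(size=8)

def link(mention: str) -> str:
    """Resolve a mention in any language to the closest entity in the shared KB."""
    m = embed_mention(mention)
    sims = entity_vecs @ m / (np.linalg.norm(entity_vecs, axis=1) * np.linalg.norm(m))
    return entity_ids[int(np.argmax(sims))]

print(link("Parigi"))  # with a real trained encoder, the Italian mention would resolve to Q90
```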

Asking without Telling: Exploring Latent Ontologies in Contextual Representations

no code implementations EMNLP 2020 Julian Michael, Jan A. Botha, Ian Tenney

The success of pretrained contextual encoders, such as ELMo and BERT, has brought a great deal of interest in what these models learn: do they, without explicit supervision, learn to encode meaningful notions of linguistic structure?

Learning To Split and Rephrase From Wikipedia Edit History

1 code implementation EMNLP 2018 Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das

Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning.

Sentence • Split and Rephrase
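To make the task definition concrete, here is a minimal made-up example of one complex sentence and a split-and-rephrased version; it is illustrative only and not drawn from the WikiSplit data released with the paper.

```python
# Illustrative split-and-rephrase pair (a made-up example).
complex_sentence = (
    "The museum, which opened in 1950, attracts thousands of visitors every year."
)
rephrased = [
    "The museum opened in 1950.",
    "It attracts thousands of visitors every year.",
]
print(complex_sentence)
print(" ".join(rephrased))  # shorter sentences that together convey the same meaning
```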

Natural Language Processing with Small Feed-Forward Networks

1 code implementation EMNLP 2017 Jan A. Botha, Emily Pitler, Ji Ma, Anton Bakalov, Alex Salcianu, David Weiss, Ryan McDonald, Slav Petrov

We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models.
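As a sketch of the kind of model described above, the snippet below wires hashed features into a tiny embedding table, one hidden layer, and a softmax tagger. The feature hashing scheme, dimensions, and tag set are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a small, shallow feed-forward tagger: hashed sparse
# features, a tiny embedding table, one ReLU hidden layer, softmax over tags.
import numpy as np

VOCAB_HASH_SIZE, EMB_DIM, HIDDEN_DIM = 1000, 16, 32
TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]       # illustrative tag inventory
rng = np.random.default_rng(0)

emb = rng.normal(scale=0.1, size=(VOCAB_HASH_SIZE, EMB_DIM))
W1 = rng.normal(scale=0.1, size=(EMB_DIM, HIDDEN_DIM))
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, len(TAGS)))

def features(word: str) -> list[int]:
    """Hash the word and its character trigrams into a small feature space."""
    grams = [word] + [word[i:i + 3] for i in range(max(1, len(word) - 2))]
    return [abs(hash(g)) % VOCAB_HASH_SIZE for g in grams]

def predict_tag(word: str) -> str:
    x = emb[features(word)].mean(axis=0)      # average of feature embeddings
    h = np.maximum(0.0, x @ W1)               # one small hidden layer
    logits = h @ W2
    return TAGS[int(np.argmax(logits))]

print(predict_tag("running"))  # weights are untrained here, so the tag is arbitrary
```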

Cross-Lingual Morphological Tagging for Low-Resource Languages

no code implementations ACL 2016 Jan Buys, Jan A. Botha

We propose a tagging model using Wsabie, a discriminative embedding-based model with rank-based learning.

Morphological Tagging
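For readers unfamiliar with Wsabie's rank-based learning, the sketch below shows a WARP-style update: sample negative tags until one violates the margin, then scale the update by an estimate of the gold tag's rank. The features, tag inventory size, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Rough sketch of a WARP-style rank-based update as used by Wsabie.
import numpy as np

rng = np.random.default_rng(0)
NUM_TAGS, DIM, LR = 50, 16, 0.1               # illustrative sizes
tag_vecs = rng.normal(scale=0.1, size=(NUM_TAGS, DIM))

def warp_update(x: np.ndarray, gold: int) -> None:
    """One step: push the gold tag above the first sampled tag that violates the margin."""
    gold_score = tag_vecs[gold] @ x
    sampled = 0
    while sampled < NUM_TAGS - 1:
        neg = int(rng.integers(NUM_TAGS))
        if neg == gold:
            continue
        sampled += 1
        if tag_vecs[neg] @ x > gold_score - 1.0:          # margin violated
            rank_estimate = (NUM_TAGS - 1) // sampled     # fewer samples needed => higher rank
            weight = sum(1.0 / k for k in range(1, rank_estimate + 1))
            tag_vecs[gold] += LR * weight * x
            tag_vecs[neg] -= LR * weight * x
            return

x = rng.normal(size=DIM)    # stand-in for an embedded word plus context
warp_update(x, gold=3)
```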

Probabilistic Modelling of Morphologically Rich Languages

no code implementations 18 Aug 2015 Jan A. Botha

We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to.

Language Modelling • Machine Translation • +3

Compositional Morphology for Word Representations and Language Modelling

1 code implementation 16 May 2014 Jan A. Botha, Phil Blunsom

This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model.

Language Modelling • Machine Translation • +2
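A minimal sketch of additive morpheme composition in the spirit of the paper above: a word's representation is the sum of a word-level vector and vectors for its morphemes, so rare inflected forms share parameters with their stems. The toy segmentation and random vectors are assumptions, not the paper's full probabilistic language model.

```python
# Minimal sketch: compose a word representation from word- and morpheme-level vectors.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
morpheme_vecs = {m: rng.normal(size=DIM) for m in ["un", "friend", "ly"]}
word_vecs = {w: rng.normal(size=DIM) for w in ["unfriendly"]}

def compose(word: str, morphemes: list[str]) -> np.ndarray:
    """Word representation = word vector + sum of its morpheme vectors."""
    vec = word_vecs.get(word, np.zeros(DIM))
    for m in morphemes:
        vec = vec + morpheme_vecs.get(m, np.zeros(DIM))
    return vec

v = compose("unfriendly", ["un", "friend", "ly"])
print(v.shape)  # (8,) – a composed vector ready for a vector-based language model
```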
