1 code implementation • 31 Oct 2022 • Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur Parikh, Clara Rivera
To address this lack of data, we create Table-to-Text in African languages (TaTa), the first large multilingual table-to-text dataset with a focus on African languages.
1 code implementation • 1 Oct 2022 • Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan Garrette, Jason Riesa, Orhan Firat, Noah Constant
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation.
no code implementations • ACL 2021 • Nicholas FitzGerald, Jan A. Botha, Daniel Gillick, Daniel M. Bikel, Tom Kwiatkowski, Andrew McCallum
We present an instance-based nearest neighbor approach to entity linking.
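The core idea, instance-based (nearest-neighbor) entity linking, can be sketched as follows: instead of scoring a mention against a single vector per entity, compare it to embeddings of labeled training mentions and return the entity of the nearest exemplar. The vectors, entity IDs, and helper names here are toy assumptions, not the paper's actual model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def link(mention_vec, exemplars):
    """Return the entity whose nearest labeled mention exemplar is
    most similar to the query mention (1-nearest-neighbor linking)."""
    best_entity, best_sim = None, -1.0
    for entity, vec in exemplars:
        sim = cosine(mention_vec, vec)
        if sim > best_sim:
            best_entity, best_sim = entity, sim
    return best_entity

# Toy exemplars: (entity id, embedding of one labeled training mention).
exemplars = [
    ("Q2", [0.9, 0.1]),       # e.g. a mention of "Paris" the city
    ("Q167646", [0.1, 0.9]),  # e.g. a mention of "Paris" the person
]
print(link([0.8, 0.2], exemplars))  # -> Q2
```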
1 code implementation • EMNLP 2020 • Jan A. Botha, Zifei Shan, Daniel Gillick
We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base.
Ranked #1 on Entity Disambiguation on Mewsli-9 (using extra training data)
no code implementations • EMNLP 2020 • Julian Michael, Jan A. Botha, Ian Tenney
The success of pretrained contextual encoders, such as ELMo and BERT, has brought a great deal of interest in what these models learn: do they, without explicit supervision, learn to encode meaningful notions of linguistic structure?
1 code implementation • EMNLP 2018 • Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das
Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning.
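To make the task's input/output shape concrete, here is a deliberately naive rule-based splitter: break one conjoined sentence on " and " and re-use the subject. This only illustrates the data format; the systems studied in the paper are learned sequence-to-sequence models, and the example sentence is an assumption.

```python
def naive_split_and_rephrase(sentence, subject):
    """Split a conjoined sentence into two shorter ones that together
    convey the same meaning. Illustrative rule only, not a real system."""
    left, _, right = sentence.rstrip(".").partition(" and ")
    if not right:  # nothing to split
        return [sentence]
    return [left + ".", subject + " " + right + "."]

print(naive_split_and_rephrase(
    "Alan Bean was born in Wheeler and served as a test pilot.", "Alan Bean"))
# -> ['Alan Bean was born in Wheeler.', 'Alan Bean served as a test pilot.']
```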
1 code implementation • EMNLP 2017 • Jan A. Botha, Emily Pitler, Ji Ma, Anton Bakalov, Alex Salcianu, David Weiss, Ryan McDonald, Slav Petrov
We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models.
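A minimal sketch of the kind of model described: sum sparse feature embeddings for a token, pass them through one small ReLU hidden layer, and take the argmax over tag scores. The dimensions, weights, features, and tag set below are toy assumptions, not the paper's configuration.

```python
def tag_token(feature_ids, emb, hidden_w, hidden_b, out_w, out_b, tags):
    """One forward pass of a tiny feed-forward tagger."""
    dim = len(emb[0])
    # 1. Embed and sum the token's sparse features.
    x = [sum(emb[f][i] for f in feature_ids) for i in range(dim)]
    # 2. One shallow hidden layer with ReLU.
    h = [max(0.0, sum(w[i] * x[i] for i in range(dim)) + b)
         for w, b in zip(hidden_w, hidden_b)]
    # 3. Linear output layer; return the highest-scoring tag.
    scores = [sum(w[j] * h[j] for j in range(len(h))) + b
              for w, b in zip(out_w, out_b)]
    return tags[scores.index(max(scores))]

# Toy hand-set weights: feature 0 votes NOUN, feature 1 votes VERB.
emb = [[1.0, 0.0], [0.0, 1.0]]
hidden_w, hidden_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
out_w, out_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
tags = ["NOUN", "VERB"]
print(tag_token([0], emb, hidden_w, hidden_b, out_w, out_b, tags))  # -> NOUN
```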
no code implementations • ACL 2016 • Jan Buys, Jan A. Botha
We propose a tagging model using Wsabie, a discriminative embedding-based model with rank-based learning.
no code implementations • 18 Aug 2015 • Jan A. Botha
We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to.
1 code implementation • 16 May 2014 • Jan A. Botha, Phil Blunsom
This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model.
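The compositional idea can be sketched additively: a word's representation is built by summing the vectors of its morphemes, so morphologically related words share parameters. The segmentation and the vector values below are toy assumptions for illustration.

```python
def compose_word_vector(morphemes, morpheme_vecs):
    """Compose a word representation as the sum of its morpheme vectors,
    in the spirit of an additive compositional morphology model."""
    dim = len(next(iter(morpheme_vecs.values())))
    vec = [0.0] * dim
    for m in morphemes:
        for i, v in enumerate(morpheme_vecs[m]):
            vec[i] += v
    return vec

# Toy 2-d morpheme embeddings for "un" + "happi" + "ness".
morpheme_vecs = {"un": [0.1, -0.2], "happi": [0.5, 0.4], "ness": [0.0, 0.3]}
print(compose_word_vector(["un", "happi", "ness"], morpheme_vecs))
```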