no code implementations • EMNLP (MRL) 2021 • Asa Cooper Stickland, Iain Murray
Many recent works use ‘consistency regularisation’ to improve the generalisation of fine-tuned pre-trained models, both multilingual and English-only.
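A minimal sketch of one common form of consistency regularisation, assuming a PyTorch classifier and a simple Gaussian-noise perturbation (the specific loss and perturbation studied in the paper may differ):

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, y, noise_std=0.1, weight=1.0):
    """Cross-entropy on clean inputs plus a KL term that pushes
    predictions on perturbed inputs towards the clean predictions."""
    logits_clean = model(x)
    task_loss = F.cross_entropy(logits_clean, y)

    # Perturb the inputs (additive Gaussian noise as a stand-in for
    # dropout, back-translation, or other augmentations).
    x_noisy = x + noise_std * torch.randn_like(x)
    logits_noisy = model(x_noisy)

    # KL(clean || noisy); the clean distribution is detached so the
    # regulariser only moves the noisy-branch predictions.
    kl = F.kl_div(
        F.log_softmax(logits_noisy, dim=-1),
        F.softmax(logits_clean, dim=-1).detach(),
        reduction="batchmean",
    )
    return task_loss + weight * kl
```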
1 code implementation • 20 Nov 2023 • David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
2 code implementations • 21 Sep 2023 • Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans
If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A".
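An illustrative sketch of the evaluation setup this implies, using an example fact in the style of the paper (prompt wording here is for illustration only):

```python
# Forward-direction training sentence ("A is B").
train_sentence = "Daphne Barrington is the director of 'A Journey Through Time'."

# Forward query: answerable after fine-tuning on the sentence above.
forward_prompt = "Who is Daphne Barrington?"

# Reverse query: the Reversal Curse predicts the fine-tuned model fails
# here, because it never saw the fact stated in the "B is A" direction.
reverse_prompt = "Who is the director of 'A Journey Through Time'?"
```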
1 code implementation • 1 Sep 2023 • Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans
We fine-tune a model on a description of a test, without providing examples or demonstrations, and at test time assess whether the model can pass the test.
1 code implementation • 10 Oct 2022 • Asa Cooper Stickland, Sailik Sengupta, Jason Krone, Saab Mansour, He He
To benchmark the performance of pre-trained multilingual language models, we construct noisy datasets covering five languages and four NLP tasks, and observe a clear performance gap between clean and noisy data in the zero-shot cross-lingual setting.
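A hedged sketch of the kind of synthetic character-level noise such a benchmark might inject (the paper's actual noise sources and rates are not reproduced here):

```python
import random

def add_char_noise(text: str, p: float = 0.05) -> str:
    """Randomly swap, drop, or duplicate characters with probability p,
    mimicking typos to create a noisy copy of a clean example."""
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        r = random.random()
        if r < p and i + 1 < len(chars):   # swap with the next character
            out.extend([chars[i + 1], chars[i]])
            i += 2
        elif r < 2 * p:                     # drop this character
            i += 1
        elif r < 3 * p:                     # duplicate this character
            out.extend([chars[i], chars[i]])
            i += 1
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

print(add_char_noise("zero-shot cross-lingual transfer"))
```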
1 code implementation • 23 May 2022 • Ahmet Üstün, Asa Cooper Stickland
We find that using parameter-efficient fine-tuning methods (PEFTs) with a larger pre-trained model outperforms full fine-tuning of a smaller model, and that for smaller training data sizes, PEFTs outperform full fine-tuning of the same pre-trained model.
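For context, a minimal bottleneck-adapter sketch in PyTorch, one common PEFT; the exact adapter variant and placement studied in the paper are not shown here:

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a
    residual connection. Only these small layers are trained; the
    pre-trained model's own weights stay frozen."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```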
no code implementations • WMT (EMNLP) 2021 • Asa Cooper Stickland, Alexandre Bérard, Vassilina Nikoulina
In this work we study the compositionality of language and domain adapters in the context of Machine Translation.
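A minimal sketch of one way to compose a language adapter and a domain adapter by stacking them on top of a frozen layer's output; the composition the paper actually evaluates may differ:

```python
import torch.nn as nn

def make_adapter(hidden_dim: int, bottleneck_dim: int = 64) -> nn.Module:
    """A small bottleneck feed-forward block, as in the adapter sketch above."""
    return nn.Sequential(
        nn.Linear(hidden_dim, bottleneck_dim),
        nn.ReLU(),
        nn.Linear(bottleneck_dim, hidden_dim),
    )

class StackedAdapters(nn.Module):
    """Apply a language adapter, then a domain adapter, each with a
    residual connection, on a (frozen) transformer layer's output."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.language_adapter = make_adapter(hidden_dim)
        self.domain_adapter = make_adapter(hidden_dim)

    def forward(self, hidden_states):
        h = hidden_states + self.language_adapter(hidden_states)
        return h + self.domain_adapter(h)
```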
1 code implementation • NeurIPS 2020 • Xian Li, Asa Cooper Stickland, Yuqing Tang, Xiang Kong
As an extension of this framework, we propose a novel method to train one shared Transformer network for multilingual machine translation with different layer selection posteriors for each language pair.
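A rough sketch of per-language-pair layer selection with learned gate logits and a relaxed Bernoulli sample per layer; the paper's actual parameterisation of the layer-selection posterior is not reproduced here:

```python
import torch
import torch.nn as nn

class LatentDepthEncoder(nn.Module):
    """Shared stack of layers; each language pair has its own logits over
    which layers to use, sampled as (relaxed) Bernoulli gates."""
    def __init__(self, layers: nn.ModuleList, num_lang_pairs: int):
        super().__init__()
        self.layers = layers
        # One gate logit per (language pair, layer).
        self.gate_logits = nn.Parameter(torch.zeros(num_lang_pairs, len(layers)))

    def forward(self, x, lang_pair: int, temperature: float = 1.0):
        for i, layer in enumerate(self.layers):
            logit = self.gate_logits[lang_pair, i]
            if self.training:
                # Relaxed Bernoulli (Gumbel-sigmoid) sample for this layer.
                u = torch.rand(()).clamp(1e-6, 1 - 1e-6)
                noise = torch.log(u) - torch.log1p(-u)
                gate = torch.sigmoid((logit + noise) / temperature)
            else:
                gate = (torch.sigmoid(logit) > 0.5).float()
            # Mix the layer's output with a skip connection, weighted by the gate.
            x = gate * layer(x) + (1 - gate) * x
        return x
```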
no code implementations • 8 Jul 2020 • Asa Cooper Stickland, Iain Murray
Modern deep neural networks can produce badly calibrated predictions, especially when train and test distributions are mismatched.
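For reference, a small sketch of expected calibration error (ECE), a standard way to quantify miscalibration; the equal-width binning scheme here is an assumption:

```python
import torch

def expected_calibration_error(logits, labels, n_bins: int = 10) -> float:
    """Bin predictions by confidence and average |accuracy - confidence|,
    weighted by the fraction of examples falling in each bin."""
    probs = torch.softmax(logits, dim=-1)
    confidences, predictions = probs.max(dim=-1)
    accuracies = predictions.eq(labels).float()

    ece = torch.zeros(())
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
            ece += in_bin.float().mean() * gap
    return ece.item()
```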
no code implementations • EACL 2021 • Asa Cooper Stickland, Xian Li, Marjan Ghazvininejad
For BART we get the best performance by freezing most of the model parameters, and adding extra positional embeddings.
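A hedged sketch of this freezing recipe using the Hugging Face transformers BART model; which modules stay trainable, and how the extra positional embeddings are wired into the encoder, are assumptions for illustration:

```python
import torch.nn as nn
from transformers import BartModel

model = BartModel.from_pretrained("facebook/bart-large")

# Freeze the pre-trained parameters.
for param in model.parameters():
    param.requires_grad = False

# New, trainable positional embeddings; in a real setup these would be
# added to the encoder's input embeddings for the target task.
extra_positions = nn.Embedding(
    model.config.max_position_embeddings, model.config.d_model
)

trainable = sum(p.numel() for p in extra_positions.parameters())
frozen = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable}, frozen params: {frozen}")
```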
2 code implementations • 7 Feb 2019 • Asa Cooper Stickland, Iain Murray
Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required.
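A minimal multi-task sketch in this spirit: one shared encoder with a small task-specific module and head per task, so most parameters are shared across tasks (the paper's actual projected attention layer design is not reproduced here):

```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared encoder plus a small task-specific bottleneck module and
    classification head per task; only the small parts grow with the
    number of tasks."""
    def __init__(self, encoder: nn.Module, hidden_dim: int,
                 task_num_labels: dict, bottleneck_dim: int = 64):
        super().__init__()
        self.encoder = encoder
        self.task_modules = nn.ModuleDict({
            task: nn.Sequential(
                nn.Linear(hidden_dim, bottleneck_dim),
                nn.ReLU(),
                nn.Linear(bottleneck_dim, hidden_dim),
            )
            for task in task_num_labels
        })
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n) for task, n in task_num_labels.items()
        })

    def forward(self, x, task: str):
        h = self.encoder(x)
        h = h + self.task_modules[task](h)   # task-specific residual block
        return self.heads[task](h)
```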