no code implementations • Findings (EMNLP) 2021 • Alan Ansell, Edoardo Maria Ponti, Jonas Pfeiffer, Sebastian Ruder, Goran Glavaš, Ivan Vulić, Anna Korhonen
While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board.
2 code implementations • 29 Jan 2024 • Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, Edoardo M. Ponti
We experiment with instruction-tuning of LLMs on standard dataset mixtures, finding that SpIEL is often superior to popular parameter-efficient fine-tuning methods like LoRA (low-rank adaptation) in terms of performance and comparable in terms of run time.
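For intuition, here is a minimal sketch of the sparse-delta idea that distinguishes sparse fine-tuning from LoRA: instead of a low-rank update, only a small set of individual weight entries is trained while the pretrained weights stay frozen. The `SparseDelta` wrapper and the fixed index set are illustrative assumptions; SpIEL itself adapts the set of active indices during training, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseDelta(nn.Module):
    """Frozen linear layer plus a trainable sparse additive update (illustrative)."""
    def __init__(self, linear: nn.Linear, k: int):
        super().__init__()
        self.linear = linear
        self.linear.weight.requires_grad_(False)             # pretrained weight stays frozen
        n = self.linear.weight.numel()
        self.register_buffer("idx", torch.randperm(n)[:k])   # fixed trainable positions (assumption)
        self.values = nn.Parameter(torch.zeros(k))           # the sparse update itself

    def forward(self, x):
        delta = torch.zeros(self.linear.weight.numel(), device=x.device)
        delta[self.idx] = self.values                        # scatter sparse values into a dense delta
        w = self.linear.weight + delta.view_as(self.linear.weight)
        return F.linear(x, w, self.linear.bias)

layer = SparseDelta(nn.Linear(768, 768), k=1024)
out = layer(torch.randn(4, 768))    # gradients flow only into `values`
out.sum().backward()
```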
no code implementations • 5 Jun 2023 • Marinela Parović, Alan Ansell, Ivan Vulić, Anna Korhonen
We address this mismatch by exposing the task adapter to the target language adapter during training, and we empirically validate several variants of the idea. In the simplest form, we alternate between the source and target language adapters while training the task adapter; this generalizes to cycling over any set of language adapters.
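As a concrete illustration of the cycling variant, the sketch below routes each training batch through a different frozen language adapter while only the task adapter is updated. The toy `Adapter` module and the random data are stand-ins for the paper's actual adapter architecture and dataloader.

```python
import itertools
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Toy bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

# One frozen adapter per language; a single trainable task adapter.
lang_adapters = {lang: Adapter() for lang in ["en", "sw", "mt"]}
for a in lang_adapters.values():
    a.requires_grad_(False)
task_adapter = Adapter()
opt = torch.optim.AdamW(task_adapter.parameters(), lr=1e-4)

cycle = itertools.cycle(lang_adapters.values())
for step in range(100):                  # stand-in for the task dataloader
    h = torch.randn(8, 768)              # stand-in for transformer hidden states
    lang_adapter = next(cycle)           # cycle over language adapters per batch
    out = task_adapter(lang_adapter(h))  # task adapter stacked on a language adapter
    loss = out.pow(2).mean()             # stand-in for the task loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```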
1 code implementation • 2 Jun 2023 • Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić
Specifically, we use a two-phase distillation approach, termed BiStil: (i) the first phase distils a general bilingual model from the MMT, while (ii) the second, task-specific phase sparsely fine-tunes the bilingual "student" model using a task-tuned variant of the original MMT as its "teacher".
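The core training signal in both phases is standard teacher-student distillation. Below is a minimal sketch of a temperature-scaled logit-matching loss of the kind such a pipeline would use; the temperature value and toy shapes are assumptions, and BiStil's exact objective and sparse fine-tuning machinery are not reproduced here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes are comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

student_logits = torch.randn(8, 32000, requires_grad=True)  # toy vocabulary-sized logits
teacher_logits = torch.randn(8, 32000)                      # from the frozen MMT teacher
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```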
2 code implementations • ACL 2022 • Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić
Both of these masks (one language-specific, one task-specific) can then be composed with the pretrained model.
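The composition itself amounts to adding sparse difference vectors onto the pretrained parameters, roughly as in the sketch below; the dict-of-tensors format is an illustrative assumption, not the paper's storage format.

```python
import torch

def compose(pretrained: dict, *sparse_diffs: dict) -> dict:
    """Return the pretrained parameters with all sparse diffs added in."""
    out = {name: p.clone() for name, p in pretrained.items()}
    for diff in sparse_diffs:
        for name, d in diff.items():
            out[name] += d  # d is zero everywhere outside the learned mask
    return out

pretrained = {"encoder.weight": torch.randn(4, 4)}
lang_diff = {"encoder.weight": torch.zeros(4, 4)}  # sparse language fine-tuning
task_diff = {"encoder.weight": torch.zeros(4, 4)}  # sparse task fine-tuning
lang_diff["encoder.weight"][0, 0] = 0.3            # nonzero only where the mask is active
task_diff["encoder.weight"][1, 2] = -0.1
composed = compose(pretrained, lang_diff, task_diff)
```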
1 code implementation • EACL 2021 • Alan Ansell, Felipe Bravo-Marquez, Bernhard Pfahringer
To avoid the "meaning conflation deficiency" of word embeddings, a number of models have aimed to embed individual word senses.