no code implementations • Findings (EMNLP) 2021 • Alan Ansell, Edoardo Maria Ponti, Jonas Pfeiffer, Sebastian Ruder, Goran Glavaš, Ivan Vulić, Anna Korhonen
While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board.
2 code implementations • 29 Jan 2024 • Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, Edoardo M. Ponti
We experiment with instruction-tuning of LLMs on standard dataset mixtures, finding that SpIEL is often superior to popular parameter-efficient fine-tuning methods like LoRA (low-rank adaptation) in terms of performance and comparable in terms of run time.
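For intuition, here is a minimal sketch of the sparse-delta idea that distinguishes sparse fine-tuning from LoRA: instead of a low-rank update, only a small set of individual weight entries is trained while the pretrained weights stay frozen. The `SparseDelta` wrapper and the fixed index set are illustrative assumptions; SpIEL itself adapts the set of active indices during training, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseDelta(nn.Module):
    """Frozen linear layer plus a trainable sparse additive update (illustrative)."""
    def __init__(self, linear: nn.Linear, k: int):
        super().__init__()
        self.linear = linear
        self.linear.weight.requires_grad_(False)             # pretrained weight stays frozen
        n = self.linear.weight.numel()
        self.register_buffer("idx", torch.randperm(n)[:k])   # fixed trainable positions (assumption)
        self.values = nn.Parameter(torch.zeros(k))           # the sparse update itself

    def forward(self, x):
        delta = torch.zeros(self.linear.weight.numel(), device=x.device)
        delta[self.idx] = self.values                        # scatter sparse values into a dense delta
        w = self.linear.weight + delta.view_as(self.linear.weight)
        return F.linear(x, w, self.linear.bias)

layer = SparseDelta(nn.Linear(768, 768), k=1024)
out = layer(torch.randn(4, 768))    # gradients flow only into `values`
out.sum().backward()
```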
no code implementations • 5 Jun 2023 • Marinela Parović, Alan Ansell, Ivan Vulić, Anna Korhonen
We address this mismatch by exposing the task adapter to the target language adapter during training, and we empirically validate several variants of the idea. In the simplest form, we alternate between the source and target language adapters while training the task adapter; this generalizes to cycling over any set of language adapters.
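As a concrete illustration of the cycling variant, the sketch below routes each training batch through a different frozen language adapter while only the task adapter is updated. The toy `Adapter` module and the random data are stand-ins for the paper's actual adapter architecture and dataloader.

```python
import itertools
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Toy bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

# One frozen adapter per language; a single trainable task adapter.
lang_adapters = {lang: Adapter() for lang in ["en", "sw", "mt"]}
for a in lang_adapters.values():
    a.requires_grad_(False)
task_adapter = Adapter()
opt = torch.optim.AdamW(task_adapter.parameters(), lr=1e-4)

cycle = itertools.cycle(lang_adapters.values())
for step in range(100):                  # stand-in for the task dataloader
    h = torch.randn(8, 768)              # stand-in for transformer hidden states
    lang_adapter = next(cycle)           # cycle over language adapters per batch
    out = task_adapter(lang_adapter(h))  # task adapter stacked on a language adapter
    loss = out.pow(2).mean()             # stand-in for the task loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```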
1 code implementation • 2 Jun 2023 • Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić
Specifically, we use a two-phase distillation approach, termed BiStil: (i) the first phase distils a general bilingual model from the MMT, while (ii) the second, task-specific phase sparsely fine-tunes the bilingual "student" model using a task-tuned variant of the original MMT as its "teacher".
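The core training signal in both phases is standard teacher-student distillation. Below is a minimal sketch of a temperature-scaled logit-matching loss of the kind such a pipeline would use; the temperature value and toy shapes are assumptions, and BiStil's exact objective and sparse fine-tuning machinery are not reproduced here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes are comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

student_logits = torch.randn(8, 32000, requires_grad=True)  # toy vocabulary-sized logits
teacher_logits = torch.randn(8, 32000)                      # from the frozen MMT teacher
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```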
2 code implementations • ACL 2022 • Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić
Both of these masks (one language-specific, one task-specific) can then be composed with the pretrained model.
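The composition itself amounts to adding sparse difference vectors onto the pretrained parameters, roughly as in the sketch below; the dict-of-tensors format is an illustrative assumption, not the paper's storage format.

```python
import torch

def compose(pretrained: dict, *sparse_diffs: dict) -> dict:
    """Return the pretrained parameters with all sparse diffs added in."""
    out = {name: p.clone() for name, p in pretrained.items()}
    for diff in sparse_diffs:
        for name, d in diff.items():
            out[name] += d  # d is zero everywhere outside the learned mask
    return out

pretrained = {"encoder.weight": torch.randn(4, 4)}
lang_diff = {"encoder.weight": torch.zeros(4, 4)}  # sparse language fine-tuning
task_diff = {"encoder.weight": torch.zeros(4, 4)}  # sparse task fine-tuning
lang_diff["encoder.weight"][0, 0] = 0.3            # nonzero only where the mask is active
task_diff["encoder.weight"][1, 2] = -0.1
composed = compose(pretrained, lang_diff, task_diff)
```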
1 code implementation • EACL 2021 • Alan Ansell, Felipe Bravo-Marquez, Bernhard Pfahringer
To avoid the "meaning conflation deficiency" of word embeddings, a number of models have aimed to embed individual word senses.