Search Results for author: Mikel Artetxe

Found 53 papers, 24 papers with code

PARADISE”:" Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

no code implementations • RepL4NLP (ACL) 2022 • Machel Reid, Mikel Artetxe

Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora and do not make use of the strong cross-lingual signal contained in parallel data.

Cross-Lingual Natural Language Inference Denoising +2

Paper
Add Code

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei LI, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie

On text benchmarks, Core not only performs competitively to other frontier models on a set of well-established benchmarks (e. g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation.

GSM8K Question Answering +2

Paper
Add Code

Latxa: An Open Language Model and Evaluation Suite for Basque

1 code implementation • 29 Mar 2024 • Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters.

Language Modelling Multiple-choice +1

Paper
Code

Gender-specific Machine Translation with Large Language Models

no code implementations • 6 Sep 2023 • Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà

While machine translation (MT) systems have seen significant improvements, it is still common for translations to reflect societal biases, such as gender bias.

coreference-resolution In-Context Learning +3

Paper
Add Code

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

1 code implementation • 31 Aug 2023 • Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa

We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).

Cross-Lingual Transfer Machine Reading Comprehension +2

303

Paper
Code

Evaluation of Faithfulness Using the Longest Supported Subsequence

no code implementations • 23 Aug 2023 • Anirudh Mittal, Timo Schick, Mikel Artetxe, Jane Dwivedi-Yu

Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.

Question Answering

Paper
Add Code

Do Multilingual Language Models Think Better in English?

1 code implementation • 2 Aug 2023 • Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models.

Common Sense Reasoning Cross-Lingual Natural Language Inference +6

Paper
Code

Improving Language Plasticity via Pretraining with Active Forgetting

no code implementations • NeurIPS 2023 • Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe

Pretrained language models (PLMs) are today the primary model for natural language processing.

Meta-Learning

Paper
Add Code

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models

no code implementations • 23 May 2023 • Aitor Ormazabal, Mikel Artetxe, Eneko Agirre

Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model, and work by modifying its parameters.

Machine Translation

Paper
Add Code

Revisiting Machine Translation for Cross-lingual Classification

no code implementations • 23 May 2023 • Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, Luke Zettlemoyer

Machine Translation (MT) has been widely used for cross-lingual classification, either by translating the test set into English and running inference with a monolingual model (translate-test), or translating the training set into the target languages and finetuning a multilingual model (translate-train).

Classification Cross-Lingual Transfer +2

Paper
Add Code

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

no code implementations • 20 Dec 2022 • Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen.

Cross-Lingual Transfer

Paper
Add Code

On the Role of Parallel Data in Cross-lingual Transfer Learning

no code implementations • 20 Dec 2022 • Machel Reid, Mikel Artetxe

While prior work has established that the use of parallel data is conducive for cross-lingual learning, it is unclear if the improvements come from the data itself, or if it is the modeling of parallel interactions that matters.

Cross-Lingual Transfer Transfer Learning +2

Paper
Add Code

Training Trajectories of Language Models Across Scales

1 code implementation • 19 Dec 2022 • Mengzhou Xia, Mikel Artetxe, Chunting Zhou, Xi Victoria Lin, Ramakanth Pasunuru, Danqi Chen, Luke Zettlemoyer, Ves Stoyanov

Why do larger language models demonstrate more desirable behaviors?

In-Context Learning Multiple-choice

Paper
Code

Don't Prompt, Search! Mining-based Zero-Shot Learning with Language Models

no code implementations • 26 Oct 2022 • Mozes van de Kar, Mengzhou Xia, Danqi Chen, Mikel Artetxe

Our results suggest that the success of prompting can partly be explained by the model being exposed to similar examples during pretraining, which can be directly retrieved through regular expressions.

Text Classification Text Infilling +2

Paper
Add Code

State-of-the-art generalisation research in NLP: A taxonomy and review

no code implementations • 6 Oct 2022 • Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin

We present a taxonomy for characterising and understanding generalisation research in NLP.

Paper
Add Code

Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models

1 code implementation • 30 May 2022 • Mengzhou Xia, Mikel Artetxe, Jingfei Du, Danqi Chen, Ves Stoyanov

In this work, we adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.

Few-Shot Learning Text Infilling

Paper
Code

On the Role of Bidirectionality in Language Model Pre-Training

no code implementations • 24 May 2022 • Mikel Artetxe, Jingfei Du, Naman Goyal, Luke Zettlemoyer, Ves Stoyanov

Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult.

Language Modelling Text Infilling

Paper
Add Code

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation

1 code implementation • 24 May 2022 • Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre

During inference, we build control codes for the desired meter and rhyme scheme, and condition our language model on them to generate formal verse poetry.

Language Modelling valid

Paper
Code

Principled Paraphrase Generation with Parallel Corpora

1 code implementation • ACL 2022 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.

Machine Translation Paraphrase Generation +1

Paper
Code

Multilingual Machine Translation with Hyper-Adapters

3 code implementations • 22 May 2022 • Christos Baziotis, Mikel Artetxe, James Cross, Shruti Bhosale

We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times less parameters.

Machine Translation Translation

Paper
Code

Lifting the Curse of Multilinguality by Pre-training Modular Transformers

no code implementations • NAACL 2022 • Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe

Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

OPT: Open Pre-trained Transformer Language Models

7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs

Hate Speech Detection Language Modelling +1

6,387

Paper
Code

Does Corpus Quality Really Matter for Low-Resource Languages?

no code implementations • 15 Mar 2022 • Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100.

Representation Learning

Paper
Add Code

Efficient Language Modeling with Sparse all-MLP

no code implementations • 14 Mar 2022 • Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li

All-MLP architectures have attracted increasing interest as an alternative to attention-based models.

Ranked #17 on Question Answering on StoryCloze

Common Sense Reasoning In-Context Learning +4

Paper
Add Code

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

1 code implementation • 25 Feb 2022 • Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer

Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.

In-Context Learning

161

Paper
Code

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

Paper
Add Code

Few-shot Learning with Multilingual Language Models

2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

Large-scale generative language models such as GPT-3 are competitive few-shot learners.

Cross-Lingual Transfer Few-Shot Learning +5

29,260

Paper
Code

PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

1 code implementation • NAACL 2022 • Machel Reid, Mikel Artetxe

Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora, and do not make use of the strong cross-lingual signal contained in parallel data.

Cross-Lingual Natural Language Inference Denoising +2

Paper
Code

Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring

no code implementations • ACL 2021 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2

Paper
Add Code

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

no code implementations • ACL 2020 • Ivana Kvapilikova, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.

Language Modelling Parallel Corpus Mining +4

Paper
Add Code

Multilingual Autoregressive Entity Linking

1 code implementation • 23 Mar 2021 • Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time.

Ranked #2 on Entity Disambiguation on Mewsli-9 (using extra training data)

Entity Disambiguation Entity Linking

738

Paper
Code

Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring

no code implementations • 31 Dec 2020 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2

Paper
Add Code

Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders

no code implementations • 29 May 2020 • Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe

We propose a modular architecture of language-specific encoder-decoders that constitutes a multilingual machine translation system that can be incrementally extended to new languages without the need for retraining the existing system when adding new languages.

Machine Translation Natural Language Inference +2

Paper
Add Code

A Call for More Rigor in Unsupervised Cross-lingual Learning

no code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.

Cross-Lingual Word Embeddings Position +3

Paper
Add Code

Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders

no code implementations • EACL 2021 • Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe

State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages.

Machine Translation Translation

Paper
Add Code

Translation Artifacts in Cross-lingual Transfer Learning

1 code implementation • EMNLP 2020 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique.

Cross-Lingual Transfer Machine Translation +3

Paper
Code

Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation

no code implementations • 28 Feb 2020 • Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre

In this paper, we analyze the role that such initialization plays in iterative back-translation.

NMT Translation +1

Paper
Add Code

On the Cross-lingual Transferability of Monolingual Representations

6 code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama

This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.

Cross-Lingual Question Answering Language Modelling +1

174

Paper
Code

Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora

no code implementations • CL 2019 • Pablo Gamallo, Susana Sotelo, Jos{\'e} Ramom Pichel, Mikel Artetxe

The contextualization of meaning is carried out by means of distributional composition within a structured vector space with syntactic dependencies, and the bilingual space is created by means of transfer rules and a bilingual dictionary.

Translation Word Translation

Paper
Add Code

Bilingual Lexicon Induction through Unsupervised Machine Translation

1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods.

Bilingual Lexicon Induction Language Modelling +6

227

Paper
Code

Analyzing the Limitations of Cross-lingual Word Embedding Mappings

no code implementations • ACL 2019 • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1

Paper
Add Code

An Effective Approach to Unsupervised Machine Translation

1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only.

Ranked #1 on Unsupervised Machine Translation on WMT2014 English-German

NMT Translation +1

227

Paper
Code

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

13 code implementations • TACL 2019 • Mikel Artetxe, Holger Schwenk

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

Ranked #1 on Cross-Lingual Bitext Mining on BUCC German-to-English

Cross-Lingual Bitext Mining Cross-Lingual Document Classification +6

3,519

Paper
Code

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

9 code implementations • ACL 2019 • Mikel Artetxe, Holger Schwenk

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.

Ranked #2 on Cross-Lingual Bitext Mining on BUCC French-to-English

Cross-Lingual Bitext Mining Machine Translation +5

3,519

Paper
Code

Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

2 code implementations • CONLL 2018 • Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.

Word Embeddings

Paper
Code

Unsupervised Statistical Machine Translation

3 code implementations • EMNLP 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018).

Ranked #3 on Machine Translation on WMT2014 French-English

Language Modelling NMT +2

641

Paper
Code

A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings

2 code implementations • ACL 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training.

Cross-Lingual Word Embeddings Self-Learning +1

641

Paper
Code

Unsupervised Neural Machine Translation

2 code implementations • ICLR 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho

In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.

Ranked #6 on Machine Translation on WMT2015 English-German