Cross-Lingual Natural Language Inference

16 papers with code • 4 benchmarks • 2 datasets

Using data and models from a language with ample resources (e.g., English) to solve a natural language inference task in another, typically lower-resource, language.
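
A minimal sketch of this zero-shot transfer setup, assuming the Hugging Face transformers library and a multilingual model already fine-tuned on English NLI data; the checkpoint name below is one publicly available example, used purely as an illustration:

```python
# Zero-shot cross-lingual NLI: a model fine-tuned only on English NLI data
# is applied directly to premise/hypothesis pairs in another language.
# Assumes the `transformers` library; the checkpoint name is illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "joeddav/xlm-roberta-large-xnli"  # assumed/illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# French premise/hypothesis pair, even though fine-tuning used English only.
premise = "Le chat dort sur le canapé."
hypothesis = "Un animal est en train de dormir."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Label order depends on the checkpoint's config; read it rather than hard-coding.
probs = logits.softmax(dim=-1).squeeze()
for label_id, label in model.config.id2label.items():
    print(f"{label}: {probs[label_id].item():.3f}")
```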

Most implemented papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

google-research/bert NAACL 2019

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
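
For cross-lingual NLI, BERT's multilingual variant is typically fine-tuned as a sentence-pair classifier on English NLI data and then evaluated on other languages. A hedged sketch of one training step with the transformers library; the toy example and hyperparameters are illustrative:

```python
# Sketch: fine-tuning multilingual BERT as a 3-way NLI sentence-pair classifier.
# Assumes the `transformers` library; data and hyperparameters are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3  # entailment / neutral / contradiction
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# A toy English training pair; real training uses MNLI-scale data.
batch = tokenizer(
    ["A man is playing a guitar."],   # premises
    ["Someone is making music."],     # hypotheses
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([0])  # gold label index in this toy setup

outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```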

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

facebookresearch/InferSent EMNLP 2017

Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features.
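
The paper's NLI classifier combines two sentence embeddings u and v into the feature vector [u, v, |u - v|, u * v] before a small classifier. A minimal PyTorch sketch of that combination layer; the dimensions and the dummy inputs standing in for the sentence encoder are illustrative:

```python
# Sketch of an InferSent-style NLI head: two fixed-size sentence embeddings
# are combined as [u, v, |u - v|, u * v] and fed to a small MLP classifier.
# Dimensions and the encoder producing u and v are illustrative assumptions.
import torch
import torch.nn as nn

class NLIHead(nn.Module):
    def __init__(self, embed_dim: int = 4096, hidden_dim: int = 512, num_classes: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.mlp(features)

# u and v would come from the sentence encoder (a BiLSTM with max pooling in the paper).
u, v = torch.randn(8, 4096), torch.randn(8, 4096)
logits = NLIHead()(u, v)  # shape: (8, 3)
```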

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

facebookresearch/LASER TACL 2019

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

XNLI: Evaluating Cross-lingual Sentence Representations

facebookresearch/XLM EMNLP 2018

State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models.
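
XNLI provides NLI dev and test sets in 15 languages, so an English-trained model can be scored language by language. A short sketch assuming the Hugging Face datasets library's per-language XNLI configuration (plain premise/hypothesis strings plus an integer label); evaluate_pair() is a hypothetical stand-in for the model under test:

```python
# Sketch: per-language evaluation of a cross-lingual NLI model on XNLI.
# Assumes the `datasets` library; evaluate_pair() is a hypothetical helper.
from datasets import load_dataset

def evaluate_pair(premise: str, hypothesis: str) -> int:
    """Hypothetical stand-in for the model being evaluated; returns a label id."""
    return 1  # placeholder prediction

for lang in ["fr", "es", "de", "sw", "ur"]:  # a few of XNLI's 15 languages
    test = load_dataset("xnli", lang, split="test")
    correct = sum(
        evaluate_pair(ex["premise"], ex["hypothesis"]) == ex["label"] for ex in test
    )
    print(f"{lang}: accuracy = {correct / len(test):.3f}")
```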

ByT5: Towards a token-free future with pre-trained byte-to-byte models

google-research/byt5 28 May 2021

Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units.
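
A byte-level model like ByT5 sidesteps the subword vocabulary entirely by operating on UTF-8 bytes. A small illustration of the idea; the offset for special tokens mirrors common practice, but the exact id scheme of a given implementation may differ:

```python
# Sketch: byte-level "tokenization". The model's input is just the UTF-8 byte
# sequence of the text, so no subword vocabulary has to be built or shared
# across languages. The offset reserves a few ids for special tokens
# (pad/eos/unk); the exact scheme in a given implementation may differ.
def to_byte_ids(text: str, num_special_tokens: int = 3) -> list:
    return [b + num_special_tokens for b in text.encode("utf-8")]

text = "natürliche Sprache"
ids = to_byte_ids(text)
print(len(text), len(ids))  # 18 characters, 19 ids: "ü" takes two UTF-8 bytes
print(ids[:8])
```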

Better Fine-Tuning by Reducing Representational Collapse

pytorch/fairseq ICLR 2021

Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods.
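
The approach stabilizes fine-tuning by perturbing the input embeddings with small noise and penalizing the symmetric KL divergence between the clean and noised output distributions. A hedged PyTorch sketch of such a loss term for a generic classifier; the model interface (accepting inputs_embeds and returning logits) and the hyperparameters are assumptions of this sketch, not the library's exact API:

```python
# Sketch of an R3F-style regularizer: perturb the input embeddings and penalize
# the symmetric KL divergence between clean and noised predictive distributions.
# The classifier interface below is an assumption for illustration.
import torch
import torch.nn.functional as F

def r3f_loss(model, embeddings, labels, eps=1e-5, lam=1.0):
    # model(inputs_embeds=...) -> logits of shape (batch, num_classes) is assumed.
    clean_logits = model(inputs_embeds=embeddings)
    noise = torch.empty_like(embeddings).uniform_(-eps, eps)
    noised_logits = model(inputs_embeds=embeddings + noise)

    task_loss = F.cross_entropy(clean_logits, labels)
    p = F.log_softmax(clean_logits, dim=-1)
    q = F.log_softmax(noised_logits, dim=-1)
    sym_kl = (
        F.kl_div(p, q, log_target=True, reduction="batchmean")
        + F.kl_div(q, p, log_target=True, reduction="batchmean")
    )
    return task_loss + lam * sym_kl
```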

Rethinking embedding coupling in pre-trained language models

PaddlePaddle/PaddleNLP ICLR 2021

We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art pre-trained language models.
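
The coupling in question is the common practice of tying the input embedding matrix to the output softmax projection. A minimal PyTorch sketch of that tying; the toy language-model shapes are illustrative:

```python
# Sketch: tying ("coupling") the input embedding and output projection weights,
# versus keeping them as separate parameters. Toy dimensions are illustrative.
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, tie_embeddings=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        if tie_embeddings:
            # Coupled: one shared matrix serves both roles.
            self.lm_head.weight = self.embed.weight

    def forward(self, token_ids):
        hidden = self.embed(token_ids)  # stand-in for the transformer body
        return self.lm_head(hidden)

coupled = sum(p.numel() for p in TinyLM(tie_embeddings=True).parameters())
decoupled = sum(p.numel() for p in TinyLM(tie_embeddings=False).parameters())
print(coupled, decoupled)  # decoupled has an extra vocab_size x d_model matrix
```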

Language Embeddings for Typology and Cross-lingual Transfer Learning

DianDYu/language_embeddings ACL 2021

Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data.

PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

machelreid/paradise NAACL 2022

Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora, and do not make use of the strong cross-lingual signal contained in parallel data.

Subword Mapping and Anchoring across Languages

georgevern/smala Findings (EMNLP) 2021

State-of-the-art multilingual systems rely on shared vocabularies that sufficiently cover all considered languages.
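
A shared multilingual subword vocabulary is typically learned by training a single subword model over text from all the languages considered. A short sketch with the sentencepiece library; the file paths, vocabulary size, and sample sentence are illustrative:

```python
# Sketch: training one shared subword vocabulary across several languages with
# SentencePiece, as multilingual systems commonly do. Paths/sizes are illustrative.
import sentencepiece as spm

# One plain-text file per language; all files feed a single shared model.
spm.SentencePieceTrainer.train(
    input="corpus.en.txt,corpus.fr.txt,corpus.sw.txt",
    model_prefix="shared_bpe",
    vocab_size=32000,
    model_type="bpe",
    character_coverage=0.9995,  # keep rare characters from low-resource scripts
)

sp = spm.SentencePieceProcessor(model_file="shared_bpe.model")
print(sp.encode("Habari ya asubuhi", out_type=str))  # Swahili segmented with the shared vocab
```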