Cross-Lingual Document Classification

12 papers with code • 10 benchmarks • 2 datasets

Cross-lingual document classification refers to the task of using the data and models available for one resource-rich language (e.g., English) to solve classification tasks in another, typically low-resource, language.
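
For instance, the standard zero-shot transfer recipe: encode documents with a shared multilingual representation, fit a classifier on source-language labels, and apply it unchanged to the target language. A minimal sketch, where `embed` is a hypothetical stand-in for any shared-space encoder (e.g., LASER, shown further down):

```python
from sklearn.linear_model import LogisticRegression

def embed(texts, lang):
    """Hypothetical stand-in for any multilingual encoder that maps text
    from any language into one shared vector space (e.g., LASER below)."""
    ...

en_docs = ["shares fell sharply after the report", "new cabinet sworn in"]
en_labels = [0, 1]                      # labels exist only for English
de_docs = ["Aktien brachen nach dem Bericht ein"]

clf = LogisticRegression(max_iter=1000)
clf.fit(embed(en_docs, lang="en"), en_labels)   # train on the source language
pred = clf.predict(embed(de_docs, lang="de"))   # zero-shot on the target
```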

Most implemented papers

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

facebookresearch/LASER TACL 2019

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.
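
The pretrained encoder is distributed with the repository; as one way to call it, the community `laserembeddings` package (a third-party wrapper, assumed here, not part of the paper) embeds sentences from any supported language into the same 1024-dimensional space:

```python
# pip install laserembeddings && python -m laserembeddings download-models
from laserembeddings import Laser

laser = Laser()  # loads the pretrained multilingual BiLSTM encoder

# Mutual translations land near each other in the shared space.
vecs = laser.embed_sentences(
    ["The cat sits on the mat.", "Die Katze sitzt auf der Matte."],
    lang=["en", "de"],
)
print(vecs.shape)  # (2, 1024)
```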

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

microsoft/DeepSpeed 4 Oct 2019

Large deep learning models offer significant accuracy gains, but training billions to trillions of parameters is challenging.
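
The snippet below is a minimal, assumed DeepSpeed setup enabling ZeRO stage 2, which partitions optimizer states and gradients across data-parallel workers (real configurations also set fp16, schedulers, and so on):

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 4)  # stand-in for a much larger network

# Minimal illustrative config; values here are assumptions, not a recipe.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # shard optimizer states + gradients
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```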

MultiFiT: Efficient Multi-lingual Language Model Fine-tuning

n-waves/multifit IJCNLP 2019

Pretrained language models are particularly promising for low-resource languages, as they require only unlabelled data.
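
MultiFiT's own pipeline fine-tunes a QRNN language model with the fastai library; as a rough analogue of the same fine-tune-then-classify recipe, here is a sketch with Hugging Face transformers and multilingual BERT (model name and label count are illustrative, not the paper's code):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=4)

# A handful of labelled target-language documents goes a long way once
# the language model has been pretrained on unlabelled text.
batch = tok(["Aktien brachen nach den Zahlen ein"],
            return_tensors="pt", truncation=True)
labels = torch.tensor([3])

opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
opt.step()
```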

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

gouwsmeister/bilbowa 9 Oct 2014

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.
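
The cross-lingual signal here is deliberately cheap: for each sentence-aligned pair, the bag-of-words mean vectors of the two sides are pulled together. A toy NumPy sketch of one SGD step on that term alone (the full model additionally trains monolingual skip-gram objectives on each side; sizes and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
E_en = rng.normal(scale=0.1, size=(1000, dim))  # English embedding table
E_fr = rng.normal(scale=0.1, size=(1000, dim))  # French embedding table

def bilbowa_step(en_ids, fr_ids, lr=0.1):
    """One SGD step on 0.5 * ||mean(E_en[en_ids]) - mean(E_fr[fr_ids])||^2,
    pulling the two sentence means together (assumes unique ids per side)."""
    diff = E_en[en_ids].mean(axis=0) - E_fr[fr_ids].mean(axis=0)
    E_en[en_ids] -= lr * diff / len(en_ids)
    E_fr[fr_ids] += lr * diff / len(fr_ids)

bilbowa_step(en_ids=[3, 17, 42], fr_ids=[7, 99])  # one aligned sentence pair
```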

Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification

ccsasuke/adan TACL 2018

To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exists.
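
The architecture averages word embeddings into a document feature that feeds two heads: a sentiment classifier trained on source-language labels, and a language discriminator trained adversarially so the features become language-invariant. A minimal PyTorch sketch using the gradient-reversal trick (the paper's exact adversarial objective differs in its details):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips gradients on the backward pass,
    so the feature extractor learns to fool the language discriminator."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lambd * grad_out, None

class ADANSketch(nn.Module):
    def __init__(self, vocab_size, dim=300, hid=256, n_classes=2):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)   # mean of word vectors
        self.feat = nn.Sequential(nn.Linear(dim, hid), nn.ReLU())
        self.sentiment = nn.Linear(hid, n_classes)    # source labels only
        self.language = nn.Linear(hid, 2)             # source vs. target

    def forward(self, token_ids, offsets):
        f = self.feat(self.emb(token_ids, offsets))
        # Minimise both heads' losses; the reversed gradient pushes f
        # toward language-invariance while staying sentiment-discriminative.
        return self.sentiment(f), self.language(GradReverse.apply(f, 1.0))
```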

A Corpus for Multilingual Document Classification in Eight Languages

facebookresearch/MLDoc LREC 2018

We present MLDoc, a corpus for document classification in eight languages, and observe that the class prior distributions differ significantly between the languages.
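
That observation is easy to verify on any labelled multilingual corpus; a small sketch with hypothetical records (MLDoc's labels are the four RCV1/RCV2 top-level categories):

```python
from collections import Counter

# Hypothetical (language, label) records from an MLDoc-style corpus.
records = [("en", "CCAT"), ("en", "ECAT"), ("de", "MCAT"), ("de", "CCAT")]

for lang in sorted({l for l, _ in records}):
    counts = Counter(label for l, label in records if l == lang)
    total = sum(counts.values())
    print(lang, {label: n / total for label, n in counts.items()})
```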

Robust Cross-lingual Embeddings from Parallel Sentences

epfml/Bi-Sent2Vec 28 Dec 2019

Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.
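
The mapping-based approach described above has a well-known closed-form instance: given row-aligned matrices X (source) and Y (target) for a seed dictionary, the best orthogonal map W minimising ||XW - Y|| comes from an SVD. A sketch with random stand-in data (Bi-Sent2Vec itself instead trains both spaces jointly from parallel sentences):

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal Procrustes: the W minimising ||X @ W - Y||_F over
    orthogonal matrices is U @ Vt, where U, S, Vt = svd(X.T @ Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 300))  # source vectors for a seed dictionary
Y = rng.normal(size=(5000, 300))  # their translations' target vectors
W = procrustes_map(X, Y)          # X @ W now lives in the target space
```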

Multilingual Distributed Representations without Word Alignment

karlmoritz/bicvm 20 Dec 2013

Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP.

Multilingual Models for Compositional Distributed Semantics

karlmoritz/bicvm ACL 2014

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings.
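
Concretely, one instance of this idea composes sentence vectors additively and trains with a noise-contrastive margin loss over aligned sentence pairs; a toy sketch (margin and composition choice illustrative):

```python
import numpy as np

def compose(word_vecs):
    """Additive composition: a sentence vector is the sum of its words."""
    return np.sum(word_vecs, axis=0)

def bicvm_hinge(a, b, noise, margin=1.0):
    """Aligned sentence vectors (a, b) should be closer (squared L2) than
    (a, noise) for a randomly drawn negative, by at least `margin`."""
    return max(0.0, margin + np.sum((a - b) ** 2) - np.sum((a - noise) ** 2))
```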

Learning Crosslingual Word Embeddings without Bilingual Corpora

longdt219/xlingualemb EMNLP 2016

Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools.
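
Once the two vocabularies share a space, transfer can be as simple as nearest-neighbour lookup; a sketch with a hypothetical `translate` helper:

```python
import numpy as np

def translate(word, E_src, E_tgt, src_idx, tgt_words, k=5):
    """Hypothetical helper: retrieve the k nearest target-language words
    by cosine similarity, given embedding tables in one shared space."""
    v = E_src[src_idx[word]]
    sims = (E_tgt @ v) / (np.linalg.norm(E_tgt, axis=1)
                          * np.linalg.norm(v) + 1e-9)
    return [tgt_words[i] for i in np.argsort(sims)[::-1][:k]]
```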