Cross-Lingual Bitext Mining

5 papers with code • 4 benchmarks • 1 dataset

Cross-lingual bitext mining is the task of mining sentence pairs that are translations of each other from large text corpora.
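As a rough illustration of the task, the sketch below mines pairs from two small sets of sentence embeddings by scoring every candidate pair with a ratio margin: the cosine similarity of the pair divided by the average similarity of each sentence to its k nearest neighbours on the other side. This is a minimal sketch, not code from the listed papers or from LASER. The embeddings are hand-made toy vectors rather than encoder output, and the function names, `k`, and the threshold of 1.0 are assumptions chosen for the example.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def margin_scores(src, tgt, k=2):
    """Ratio-margin score for every (source, target) pair (hypothetical helper)."""
    sim = normalize(src) @ normalize(tgt).T            # cosine similarity matrix
    # Mean similarity of each sentence to its k nearest neighbours on the other side.
    src_knn = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # one value per source
    tgt_knn = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # one value per target
    return sim / ((src_knn[:, None] + tgt_knn[None, :]) / 2)

def mine_pairs(src, tgt, k=2, threshold=1.0):
    """Keep each source's best-scoring target if the score clears the threshold."""
    scores = margin_scores(src, tgt, k)
    best = scores.argmax(axis=1)
    return [(i, int(j), float(scores[i, j]))
            for i, j in enumerate(best) if scores[i, j] >= threshold]

# Toy data: three source "sentences" plus one with no translation (row 3).
src = np.array([[1.0, 0.1, 0.1],
                [0.1, 1.0, 0.1],
                [0.1, 0.1, 1.0],
                [1.0, 1.0, 1.0]])
# Targets are the first three sources in shuffled order.
tgt = src[[2, 0, 1]]

pairs = mine_pairs(src, tgt)
for i, j, score in pairs:
    print(f"source {i} <-> target {j} (score {score:.2f})")
```

On this toy data the three true pairs are recovered and the unmatched source is dropped, because its best candidate is only about as similar as its other neighbours. In practice the exhaustive similarity matrix would be replaced by an approximate nearest-neighbour index, and the threshold would be tuned on held-out data.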

Benchmarks

These leaderboards track progress in Cross-Lingual Bitext Mining.

Dataset	Best Model
BUCC German-to-English	Massively Multilingual Sentence Embeddings
BUCC French-to-English	Massively Multilingual Sentence Embeddings
BUCC Russian-to-English	Massively Multilingual Sentence Embeddings
BUCC Chinese-to-English	Massively Multilingual Sentence Embeddings

Libraries

Use these libraries to find Cross-Lingual Bitext Mining models and implementations.

facebookresearch/LASER • 2 papers • 3,519 stars

Datasets

BUCC

Most implemented papers

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

facebookresearch/LASER • TACL 2019

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

facebookresearch/LASER • ACL 2019

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.
