Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings

15 Oct 2020 · Phillip Keung, Julian Salazar, Yichao Lu, Noah A. Smith

We describe an unsupervised method to create pseudo-parallel corpora for machine translation (MT) from unaligned text. We use multilingual BERT to create source and target sentence embeddings for nearest-neighbor search and adapt the model via self-training...
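
The nearest-neighbor step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors stand in for multilingual BERT sentence embeddings, and the function names (`cosine`, `mine_pairs`), the use of cosine similarity, and the `threshold` value are all assumptions made for illustration. The self-training step is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mine_pairs(src_vecs, tgt_vecs, threshold=0.9):
    """For each source vector, keep its nearest target if similar enough.

    Hypothetical helper: the threshold is an illustrative assumption,
    not a value taken from the paper.
    """
    pairs = []
    for i, s in enumerate(src_vecs):
        scores = [cosine(s, t) for t in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs

# Toy stand-ins for sentence embeddings (assumed data, not real model output).
src = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
tgt = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.1], [-1.0, 0.2, 0.0]]

pairs = mine_pairs(src, tgt)
for i, j, score in pairs:
    print(f"source {i} <-> target {j} (cosine {score:.3f})")
```

In this toy run the third source vector has no sufficiently close target and is left unpaired, which mirrors the idea that only confident matches would be kept as pseudo-parallel sentence pairs.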
