Multilingual Word Embeddings
19 papers with code • 0 benchmarks • 0 datasets
Benchmarks
These leaderboards are used to track progress in Multilingual Word Embeddings.
Most implemented papers
CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis
The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts that mix multiple languages, a practice known as code-switching.
Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages
In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so as to generalize well in a multilingual setting.
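As a rough illustration of the kind of debiasing such work builds on, the sketch below projects a bias direction out of neutral word vectors (a hard-debiasing baseline). The word choices, vector values, and definitional pair are illustrative assumptions, not the paper's actual multilingual pipeline.

```python
# Minimal hard-debiasing sketch: estimate a bias direction from definitional
# pairs, then remove its component from bias-neutral words.
import numpy as np

def bias_direction(emb, pairs):
    """Estimate a unit bias direction from definitional word pairs (e.g. he/she)."""
    diffs = [emb[a] - emb[b] for a, b in pairs if a in emb and b in emb]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def debias(emb, direction, neutral_words):
    """Remove the component along the bias direction from neutral words."""
    out = dict(emb)
    for w in neutral_words:
        if w in out:
            v = out[w]
            out[w] = v - np.dot(v, direction) * direction
    return out

# Toy example with random vectors standing in for trained embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["he", "she", "doctor", "nurse"]}
d = bias_direction(emb, [("he", "she")])
emb_debiased = debias(emb, d, ["doctor", "nurse"])
print(abs(np.dot(emb_debiased["doctor"], d)))  # ~0 after projection
```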
Improving Word Translation via Two-Stage Contrastive Learning
At Stage C1, we propose to refine standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective; we also show how to integrate it into the self-learning procedure for even more refined cross-lingual maps.
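A minimal sketch of the general idea follows: a linear map W between two static WE spaces is refined with an in-batch contrastive (InfoNCE-style) loss over seed translation pairs. The random data, hyperparameters, and identity initialisation are assumptions for illustration, not the paper's exact recipe.

```python
# Refine a cross-lingual linear map W with an in-batch contrastive objective.
import torch
import torch.nn.functional as F

d, n_pairs = 300, 5000
src = F.normalize(torch.randn(n_pairs, d), dim=-1)   # source-language WEs
tgt = F.normalize(torch.randn(n_pairs, d), dim=-1)   # aligned target-language WEs

W = torch.nn.Parameter(torch.eye(d))                 # e.g. identity or a Procrustes solution
opt = torch.optim.Adam([W], lr=1e-3)
tau = 0.05                                           # temperature

for step in range(200):
    idx = torch.randint(0, n_pairs, (256,))
    x, y = src[idx], tgt[idx]
    z = F.normalize(x @ W, dim=-1)                   # mapped source vectors
    logits = z @ y.t() / tau                         # other pairs act as in-batch negatives
    loss = F.cross_entropy(logits, torch.arange(len(idx)))
    opt.zero_grad(); loss.backward(); opt.step()
```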
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Discriminative self-supervised learning allows training models on any random group of internet images, and can potentially recover salient information that helps differentiate between the images.
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking
This crucial step is done via 1) creating a word similarity dataset, comprising positive word pairs (i.e., true translations) and hard negative pairs induced from the original CLWE space, and then 2) fine-tuning an mPLM (e.g., mBERT or XLM-R) in a cross-encoder manner to predict the similarity scores.
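A rough sketch of such cross-encoder fine-tuning is shown below, assuming a list of (source word, target word, label) triples where positives are true translations and negatives are hard negatives mined from the CLWE space. The model name, toy data, and training details are placeholders, not the paper's exact configuration.

```python
# Fine-tune a multilingual PLM as a cross-encoder that scores word pairs.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-multilingual-cased"   # could equally be an XLM-R checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

pairs = [("dog", "Hund", 1.0), ("dog", "Katze", 0.0)]    # toy positive / hard negative
for src, tgt, label in pairs:
    batch = tok(src, tgt, return_tensors="pt")           # encode the pair as one sequence
    score = model(**batch).logits.squeeze(-1)            # regression-style similarity score
    loss = torch.nn.functional.mse_loss(score, torch.tensor([label]))
    opt.zero_grad(); loss.backward(); opt.step()
```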
Language Embeddings Sometimes Contain Typological Generalizations
To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned?
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining.
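For a sense of what vocabulary extension involves, the sketch below adds new subwords to an existing tokenizer and initialises each new embedding as the mean of the pieces it decomposes into under the original tokenizer. The token strings are placeholders, and this simple mean-of-pieces baseline is for illustration only, not OFA's factorised, similarity-based initialisation.

```python
# Extend a pretrained model's vocabulary and initialise the new subword embeddings.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "xlm-roberta-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

new_tokens = ["neologismA", "neologismB"]   # placeholder unseen subwords

# Decompose each new token with the *original* tokenizer before extending it.
decomp = {t: tok(t, add_special_tokens=False)["input_ids"] for t in new_tokens}

old_emb = model.get_input_embeddings().weight.data.clone()
tok.add_tokens(new_tokens)
model.resize_token_embeddings(len(tok))
emb = model.get_input_embeddings().weight.data

for t in new_tokens:
    new_id = tok.convert_tokens_to_ids(t)
    emb[new_id] = old_emb[decomp[t]].mean(dim=0)   # mean of constituent-piece embeddings
# ...followed by continued pretraining on the new-language corpus.
```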