Word Embeddings
1105 papers with code • 0 benchmarks • 52 datasets
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
Techniques for learning word embeddings include Word2Vec, GloVe, and other neural network-based approaches that train on an NLP task such as language modeling or document classification.
(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)
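To make the mapping from words to real-valued vectors concrete, here is a minimal sketch of training and querying embeddings with gensim's Word2Vec (assuming gensim 4.x; the three-sentence toy corpus is a hypothetical stand-in for real training data):

```python
# Minimal sketch: learn word embeddings with gensim Word2Vec (gensim 4.x assumed).
from gensim.models import Word2Vec

# Hypothetical toy corpus: a list of tokenized sentences.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["cats", "and", "dogs", "are", "animals"],
]

# Train skip-gram embeddings: each vocabulary word is mapped to a
# 50-dimensional real-valued vector.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

vec = model.wv["king"]                          # the embedding for "king"
print(vec.shape)                                # (50,)
print(model.wv.most_similar("king", topn=3))    # nearest neighbors by cosine similarity
```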
Benchmarks
These leaderboards are used to track progress in Word Embeddings.
Datasets
Subtasks
Latest papers
Bridging Vision and Language Spaces with Assignment Prediction
This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.
Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval
Deep quantization methods have shown high efficiency on large-scale image retrieval.
IITK at SemEval-2024 Task 1: Contrastive Learning and Autoencoders for Semantic Textual Relatedness in Multilingual Texts
This paper describes our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness.
BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering
Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications: by linking related entities and providing context-rich information, they support efficient information retrieval, knowledge discovery, and effective presentation of information flow.
Breaking the Silence: Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces
Online gender-based harassment is a widespread issue limiting the free expression and participation of women and marginalized genders in digital spaces.
DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation
We propose a novel text dataset distillation approach, Distilling dataset into Language Model (DiLM), which trains a language model to generate informative synthetic training samples as text data instead of directly optimizing synthetic samples.
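As a heavily simplified sketch of the generate-then-train idea behind such distillation (not DiLM's actual training objective; the GPT-2 model choice and the prompt are assumptions), synthetic text samples can be drawn from a language model and then used as training data for a downstream student:

```python
# Heavily simplified sketch of dataset distillation via LM generation
# (not DiLM's actual objective; model name and prompt are assumptions).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Generate synthetic training samples as plain text, rather than
# directly optimizing continuous synthetic inputs.
prompt = "Movie review (positive):"
outputs = generator(prompt, max_new_tokens=30, num_return_sequences=5, do_sample=True)
synthetic_data = [o["generated_text"] for o in outputs]
# These text samples could then train any downstream student model.
```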
Debiasing Sentence Embedders through Contrastive Word Pairs
Most debiasing approaches are transferred directly from word embeddings and therefore fail to take into account the nonlinear nature of sentence embedders and the embeddings they produce.
SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks
Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern.
Projective Methods for Mitigating Gender Bias in Pre-trained Language Models
Mitigation of gender bias in NLP has a long history tied to debiasing static word embeddings.
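As a rough illustration of the projective idea in its classic static-embedding form (a sketch, not this paper's exact method; the word pairs and random vectors below are hypothetical), a bias direction can be estimated from definitional pairs and projected out of every embedding:

```python
# Rough numpy sketch of projective debiasing for static word embeddings
# (illustrative only; not the exact method of the paper above).
import numpy as np

def debias(vectors: dict, pairs: list) -> dict:
    # Estimate the bias direction as the mean difference over definitional
    # pairs such as ("he", "she"); the pair list is a hypothetical choice.
    diffs = np.stack([vectors[a] - vectors[b] for a, b in pairs])
    bias = diffs.mean(axis=0)
    bias /= np.linalg.norm(bias)
    # Remove each vector's component along the bias direction (hard projection).
    return {w: v - np.dot(v, bias) * bias for w, v in vectors.items()}

rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ["he", "she", "doctor", "nurse"]}
debiased = debias(vectors, [("he", "she")])
# After projection, no embedding retains a component along the bias direction:
print(np.dot(debiased["doctor"], debiased["he"] - debiased["she"]))  # ~0
```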
Prescribing Large Language Models for Perioperative Care: What's The Right Dose for Pre-trained Models?
Adapting models further improved performance: (1) self-supervised finetuning by 3.2% for AUROC and 1.5% for AUPRC; (2) semi-supervised finetuning by 1.8% for AUROC and 2% for AUPRC, compared to self-supervised finetuning; (3) foundational modelling by 3.6% for AUROC and 2.6% for AUPRC, compared to self-supervised finetuning.