Word Embeddings
1107 papers with code • 0 benchmarks • 52 datasets
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
Techniques for learning word embeddings include count-based methods such as GloVe and predictive neural methods such as Word2Vec, which train on an auxiliary NLP task such as language modeling or document classification.
(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)
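As a concrete illustration, the sketch below trains skip-gram vectors on a toy corpus with gensim's Word2Vec (gensim >= 4.0 API); the corpus and hyperparameters are placeholders, not taken from any of the papers listed here.

```python
# Minimal sketch: training skip-gram word vectors with gensim.
# Corpus and hyperparameters are illustrative placeholders.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # dimensionality of the embedding space
    window=5,          # context window size
    min_count=1,       # keep every word in this toy corpus
    sg=1,              # 1 = skip-gram, 0 = CBOW
)

vec = model.wv["king"]                # a 100-dimensional real-valued vector
print(model.wv.most_similar("king"))  # nearest neighbours by cosine similarity
```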
Most implemented papers
SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data
Tabular data is the most commonly used form of data in industry.
A Latent Variable Model Approach to PMI-based Word Embeddings
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.
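For background, PMI-based embeddings start from the pointwise mutual information between a word w and a context c, PMI(w, c) = log p(w, c) / (p(w) p(c)), computed from co-occurrence counts; factorizing the positive part of this matrix yields word vectors. The sketch below is that generic PPMI + SVD baseline, not the paper's latent-variable model, and the count matrix is a toy placeholder.

```python
# Generic PPMI + truncated-SVD embedding baseline (illustrative only;
# not the paper's latent-variable model). Counts are toy numbers.
import numpy as np

counts = np.array([[10.0, 2.0, 0.0],
                   [3.0,  8.0, 1.0],
                   [0.0,  1.0, 6.0]])  # rows: words, cols: contexts

total = counts.sum()
p_wc = counts / total                      # joint probabilities p(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)      # marginals p(w)
p_c = p_wc.sum(axis=0, keepdims=True)      # marginals p(c)

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)  # positive PMI: clip negatives (and -inf) to zero

# A rank-k factorization of the PPMI matrix gives k-dimensional word vectors.
U, S, Vt = np.linalg.svd(ppmi)
k = 2
word_vectors = U[:, :k] * np.sqrt(S[:k])
```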
Adversarial Training Methods for Semi-Supervised Text Classification
We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
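The core trick is to perturb the continuous embedding vectors in the direction that increases the loss, rather than the discrete tokens. Below is a minimal PyTorch sketch of that step; `model.embed` and `model.classify` are hypothetical hooks into a recurrent classifier, and `epsilon` is a placeholder hyperparameter.

```python
# Sketch of adversarial perturbation applied to word embeddings
# (FGSM-style step in embedding space; model and hooks are hypothetical).
import torch

def adversarial_loss(model, token_ids, labels, loss_fn, epsilon=1.0):
    embeds = model.embed(token_ids)                      # (batch, seq, dim)
    clean_loss = loss_fn(model.classify(embeds), labels)

    # Gradient of the loss w.r.t. the embeddings, not the discrete input.
    grad, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    r_adv = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)

    # Second forward pass on perturbed embeddings; the perturbation is
    # treated as a constant, so no gradient flows through it.
    adv_loss = loss_fn(model.classify(embeds + r_adv.detach()), labels)
    return clean_loss + adv_loss
```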
Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change
Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test.
An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
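gensim ships a Doc2Vec implementation of this extension; a minimal sketch (gensim >= 4.0 API, toy corpus as a placeholder):

```python
# Minimal doc2vec sketch with gensim's Doc2Vec; data is illustrative.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["machine", "learning", "is", "fun"], tags=[0]),
    TaggedDocument(words=["word", "embeddings", "map", "words", "to", "vectors"], tags=[1]),
]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

doc_vec = model.dv[0]                                          # learned vector for doc 0
new_vec = model.infer_vector(["embeddings", "are", "useful"])  # vector for unseen text
```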
How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks
Maybe the single most important goal of representation learning is making subsequent learning faster.
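In that spirit, one generic probe is to fit a simple supervised model on frozen embeddings at increasing training-set sizes and watch how quickly accuracy grows. The sketch below uses synthetic placeholder data and is not the paper's exact protocol.

```python
# Generic data-efficiency probe: linear classifier on fixed "embeddings"
# at several training-set sizes. All data here is a synthetic stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = np.arange(1000) % 2                               # balanced stand-in labels
X = rng.normal(size=(1000, 100)) + 0.3 * y[:, None]   # class-shifted "embeddings"

for n in (10, 50, 200, 800):                          # accuracy vs. sample size
    clf = LogisticRegression(max_iter=1000).fit(X[:n], y[:n])
    print(n, round(clf.score(X[800:], y[800:]), 3))
```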
Improving Lexical Choice in Neural Machine Translation
We explore two solutions to the problem of mistranslating rare words in neural machine translation.
Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data
Word embeddings are a popular approach to unsupervised learning of word relationships and are widely used in natural language processing.
Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization
We introduce a novel graph-based framework for abstractive meeting speech summarization that is fully unsupervised and does not rely on any annotations.
BioSentVec: creating sentence embeddings for biomedical texts
Sentence embeddings have become an essential part of today's natural language processing (NLP) systems, especially together with advanced deep learning methods.
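For contrast, the simplest sentence-embedding baseline just averages the word vectors of a sentence (BioSentVec itself is trained with sent2vec on large biomedical corpora). A sketch of that averaging baseline, where `word_vectors` is assumed to be any token-to-vector mapping such as a gensim KeyedVectors instance:

```python
# Simplest sentence-embedding baseline: average the word vectors of a
# sentence. Illustrative only; not the BioSentVec training method.
import numpy as np

def sentence_embedding(tokens, word_vectors, dim=100):
    # word_vectors: any mapping from token -> 1-D numpy array;
    # tokens missing from the vocabulary are skipped.
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```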