Word Embeddings

1107 papers with code • 0 benchmarks • 52 datasets

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.

Techniques for learning word embeddings include Word2Vec, GloVe, and other neural network-based approaches that train on an NLP task such as language modeling or document classification.
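As a concrete illustration of the word-to-vector mapping, here is a minimal sketch using gensim's Word2Vec on a toy corpus; the corpus and hyperparameters are illustrative only, not taken from any paper below:

```python
# Minimal Word2Vec sketch with gensim; toy corpus and settings are illustrative.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs", "are", "animals"],
]

# vector_size is the embedding dimension; window is the context size; sg=1 uses skip-gram.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

vec = model.wv["cat"]                # a 50-dimensional real-valued vector
print(model.wv.most_similar("cat"))  # nearest neighbors by cosine similarity
```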

(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)

Most implemented papers

A Latent Variable Model Approach to PMI-based Word Embeddings

PrincetonML/SemanticVector TACL 2016

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.
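To make the PMI connection concrete, here is a hedged sketch of one classic PMI-based construction: factorizing a positive PMI (PPMI) co-occurrence matrix with truncated SVD. This illustrates the family of methods the paper analyzes, not its specific latent-variable estimator:

```python
# Hedged sketch: embeddings from a PPMI co-occurrence matrix via truncated SVD.
import numpy as np

def ppmi_svd_embeddings(counts, dim):
    """counts: (V, V) word-word co-occurrence counts; returns (V, dim) vectors."""
    total = counts.sum()
    p_w = counts.sum(axis=1, keepdims=True) / total   # marginal P(w)
    p_c = counts.sum(axis=0, keepdims=True) / total   # marginal P(c)
    p_wc = counts / total                             # joint P(w, c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    ppmi = np.maximum(pmi, 0.0)                       # clip negative PMI to zero
    ppmi[~np.isfinite(ppmi)] = 0.0                    # zero out 0/0 artifacts
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])              # symmetric singular-value weighting
```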

Adversarial Training Methods for Semi-Supervised Text Classification

tensorflow/models 25 May 2016

We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
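A hedged PyTorch sketch of that idea follows: the adversarial perturbation is computed from the gradient of the loss with respect to the embeddings rather than the raw tokens. `embedding`, `encoder`, and `classifier` are hypothetical modules standing in for the paper's recurrent model:

```python
# Hedged sketch of adversarial training on word embeddings (FGSM-style).
import torch
import torch.nn.functional as F

def adversarial_loss(embedding, encoder, classifier, token_ids, labels, eps=1.0):
    emb = embedding(token_ids)                                 # (batch, seq, dim)
    loss = F.cross_entropy(classifier(encoder(emb)), labels)
    # Gradient of the loss w.r.t. the embeddings, not the discrete tokens.
    g = torch.autograd.grad(loss, emb, retain_graph=True)[0].detach()
    perturb = eps * g / (g.norm(dim=-1, keepdim=True) + 1e-12)  # normalized direction
    adv_loss = F.cross_entropy(classifier(encoder(emb + perturb)), labels)
    return loss + adv_loss                                      # clean + adversarial terms
```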

Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change

williamleif/histwords ACL 2016

Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test.
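A key ingredient in this line of work is aligning embedding spaces trained on different time periods before comparing them. The sketch below uses orthogonal Procrustes, the alignment used in this paper, to rotate one decade's vectors onto another's; the matrices here are placeholders:

```python
# Hedged sketch: align diachronic embeddings with orthogonal Procrustes,
# then measure a word's semantic shift as cosine distance across time.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def align(emb_old, emb_new):
    """Rotate emb_old onto emb_new; rows index a shared vocabulary."""
    rotation, _ = orthogonal_procrustes(emb_old, emb_new)
    return emb_old @ rotation

def semantic_shift(emb_old_aligned, emb_new, idx):
    """Cosine distance between a word's old and new vectors."""
    a, b = emb_old_aligned[idx], emb_new[idx]
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```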

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

jhlau/doc2vec WS 2016

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
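A minimal gensim Doc2Vec sketch in the spirit of the setups the paper evaluates; the corpus and hyperparameters are illustrative:

```python
# Minimal Doc2Vec sketch with gensim; toy documents and settings are illustrative.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["word", "embeddings", "map", "words", "to", "vectors"], tags=[0]),
    TaggedDocument(words=["doc2vec", "extends", "word2vec", "to", "documents"], tags=[1]),
]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40, dm=1)  # dm=1: PV-DM

# Infer a fixed-size vector for an unseen document.
vec = model.infer_vector(["a", "new", "unseen", "document"])
```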

How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks

kudkudak/word-embeddings-benchmarks 7 Feb 2017

Maybe the single most important goal of representation learning is making subsequent learning faster.
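One way to operationalize that criterion is to probe embeddings with a simple supervised model at increasing training-set sizes, so data efficiency shows up as accuracy at small n. A hedged sketch, where `embed_text` is a hypothetical function returning a fixed-size vector per text:

```python
# Hedged sketch: a data-efficiency curve for a simple supervised probe.
import numpy as np
from sklearn.linear_model import LogisticRegression

def data_efficiency_curve(texts, labels, embed_text, sizes=(50, 100, 500, 1000)):
    # Assumes texts/labels are already shuffled and each size covers both classes.
    X = np.stack([embed_text(t) for t in texts])
    y = np.asarray(labels)
    n_test = len(y) // 5
    X_tr, y_tr, X_te, y_te = X[n_test:], y[n_test:], X[:n_test], y[:n_test]
    curve = {}
    for n in sizes:
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
        curve[n] = clf.score(X_te, y_te)  # test accuracy with only n training examples
    return curve
```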

Improving Lexical Choice in Neural Machine Translation

tnq177/improving_lexical_choice_in_nmt NAACL 2018

We explore two solutions to the problem of mistranslating rare words in neural machine translation.

Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data

beamandrew/cui2vec 4 Apr 2018

Word embeddings are a popular approach to unsupervised learning of word relationships that are widely used in natural language processing.

Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization

dascim/acl2018_abssumm ACL 2018

We introduce a novel graph-based framework for abstractive meeting speech summarization that is fully unsupervised and does not rely on any annotations.
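The budgeted submodular maximization step is typically approximated greedily: repeatedly add the candidate with the best marginal-gain-to-cost ratio until the budget runs out. A hedged, generic sketch of that heuristic (not the paper's exact objective):

```python
# Hedged sketch of cost-benefit greedy selection under a budget.
# `gain` should be a monotone submodular set objective (e.g., how well a
# summary covers meeting content); costs are assumed strictly positive.
def budgeted_greedy(candidates, gain, cost, budget):
    selected, spent = [], 0.0
    remaining = set(candidates)
    while remaining:
        best = max(
            (c for c in remaining if spent + cost(c) <= budget),
            key=lambda c: (gain(selected + [c]) - gain(selected)) / cost(c),
            default=None,
        )
        if best is None:  # nothing affordable remains
            break
        selected.append(best)
        spent += cost(best)
        remaining.remove(best)
    return selected
```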

BioSentVec: creating sentence embeddings for biomedical texts

ncbi-nlp/BioSentVec 22 Oct 2018

Sentence embeddings have become an essential part of today's natural language processing (NLP) systems, especially together with advanced deep learning methods.
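A hedged usage sketch with the sent2vec Python bindings that BioSentVec builds on; the model file name matches the released BioSentVec model, and the preprocessing shown is only indicative:

```python
# Hedged sketch: embedding a biomedical sentence with a pretrained sent2vec model.
import sent2vec

model = sent2vec.Sent2vecModel()
model.load_model("BioSentVec_PubMed_MIMICIII-bigram_d700.bin")  # released model file

# Sentences should be preprocessed (lowercased, tokenized) consistently
# with the training corpus before embedding.
emb = model.embed_sentence("the patient was administered aspirin .")
```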