Existing approaches for learning word embeddings often assume that each word occurs sufficiently often in the corpus, so that its representation can be accurately estimated from its contexts.
Our model family consists of a latent-variable generative model and a discriminative labeler.
In this paper, we propose Speech2Vec, a novel deep neural network architecture for learning fixed-length vector representations of audio segments excised from a speech corpus. The learned vectors carry semantic information about the underlying spoken words: two vectors lie close together in the embedding space when their corresponding spoken words are semantically similar.
In this study, we improve grammatical error detection by learning word embeddings that consider grammaticality and error patterns.
We present a pointwise mutual information (PMI)-based approach to formalizing paraphrasability, and propose a variant of PMI, called MIPA, for paraphrase acquisition.
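The standard PMI score underlying this approach measures how much more often a word pair co-occurs than chance would predict. The following sketch computes plain PMI from co-occurrence counts; the toy corpus is an assumption for illustration, and the paper's MIPA variant is not reproduced here.

```python
import math
from collections import Counter

# Toy (word, context) co-occurrence pairs; the data is hypothetical.
pairs = [("big", "large"), ("big", "huge"), ("big", "small"),
         ("tiny", "small"), ("tiny", "little"), ("big", "large")]

pair_counts = Counter(pairs)
word_counts = Counter(w for w, _ in pairs)
ctx_counts = Counter(c for _, c in pairs)
n = len(pairs)

def pmi(word, context):
    """PMI(w, c) = log [ p(w, c) / (p(w) * p(c)) ]."""
    p_wc = pair_counts[(word, context)] / n
    p_w = word_counts[word] / n
    p_c = ctx_counts[context] / n
    return math.log(p_wc / (p_w * p_c))
```

Positive PMI indicates the pair co-occurs more than chance (a candidate paraphrase signal); negative PMI indicates it co-occurs less.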
Learning word embeddings on large unlabeled corpora has been shown to improve many natural language tasks.
In this paper, we propose a low-rank coordinate descent approach to structured semidefinite programming with diagonal constraints.
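An SDP with diagonal constraints asks for a PSD matrix X with diag(X) = 1 minimizing ⟨C, X⟩. A common low-rank strategy (Burer-Monteiro style) factors X = VᵀV with unit-norm columns and updates one column at a time. The sketch below illustrates that generic scheme under these assumptions; it is not the paper's specific algorithm.

```python
import numpy as np

def low_rank_sdp(C, k=None, iters=100, seed=0):
    """Coordinate descent on X = V^T V for  min <C, X>  s.t.  X PSD, diag(X) = 1.

    Each column v_i stays unit-norm, which enforces the diagonal constraint.
    The rank k and iteration count are illustrative defaults, not tuned values.
    """
    n = C.shape[0]
    k = k or int(np.ceil(np.sqrt(2 * n)))  # common low-rank heuristic
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((k, n))
    V /= np.linalg.norm(V, axis=0)  # unit columns => diag(V^T V) = 1
    for _ in range(iters):
        for i in range(n):
            g = V @ C[i] - C[i, i] * V[:, i]  # sum over j != i of C_ij * v_j
            norm = np.linalg.norm(g)
            if norm > 0:
                V[:, i] = -g / norm  # exact minimizer of the objective in v_i
    return V
```

Each column update solves its one-coordinate subproblem exactly, so the objective ⟨C, VᵀV⟩ decreases monotonically while X = VᵀV remains feasible (PSD with unit diagonal) by construction.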