Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics

EMNLP 2017 · Zhe Zhao, Tao Liu, Shen Li, Bofang Li, Xiaoyong Du

Existing word representation methods mostly limit their information source to word co-occurrence statistics. In this paper, we introduce ngrams into four representation methods: SGNS, GloVe, the PPMI matrix, and its SVD factorization. Comprehensive experiments are conducted on word analogy and similarity tasks. The results show that improved word representations are learned from ngram co-occurrence statistics. We also demonstrate that the trained ngram representations are useful in many respects, such as finding antonyms and collocations. In addition, a novel approach to building the co-occurrence matrix is proposed to alleviate the hardware burden brought by ngrams.
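
To make "ngram co-occurrence statistics" concrete, the following is a minimal illustrative sketch, not the authors' implementation. It counts co-occurrences between center ngrams and context ngrams and converts the counts to PPMI scores. The function names, the `max_n` and `window` parameters, and the rule that a context ngram must not overlap the center and must lie fewer than `window` tokens away are all assumptions made for this example; the abstract does not specify these details.

```python
from collections import Counter
from math import log


def ngrams(tokens, max_n):
    """Return (start, length, text) for every ngram of order 1..max_n."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append((i, n, "_".join(tokens[i:i + n])))
    return out


def cooccurrence(tokens, max_n=2, window=2):
    """Count (center ngram, context ngram) pairs.

    Assumption for this sketch: a context ngram is counted when it does
    not overlap the center ngram and the gap between the two spans is
    smaller than `window` tokens.
    """
    grams = ngrams(tokens, max_n)
    counts = Counter()
    for ci, cn, ctext in grams:
        for xi, xn, xtext in grams:
            if xi >= ci + cn:          # context lies to the right
                gap = xi - (ci + cn)
            elif xi + xn <= ci:        # context lies to the left
                gap = ci - (xi + xn)
            else:                      # spans overlap: skip
                continue
            if gap < window:
                counts[(ctext, xtext)] += 1
    return counts


def ppmi(counts):
    """Positive pointwise mutual information for each observed pair."""
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (w, c), v in counts.items():
        row[w] += v
        col[c] += v
    return {
        (w, c): max(0.0, log(v * total / (row[w] * col[c])))
        for (w, c), v in counts.items()
    }


if __name__ == "__main__":
    pairs = cooccurrence("the cat sat on the mat".split(), max_n=2, window=2)
    scores = ppmi(pairs)
    for pair in sorted(scores, key=scores.get, reverse=True)[:5]:
        print(pair, round(scores[pair], 3))
```

With `max_n=1` this reduces to ordinary word-word counting; raising `max_n` adds bigram (and higher) rows and columns, which illustrates why the co-occurrence matrix grows quickly once ngrams are included.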
