Ruminating Word Representations with Random Noise Masking
We introduce a training method that improves word representations and model performance, which we call \textbf{GraVeR} (\textbf{Gra}dual \textbf{Ve}ctor \textbf{R}umination). After a model is trained, the method adds random noise and bias to its word embeddings, then re-trains the model from scratch with the noised embeddings as initialization, repeating this process gradually and iteratively. During re-training, some of the noise is compensated for, while the rest can be exploited to learn better representations. As a result, the word representations become further fine-tuned and specialized to the task. On six text classification tasks, our method improves model performance by a large margin. Combined with other regularization techniques, GraVeR yields further improvements. Lastly, we investigate the usefulness of GraVeR for pretraining using the training data.
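The train, perturb, re-train loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `add_masked_noise` and `ruminate`, the row-wise random mask, the Gaussian noise, and the parameters `mask_ratio`, `noise_scale`, and `rounds` are all hypothetical choices, and the bias term and any gradual schedule mentioned in the abstract are omitted. `train_fn` stands in for any routine that trains a fresh model from the given initial embeddings and returns the trained embedding matrix.

```python
import numpy as np


def add_masked_noise(embeddings, mask_ratio=0.2, noise_scale=0.1, rng=None):
    """Add Gaussian noise to a random subset of word vectors (rows).

    Assumption: the mask is drawn per word, and noise is zero-mean Gaussian.
    Returns the perturbed copy and the boolean mask of perturbed rows.
    """
    rng = np.random.default_rng() if rng is None else rng
    vocab_size, dim = embeddings.shape
    # Choose which word vectors are perturbed in this round.
    mask = rng.random(vocab_size) < mask_ratio
    noise = rng.normal(0.0, noise_scale, size=(vocab_size, dim))
    # Rows outside the mask are left exactly as they were.
    return embeddings + mask[:, None] * noise, mask


def ruminate(train_fn, init_embeddings, rounds=3,
             mask_ratio=0.2, noise_scale=0.1, seed=0):
    """Alternate training and noising, then train one final time.

    `train_fn(embeddings) -> trained_embeddings` is a hypothetical callback
    that trains a model from scratch, initialized with `embeddings`.
    """
    rng = np.random.default_rng(seed)
    embeddings = init_embeddings
    for _ in range(rounds):
        # Train a fresh model starting from the current embeddings.
        trained = train_fn(embeddings)
        # Perturb the trained embeddings; they initialize the next round.
        embeddings, _ = add_masked_noise(trained, mask_ratio, noise_scale, rng)
    return train_fn(embeddings)
```

With `rounds=2`, `train_fn` is called three times: once per noising round plus the final training pass on the last set of noised embeddings.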