Improving Neural Language Modeling via Adversarial Training

10 Jun 2019  ·  Dilin Wang, Chengyue Gong, Qiang Liu

Recently, substantial progress has been made in language modeling by using deep neural networks. In practice, however, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise into the output embedding layer while training the models. We show that the optimal adversarial noise yields a simple closed-form solution, allowing us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages diversity among the embedding vectors, helping to increase the robustness of the models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and WikiText-2, achieving test perplexity scores of 46.01 and 38.65, respectively. When applied to machine translation, our method improves over various Transformer-based translation baselines in BLEU scores on the WMT14 English-German and IWSLT14 German-English tasks.
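The closed-form structure of the optimal noise is what makes the method cheap: no inner optimization loop is needed. Below is a minimal PyTorch sketch of the idea as described in the abstract; the function name, the eps hyperparameter, and the per-target L2-norm constraint are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def adversarial_softmax_loss(hidden, embedding, targets, eps=1.0):
    """Cross-entropy loss with closed-form adversarial noise on the
    target word's output embedding (a sketch, not the official code).

    For an L2-bounded perturbation delta (||delta|| <= eps) added to the
    target embedding w_y, the loss is maximized by delta* = -eps * h/||h||,
    because the loss decreases monotonically in the target logit
    (w_y + delta) . h. The perturbed logit is therefore simply
    w_y . h - eps * ||h||.

    hidden:    (batch, d)  context vectors h produced by the language model
    embedding: (vocab, d)  output (softmax) embedding matrix
    targets:   (batch,)    target word indices
    """
    logits = hidden @ embedding.t()            # (batch, vocab)
    h_norm = hidden.norm(dim=-1)               # ||h|| per example
    adv_logits = logits.clone()
    rows = torch.arange(hidden.size(0), device=hidden.device)
    # Shift only the target logit by the worst-case amount -eps * ||h||.
    adv_logits[rows, targets] = logits[rows, targets] - eps * h_norm
    return F.cross_entropy(adv_logits, targets)
```

In this reading, eps acts as a regularization strength to be tuned, and the adversarial loss would be used during training in place of (or mixed with) the standard maximum-likelihood loss.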

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Language Modelling | Penn Treebank (Word Level) | adversarial + AWD-LSTM-MoS + dynamic eval | Validation perplexity | 46.63 | #3 |
| Language Modelling | Penn Treebank (Word Level) | adversarial + AWD-LSTM-MoS + dynamic eval | Test perplexity | 46.01 | #5 |
| Language Modelling | Penn Treebank (Word Level) | adversarial + AWD-LSTM-MoS + dynamic eval | Params | 22M | #23 |
| Language Modelling | WikiText-103 | AdvSoft (+ 4-layer QRNN + dynamic eval) | Validation perplexity | 27.2 | #26 |
| Language Modelling | WikiText-103 | AdvSoft (+ 4-layer QRNN + dynamic eval) | Test perplexity | 28.0 | #65 |
| Language Modelling | WikiText-2 | adversarial + AWD-LSTM-MoS + dynamic eval | Validation perplexity | 40.27 | #4 |
| Language Modelling | WikiText-2 | adversarial + AWD-LSTM-MoS + dynamic eval | Test perplexity | 38.65 | #12 |
| Language Modelling | WikiText-2 | adversarial + AWD-LSTM-MoS + dynamic eval | Number of params | 35M | #12 |
| Machine Translation | WMT2014 English-German | Transformer Big + adversarial MLE | BLEU score | 29.52 | #24 |
