Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments, we not only corroborate the generalization properties of wide minima but also provide empirical evidence for a new hypothesis that the density of wide minima is likely lower than the density of narrow minima...

ICLR 2021 (under review)
TASK                 DATASET                   MODEL     METRIC NAME  METRIC VALUE  GLOBAL RANK
Machine Translation  IWSLT2014 German-English  MAT+Knee  BLEU score   36.6          # 3
Machine Translation  WMT2014 German-English    MAT+Knee  BLEU score   31.9          # 2
