The Neural Hype and Comparisons Against Weak Baselines

Recently, the machine learning community paused in a moment of self-reflection. In a widely discussed paper at ICLR 2018, Sculley et al. wrote: "We observe that the rate of empirical advancement may not have been matched by consistent increase in the level of empirical rigor across the field as a whole." Their primary complaint is the development of a "research and publication culture that emphasizes wins" (emphasis in original), which typically means "demonstrating that a new method beats previous methods on a given task or benchmark". An apt description might be "leaderboard chasing", and for many vision and NLP tasks this isn't a metaphor: there are literally centralized leaderboards that track incremental progress, down to the fifth decimal point, some persisting over years and accumulating dozens of entries. Sculley et al. remind us that "the goal of science is not wins, but knowledge". The structure of the scientific enterprise today (pressure to publish, pace of progress, etc.) means that "winning" and "doing good science" are often not fully aligned. To wit, they cite a number of papers showing that recent advances in neural networks could very well be attributed to mundane issues like better hyperparameter optimization. Many results can't be reproduced, and some observed improvements might just be noise.
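To make the "noise" point concrete, consider how one would check whether a small leaderboard delta is meaningful at all. The sketch below (an illustration, not from the paper, using synthetic per-topic scores) runs a paired randomization test on the per-topic difference between two systems; in practice the scores would come from something like trec_eval output.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-topic average-precision scores for two systems
# evaluated on the same 50 topics (synthetic data for illustration).
baseline = rng.uniform(0.1, 0.5, size=50)
contender = baseline + rng.normal(0.005, 0.05, size=50)  # tiny mean gain, large variance

def paired_randomization_test(a, b, trials=100_000, seed=0):
    """Two-sided paired randomization test on the mean per-topic difference."""
    rng = np.random.default_rng(seed)
    diffs = b - a
    observed = diffs.mean()
    # Under the null hypothesis that the systems are interchangeable,
    # the sign of each per-topic difference is equally likely to flip.
    signs = rng.choice([-1.0, 1.0], size=(trials, diffs.size))
    null = (signs * diffs).mean(axis=1)
    return np.mean(np.abs(null) >= abs(observed))

p = paired_randomization_test(baseline, contender)
print(f"MAP difference: {np.mean(contender - baseline):+.4f}, p = {p:.3f}")
```

A difference in the third or fourth decimal place of MAP routinely fails such a test, which is exactly why a leaderboard entry alone is weak evidence of progress.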
