Why Are Bootstrapped Deep Ensembles Not Better?

Ensemble methods have consistently reached state of the art across predictive, uncertainty, and out-of-distribution robustness benchmarks. One of the most popular ways to construct an ensemble is to train each model independently on a resampled (bootstrapped) version of the dataset. Bootstrapping is popular in the literature on decision trees and frequentist statistics, with strong theoretical guarantees, but it is rarely used in practice for deep neural networks. We investigate a common hypothesis for the bootstrap's weak performance, namely the percentage of unique points in the subsampled dataset, and find that even when adjusting for it, bootstrap ensembles of deep neural networks yield no benefit over simpler baselines. This calls into question the role of data randomization as a source of uncertainty in deep learning.
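
To make the setup concrete, here is a minimal NumPy sketch of standard bootstrap resampling and of the quantity the abstract refers to, the fraction of unique training points in a bootstrap sample (which approaches 1 - 1/e, about 63.2%, for large datasets). This is an illustrative sketch, not code from the paper; `train_model`, `X`, and `K` are hypothetical placeholders.

```python
import numpy as np

def bootstrap_indices(n, rng):
    """Draw n indices with replacement (standard bootstrap sample)."""
    return rng.integers(0, n, size=n)

def unique_fraction(idx):
    """Fraction of distinct training points in a bootstrap sample."""
    return len(np.unique(idx)) / len(idx)

rng = np.random.default_rng(0)
n = 50_000
idx = bootstrap_indices(n, rng)
print(f"unique points: {unique_fraction(idx):.3f}")  # roughly 0.632

# A bootstrap ensemble would train each member on its own resampled copy
# of the data, e.g. (train_model is a hypothetical training routine):
# ensemble = [train_model(X[bootstrap_indices(n, rng)]) for _ in range(K)]
```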
