How to measure deep uncertainty estimation performance and which models are naturally better at providing it

29 Sep 2021 · Ido Galil, Mohammed Dabbah, Ran El-Yaniv

When deployed for risk-sensitive tasks, deep neural networks (DNNs) must be equipped with an uncertainty estimation mechanism. This paper studies the relationship between deep architectures, their training regimes, and their corresponding uncertainty estimation performance. We consider both in-distribution uncertainties ("aleatoric" or "epistemic") and class-out-of-distribution ones. Moreover, we consider some of the most popular estimation performance metrics previously proposed, including AUROC, ECE, AURC, and coverage for a selective accuracy constraint. We present a novel and comprehensive study carried out by evaluating the uncertainty estimation performance of 484 deep ImageNet classification models. We identify numerous previously unknown factors that affect uncertainty estimation and examine the relationships between the different metrics. We find that distillation-based training regimes consistently yield better uncertainty estimates than other training schemes, such as vanilla training, pretraining on a larger dataset, and adversarial training. We also provide strong empirical evidence that ViT is by far the best architecture for uncertainty estimation, by every metric considered, in both in-distribution and class-out-of-distribution scenarios. Along the way we learn several interesting facts. Contrary to previous work, ECE does not necessarily worsen as the number of network parameters grows. Likewise, we observe an unprecedented 99% top-1 selective accuracy at 47% coverage (and 95% top-1 accuracy at 80% coverage) for a ViT model, whereas a competing EfficientNet-V2-XL cannot meet these accuracy constraints at any level of coverage.
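To make the reported metrics concrete, here is a minimal sketch (not the authors' code) of how selective accuracy at a given coverage, the maximum coverage satisfying an accuracy constraint, AURC, and ECE can be computed from per-sample confidence scores and correctness indicators. All function and variable names below are illustrative assumptions; AUROC can be obtained analogously, e.g. with sklearn.metrics.roc_auc_score(correct, confidence).

```python
# Illustrative sketch of the uncertainty metrics named in the abstract,
# computed from per-sample confidence scores (e.g. softmax maxima) and
# binary correctness labels. Names are assumptions, not the paper's code.
import numpy as np


def selective_accuracy_at_coverage(confidence, correct, coverage):
    """Top-1 accuracy on the `coverage` fraction of most-confident samples."""
    n = len(confidence)
    k = max(1, int(round(coverage * n)))
    idx = np.argsort(-confidence)[:k]  # most confident first
    return correct[idx].mean()


def max_coverage_at_accuracy(confidence, correct, target_accuracy):
    """Largest coverage whose selective accuracy still meets the target,
    or 0.0 if no confidence threshold satisfies the constraint."""
    order = np.argsort(-confidence)
    cum_acc = np.cumsum(correct[order]) / np.arange(1, len(order) + 1)
    meets = np.nonzero(cum_acc >= target_accuracy)[0]
    return (meets[-1] + 1) / len(order) if len(meets) else 0.0


def aurc(confidence, correct):
    """Area under the risk-coverage curve: mean selective risk (error rate)
    over all coverage levels induced by ranking samples by confidence."""
    order = np.argsort(-confidence)
    risks = np.cumsum(1 - correct[order]) / np.arange(1, len(order) + 1)
    return risks.mean()


def ece(confidence, correct, n_bins=15):
    """Expected calibration error with equal-width confidence bins:
    the bin-size-weighted gap between mean confidence and accuracy."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, n = 0.0, len(confidence)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            total += mask.sum() / n * gap
    return total
```

Under this reading, the headline ViT result would correspond to max_coverage_at_accuracy(confidence, correct, 0.99) evaluating to roughly 0.47 on the ImageNet validation set.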
