Can gradient clipping mitigate label noise?

Gradient clipping is a widely used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum. This intuition has been made precise in a line of recent works, which show that suitable clipping can yield significantly faster convergence than vanilla gradient descent. In this paper, we propose a new lens for studying gradient clipping, namely, robustness: informally, one expects clipping to provide robustness to noise, since one does not overly trust any single sample. Surprisingly, we prove that for the common problem of label noise in classification, standard gradient clipping does not in general provide robustness. On the other hand, we show that a simple variant of gradient clipping is provably robust, and corresponds to suitably modifying the underlying loss function. This yields a simple, noise-robust alternative to the standard cross-entropy loss which performs well empirically.
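The loss-based variant described in the abstract (called a partially Huberised loss in the reproducibility report below) can be sketched in a few lines. The following is a minimal, illustrative PyTorch sketch, not the authors' reference code: the function name `phuber_cross_entropy`, the default threshold `tau=10.0`, and the exact linearisation `-tau * p + log(tau) + 1` for probabilities below `1/tau` are assumptions based on the paper's description of clipping the contribution of the loss rather than the full gradient.

```python
import math

import torch
import torch.nn.functional as F


def phuber_cross_entropy(logits: torch.Tensor, targets: torch.Tensor,
                         tau: float = 10.0) -> torch.Tensor:
    """Sketch of a partially Huberised cross-entropy loss (assumed form).

    Let p be the probability the model assigns to the labelled class. The loss
    is the usual -log(p) when p > 1/tau, and is linearised to
    -tau * p + log(tau) + 1 when p <= 1/tau, which caps the magnitude of the
    gradient with respect to p at tau; the two branches meet at p = 1/tau.
    """
    probs = F.softmax(logits, dim=-1)
    # Probability assigned to the labelled (possibly noisy) class; clamp to
    # avoid log(0) in the unclipped branch.
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-12)
    linear_branch = -tau * p_true + math.log(tau) + 1.0
    log_branch = -torch.log(p_true)
    loss = torch.where(p_true <= 1.0 / tau, linear_branch, log_branch)
    return loss.mean()
```

In a training loop this would simply stand in for `F.cross_entropy(logits, targets)`; the threshold `tau = 10.0` here is purely illustrative and would in practice be tuned, with larger values behaving more like the ordinary cross-entropy.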

Reproducibility Reports


Jan 31 2021
[Re] Can gradient clipping mitigate label noise?

Overall, our results mostly support the claims of the original paper. For the synthetic experiments, our results differ when using the exact values described in the paper, although they still support the main claim. After slightly modifying some of the experiment settings, our reproduced figures are nearly identical to the figures from the original paper. For the deep learning experiments, our results differ, with some of the baselines reaching a much higher accuracy on MNIST, CIFAR-10 and CIFAR-100. Nonetheless, with the help of an additional experiment, our results support the authors' claim that partially Huberised losses perform well on real-world datasets subject to label noise.
