[Re] Can gradient clipping mitigate label noise?

Scope of Reproducibility
The original paper proposes partially Huberised losses, which are robust to label noise. The authors claim that there exist label noise scenarios that defeat Huberised but not partially Huberised losses, and that partially Huberised versions of existing losses perform well on real-world datasets subject to symmetric label noise.
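
For context, a partially Huberised loss linearises the base loss below a confidence threshold, which caps the gradient contribution of low-confidence (often mislabelled) examples. To the best of our reading of the original paper, the partially Huberised cross-entropy with threshold τ > 1, applied to the probability p the model assigns to the true class, takes the form

```latex
\bar{\ell}_{\tau}(p) =
\begin{cases}
  -\tau\, p + \log \tau + 1, & \text{if } p \le \frac{1}{\tau},\\
  -\log p,                   & \text{otherwise,}
\end{cases}
```

so the two branches meet continuously at p = 1/τ and the gradient magnitude with respect to p never exceeds τ.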

Methodology
All the experiments described in the paper were fully re-implemented using NumPy, SciPy and PyTorch. The experiments on synthetic data were run on a CPU, while the deep learning experiments were run on an NVIDIA RTX 2080 Ti GPU. Reproducing the real-world experiments, together with the additional runs needed to gain insight into some of the network architectures used, required over 550 GPU hours.

Results
Overall, our results mostly support the claims of the original paper. For the synthetic experiments, our results differ when using the exact values described in the paper, although they still support the main claim; after slightly modifying some of the experiment settings, our reproduced figures are nearly identical to those in the original paper. For the deep learning experiments, our results differ, with some of the baselines reaching a much higher accuracy on MNIST, CIFAR-10 and CIFAR-100. Nonetheless, with the help of an additional experiment, our results support the authors' claim that partially Huberised losses perform well on real-world datasets subject to label noise.

What was easy
The original paper is well written and insightful, which made it fairly easy to implement the partially Huberised versions of standard losses from the information given (see the sketch below). In addition, recreating the synthetic datasets used in two of the original paper's experiments was relatively straightforward.
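
To illustrate how little code this takes, here is a minimal PyTorch sketch of the partially Huberised cross-entropy as we understand it; the function name `phuber_cross_entropy` and the default threshold `tau=10.0` are our own choices for illustration, not taken from the paper.

```python
import math

import torch
import torch.nn.functional as F

def phuber_cross_entropy(logits: torch.Tensor, targets: torch.Tensor,
                         tau: float = 10.0) -> torch.Tensor:
    """Partially Huberised cross-entropy (sketch; requires tau > 1).

    Below the threshold p <= 1/tau, the usual -log(p) is replaced by its
    tangent line -tau * p + log(tau) + 1, which caps the gradient
    magnitude at tau while keeping the loss continuous at p = 1/tau.
    """
    probs = F.softmax(logits, dim=-1)
    # Probability assigned to the true class for each sample.
    p = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    linearised = -tau * p + math.log(tau) + 1.0
    standard = -torch.log(p)
    return torch.where(p <= 1.0 / tau, linearised, standard).mean()
```

Note that the linearised branch only fires on low-confidence examples, which is exactly where mislabelled points tend to produce the largest gradients under the standard cross-entropy.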

What was difficult
Even though the authors gave very detailed feedback, pinning down the exact hyperparameters used in the real-world experiments required many iterations of inquiry and experimentation. In addition, the CIFAR-10 and CIFAR-100 experiments are difficult to reproduce due to the large number of experiment configurations, which translates into many training runs and a relatively high computational cost of over 550 GPU hours.

Communication with original authors
We contacted the authors on multiple occasions regarding some of the hyperparameters used in their experiments, to which they promptly replied with very detailed explanations.
