How to Prevent the Continuous Damage of Noise to Model Training?

Deep learning with noisy labels is challenging, and noisy labels are unavoidable in many circumstances. Existing methods reduce the impact of noisy samples by down-weighting the loss of uncertain samples or by filtering out potential noisy samples, both of which rely heavily on the model's discriminative power to identify noisy samples. However, during training the trainee model is imperfect and misses many noisy samples, which causes continuous damage to model training. Consequently, there is a large performance gap between existing anti-noise models trained with noisy samples and models trained with clean samples. In this paper, we put forward a Gradient Switching Strategy (GSS) to prevent noisy samples from continuously damaging the classifier. Theoretical analysis shows that the damage comes from the misleading gradient directions computed from noisy samples: under the influence of their accumulated misleading gradients, the trainee model deviates from the correct optimization direction. To address this problem, GSS alleviates the damage by switching the current gradient direction of each sample to a new direction selected from a gradient direction pool, which contains the gradient directions of all classes with different probabilities. During training, the trainee model is optimized along the switched gradient directions generated by GSS, which assigns higher probabilities to potential principal directions for high-confidence samples. Conversely, uncertain samples receive a relatively uniform probability distribution over all gradient directions, so their misleading gradient directions tend to cancel out. Extensive experiments show that a model trained with GSS achieves performance comparable to a model trained on clean data. Moreover, GSS is pluggable into existing noisy-label learning frameworks. This work provides a new perspective for future noisy-label learning.
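Since no official code is available, the sketch below is one hypothetical PyTorch interpretation of a gradient switching step, not the authors' exact formulation. It realizes the "gradient direction pool" as a per-sample distribution over class labels: sampling a surrogate label switches which class's gradient direction drives that sample's update. The function name `gss_loss`, the `temperature` parameter, and the confidence-based mixing scheme are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def gss_loss(logits, noisy_labels, temperature=1.0):
    """Hypothetical GSS-style loss: switch each sample's gradient direction
    by sampling a surrogate class from a per-sample direction pool.

    Assumption: high-confidence samples get a peaked pool (concentrated on
    the likely principal direction); uncertain samples get a near-uniform
    pool, so their misleading gradient directions tend to cancel out.
    """
    num_classes = logits.size(1)
    with torch.no_grad():
        # Model's own belief over classes (temperature is an assumed knob).
        probs = F.softmax(logits / temperature, dim=1)
        onehot = F.one_hot(noisy_labels, num_classes).float()
        # Per-sample confidence in [1/num_classes, 1].
        confidence = probs.max(dim=1, keepdim=True).values
        # Direction pool: confident samples mix prediction and given label;
        # uncertain samples fall back toward a uniform distribution.
        pool = confidence * (0.5 * probs + 0.5 * onehot) \
               + (1.0 - confidence) / num_classes
        # Switch: draw one gradient direction (class) per sample.
        switched = torch.multinomial(pool, num_samples=1).squeeze(1)
    # Backpropagate through the switched labels only.
    return F.cross_entropy(logits, switched)
```

In a training loop this would stand in for the usual cross-entropy call, e.g. `loss = gss_loss(model(images), labels)` followed by `loss.backward()`, so that each optimizer step follows the switched rather than the original (possibly misleading) gradient directions.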
