Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization

15 Jun 2023 · Ramnath Kumar, Kushal Majmundar, Dheeraj Nagaraj, Arun Sai Suggala

We present Re-weighted Gradient Descent (RGD), a novel optimization technique that improves the performance of deep neural networks through dynamic sample importance weighting. Our method is grounded in the principles of distributionally robust optimization (DRO) with Kullback-Leibler divergence. RGD is simple to implement, computationally efficient, and compatible with widely used optimizers such as SGD and Adam. We demonstrate the broad applicability and impact of RGD by achieving state-of-the-art results on diverse benchmarks, including improvements of +0.7% (DomainBed), +1.44% (tabular classification), +1.94% (GLUE with BERT), and +1.01% (ImageNet-1K with ViT).
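Below is a minimal sketch of the kind of loss re-weighting the abstract describes, assuming the standard KL-divergence DRO formulation: the inner maximization over distributions within a KL ball of the empirical distribution is solved by exponential tilting, so each sample is weighted in proportion to exp(loss / temperature). The function and parameter names (`reweighted_loss`, `temperature`) are illustrative and not taken from the paper's reference implementation.

```python
# Illustrative sketch of KL-DRO-motivated sample re-weighting.
# Not the paper's reference implementation; names are hypothetical.
import torch
import torch.nn.functional as F


def reweighted_loss(logits: torch.Tensor,
                    targets: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """Cross-entropy re-weighted by exp(per-sample loss / temperature)."""
    # Per-sample losses (no reduction yet).
    per_sample = F.cross_entropy(logits, targets, reduction="none")

    # Exponential-tilting weights from the KL-DRO dual; detached so the
    # weights act as constants during back-propagation.
    weights = torch.exp(per_sample.detach() / temperature)
    weights = weights / weights.sum()

    # Weighted average loss: harder examples receive larger weights.
    return (weights * per_sample).sum()


# Usage with a standard optimizer such as SGD or Adam:
#   loss = reweighted_loss(model(x), y, temperature=1.0)
#   loss.backward(); optimizer.step()
```

Because the weights are computed from detached per-sample losses, this plugs into any existing training loop without changing the optimizer, which is consistent with the abstract's claim of compatibility with SGD and Adam.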
