Momentumized, Adaptive, Dual Averaged Gradient (MADGRAD)

Introduced by Defazio et al. in Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

The MADGRAD method applies a series of modifications to the AdaGrad-DA method to improve its performance on deep learning optimization problems. The authors report state-of-the-art generalization performance across a diverse set of problems, including those on which Adam normally underperforms.

Source: Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization
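
To make the description above more concrete, here is a minimal pure-Python sketch of a MADGRAD-style update loop. It assumes the update takes the form given in the paper's algorithm listing: gradients and squared gradients are accumulated with a weight that grows like sqrt(k + 1), the dual-averaged point is computed with a cube-root denominator, and momentum is applied by averaging the iterate toward that point. The function name `madgrad_minimize`, its parameters, and the default values are hypothetical choices for illustration, not the authors' reference implementation.

```python
def madgrad_minimize(grad_fn, x0, steps=10, lr=0.1, momentum=0.9, eps=1e-6):
    """Hypothetical helper: run a MADGRAD-style loop on a list of floats.

    grad_fn takes the current iterate (list of floats) and returns its gradient.
    """
    c = 1.0 - momentum            # averaging weight used for the momentum step
    x = list(x0)                  # current iterate
    s = [0.0] * len(x)            # weighted sum of gradients (dual average)
    nu = [0.0] * len(x)           # weighted sum of squared gradients
    for k in range(steps):
        g = grad_fn(x)
        lam = lr * (k + 1) ** 0.5  # step weight grows with the iteration count
        for i in range(len(x)):
            s[i] += lam * g[i]
            nu[i] += lam * g[i] * g[i]
            # Dual averaging step: always measured from the starting point x0,
            # scaled per coordinate by the cube root of the accumulated squares.
            z = x0[i] - s[i] / (nu[i] ** (1.0 / 3.0) + eps)
            # Momentum as a moving average of the iterate toward z.
            x[i] = (1.0 - c) * x[i] + c * z
    return x


# Toy usage: a few steps on f(x) = sum((x_i - t_i)^2), starting from the origin.
target = [3.0, -2.0]
grad = lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]
x_final = madgrad_minimize(grad, [0.0, 0.0], steps=10)
```

The cube root (rather than the square root used in AdaGrad) and the averaging form of momentum are the parts of this sketch that most distinguish it from a plain AdaGrad-DA loop; for real training, an existing optimizer implementation is the safer choice.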

Categories

Stochastic Optimization