
Stochastic Optimization

109 papers with code · Methodology

Stochastic Optimization is the task of optimizing an objective function by generating and using random variables. It is usually an iterative process of generating random variables that progressively locates the minima or maxima of the objective function. Stochastic Optimization is typically applied in non-convex settings where deterministic optimization methods such as linear or quadratic programming, or their variants, cannot be used.

Source: ASOC: An Adaptive Parameter-free Stochastic Optimization Technique for Continuous Variables
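As a concrete illustration of the iterative process described above, here is a minimal sketch in plain NumPy: each step draws a random minibatch and takes a stochastic gradient step on a toy least-squares objective. The objective, step size, and batch size are illustrative assumptions, not taken from any paper listed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares objective f(w) = 0.5 * ||X w - y||^2 / n (illustrative).
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch = 0.05, 32
for step in range(2000):
    idx = rng.integers(0, len(X), size=batch)        # random variables drive the iteration
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch  # stochastic gradient estimate
    w -= lr * grad                                   # move toward a minimum of the objective
print("distance to w_true:", np.linalg.norm(w - w_true))
```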

Benchmarks

Greatest papers with code

Revisiting Distributed Synchronous SGD

4 Apr 2016 tensorflow/models

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.

STOCHASTIC OPTIMIZATION
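This paper revisits synchronous aggregation with backup workers as an alternative to the asynchronous updates mentioned in the excerpt above. A toy sketch of that aggregation rule follows, assuming simulated per-worker gradients; the worker count and straggler threshold are illustrative choices, not the paper's setup.

```python
import numpy as np

def sync_step_with_backups(worker_grads, arrival_order, n_backup):
    """Average gradients from the fastest workers only, dropping n_backup stragglers.

    worker_grads : list of gradient arrays, one per worker (simulated here).
    arrival_order: worker indices sorted by (simulated) arrival time.
    """
    n_used = len(worker_grads) - n_backup
    used = arrival_order[:n_used]                   # ignore the slowest n_backup workers
    return np.mean([worker_grads[i] for i in used], axis=0)

rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(8)]     # 8 simulated workers
order = rng.permutation(8)                          # simulated arrival order
avg_grad = sync_step_with_backups(grads, order, n_backup=2)
```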

Reducing the variance in online optimization by transporting past gradients

NeurIPS 2019 google-research/google-research

While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting.

STOCHASTIC OPTIMIZATION
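The core idea, as I read it, is to keep a running average of past gradients while evaluating each new gradient at an extrapolated point, so that older gradients remain approximately valid at the current iterate. The sketch below is a rough rendering of that idea on a toy quadratic; the extrapolation coefficient and averaging schedule are assumptions and may differ from the paper's exact update.

```python
import numpy as np

def transported_average_update(grad_fn, theta, theta_prev, v, t, lr):
    """One step of gradient averaging with transport via extrapolation (illustrative)."""
    gamma = t / (t + 1.0)                                 # anytime averaging weight (assumed)
    shifted = theta + (gamma / (1.0 - gamma)) * (theta - theta_prev)  # extrapolated evaluation point
    v = gamma * v + (1.0 - gamma) * grad_fn(shifted)      # running average of transported gradients
    return theta - lr * v, theta, v

# Toy quadratic objective f(x) = 0.5 * x^T A x (illustrative).
A = np.diag([1.0, 10.0])
grad_fn = lambda x: A @ x
theta, theta_prev, v = np.array([5.0, 5.0]), np.array([5.0, 5.0]), np.zeros(2)
for t in range(100):
    theta, theta_prev, v = transported_average_update(grad_fn, theta, theta_prev, v, t, lr=0.05)
```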

Lookahead Optimizer: k steps forward, 1 step back

NeurIPS 2019 rwightman/pytorch-image-models

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.

IMAGE CLASSIFICATION MACHINE TRANSLATION STOCHASTIC OPTIMIZATION
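The title above summarizes the algorithm: an inner optimizer takes k "fast" steps, then the "slow" weights are pulled part of the way toward the fast weights and the fast weights are reset. A minimal NumPy sketch, assuming plain gradient descent as the inner optimizer and a toy quadratic loss:

```python
import numpy as np

def lookahead_sgd(grad_fn, w0, lr=0.1, k=5, alpha=0.5, outer_steps=200):
    slow = w0.copy()
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):               # k steps forward with the inner optimizer (SGD here)
            fast -= lr * grad_fn(fast)
        slow += alpha * (fast - slow)    # 1 step back: interpolate slow weights toward fast weights
    return slow

A = np.diag([1.0, 10.0])                 # toy quadratic: f(w) = 0.5 * w^T A w (illustrative)
w = lookahead_sgd(lambda w: A @ w, np.array([5.0, 5.0]))
```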

SGDR: Stochastic Gradient Descent with Warm Restarts

13 Aug 2016 rwightman/pytorch-image-models

Partial warm restarts are also gaining popularity in gradient-based optimization, where they improve the rate of convergence of accelerated gradient schemes on ill-conditioned functions.

EEG STOCHASTIC OPTIMIZATION
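SGDR's schedule anneals the learning rate with a cosine within each restart period and then resets it, with period lengths growing after each restart. A short sketch of that schedule; the rate bounds and period lengths below are illustrative choices, not the paper's experimental settings.

```python
import math

def sgdr_lr(step, eta_min=1e-4, eta_max=0.1, T_0=10, T_mult=2):
    """Cosine-annealed learning rate with warm restarts.

    Each period starts at eta_max, decays to eta_min, then restarts;
    the period length is multiplied by T_mult after every restart.
    """
    T_i, t_cur = T_0, step
    while t_cur >= T_i:          # locate the current restart period
        t_cur -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_i))

schedule = [sgdr_lr(s) for s in range(70)]   # restarts at steps 10, 30, 70, ...
```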

On the Variance of the Adaptive Learning Rate and Beyond

ICLR 2020 LiyuanLucasLiu/RAdam

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

IMAGE CLASSIFICATION LANGUAGE MODELLING MACHINE TRANSLATION STOCHASTIC OPTIMIZATION
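The warmup heuristic referenced above simply ramps the learning rate up over the first steps before handing off to the main schedule; the paper argues this compensates for the large variance of the adaptive learning rate early in training, and proposes rectifying that variance so the explicit warmup can be dropped. A minimal sketch of linear warmup itself (the step count and base rate are illustrative):

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=2000):
    """Linear learning-rate warmup: ramp from ~0 to base_lr, then hold.

    In practice this prefix is combined with a decay schedule; RAdam instead
    rectifies the adaptive learning rate so this heuristic is not needed.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

lrs = [warmup_lr(s) for s in range(4000)]
```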

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

ICLR 2020 kaushaltrivedi/fast-bert

In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.

Ranked #10 on Question Answering on SQuAD1.1 dev (F1 metric)

QUESTION ANSWERING STOCHASTIC OPTIMIZATION
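The layerwise adaptation strategy (LAMB) scales each layer's update by a trust ratio that compares the norm of the layer's weights to the norm of the proposed update. A rough per-layer sketch, assuming Adam-style first and second moments are already maintained; the hyperparameters are illustrative and this is not a drop-in reimplementation of the paper's optimizer.

```python
import numpy as np

def lamb_layer_update(w, m_hat, v_hat, lr=1e-3, eps=1e-6, weight_decay=0.01):
    """One LAMB-style update for a single layer's weights (illustrative sketch)."""
    r = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w        # Adam-style direction + decay
    w_norm, r_norm = np.linalg.norm(w), np.linalg.norm(r)
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0  # layerwise trust ratio
    return w - lr * trust * r

rng = np.random.default_rng(0)
w = rng.normal(size=256)
w = lamb_layer_update(w, m_hat=rng.normal(size=256),
                      v_hat=np.abs(rng.normal(size=256)))
```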

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

27 May 2019 NVIDIA/OpenSeq2Seq

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.

STOCHASTIC OPTIMIZATION
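As described, each layer keeps a scalar second moment of its gradient norm, normalizes the layer's gradient by it, adds decoupled weight decay, and accumulates the result into a first moment. A rough per-layer sketch of that update; the coefficients are illustrative and this is my reading of the method, not the reference implementation.

```python
import numpy as np

def novograd_layer_step(w, g, m, v, lr=0.01, beta1=0.95, beta2=0.98,
                        weight_decay=0.001, eps=1e-8):
    """One NovoGrad-style update for a single layer (illustrative sketch)."""
    v = beta2 * v + (1 - beta2) * float(np.sum(g * g))            # scalar 2nd moment of layer grad norm
    m = beta1 * m + (g / (np.sqrt(v) + eps) + weight_decay * w)   # normalized grad + decoupled decay
    return w - lr * m, m, v

rng = np.random.default_rng(0)
w, m, v = rng.normal(size=128), np.zeros(128), 0.0
w, m, v = novograd_layer_step(w, rng.normal(size=128), m, v)
```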

An Adaptive and Momental Bound Method for Stochastic Learning

27 Oct 2019 jettify/pytorch-optimizer

The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpected large learning rates and stabilize the training of deep neural networks.

STOCHASTIC OPTIMIZATION
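The mechanism described above (the paper calls it AdaMod) keeps an exponential moving average of the per-parameter adaptive step sizes and caps the current step size by it, which is what smooths out "unexpected large learning rates". A rough element-wise sketch on top of Adam-style moments; this is my reading of the method and the coefficients are illustrative.

```python
import numpy as np

def adamod_style_step(w, m_hat, v_hat, s, lr=1e-3, beta3=0.999, eps=1e-8):
    """Bound the Adam step size by an EMA of past step sizes (illustrative sketch)."""
    step_size = lr / (np.sqrt(v_hat) + eps)        # per-parameter adaptive step size
    s = beta3 * s + (1 - beta3) * step_size        # smoothed (momental) bound
    step_size = np.minimum(step_size, s)           # clip unexpectedly large rates
    return w - step_size * m_hat, s

rng = np.random.default_rng(0)
w = rng.normal(size=128)
w, s = adamod_style_step(w, m_hat=rng.normal(size=128),
                         v_hat=np.abs(rng.normal(size=128)), s=np.zeros(128))
```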

Adaptive Methods for Nonconvex Optimization

NeurIPS 2018 jettify/pytorch-optimizer

In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.

STOCHASTIC OPTIMIZATION

Quasi-hyperbolic momentum and Adam for deep learning

ICLR 2019 jettify/pytorch-optimizer

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning.

STOCHASTIC OPTIMIZATION
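Quasi-hyperbolic momentum takes a weighted average of the raw stochastic gradient and the momentum buffer, recovering plain SGD at nu = 0 and (normalized) momentum at nu = 1. A minimal NumPy sketch on a toy quadratic; the hyperparameters are illustrative.

```python
import numpy as np

def qhm_step(grad, buf, w, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum step: average the raw gradient and the momentum buffer."""
    buf = beta * buf + (1 - beta) * grad
    w = w - lr * ((1 - nu) * grad + nu * buf)
    return w, buf

A = np.diag([1.0, 10.0])                    # toy quadratic: f(w) = 0.5 * w^T A w (illustrative)
w, buf = np.array([5.0, 5.0]), np.zeros(2)
for _ in range(200):
    w, buf = qhm_step(A @ w, buf, w)
```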