Search Results for author: Donald Goldfarb

Found 19 papers, 4 papers with code

Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning

no code implementations · 23 May 2023 · Achraf Bahamou, Donald Goldfarb

We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods for minimizing empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR).
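The step-size rule itself is not given in this excerpt. As a minimal sketch of the general mechanism (a separate step size for each layer), the PyTorch snippet below puts each layer of a small network into its own optimizer parameter group with a placeholder learning rate; the 0.1/(i+1) values are illustrative assumptions, not the paper's adaptive procedure.

```python
# Minimal sketch: one optimizer parameter group per layer, each with its own
# step size. The per-layer values are placeholders, not the paper's rule.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

param_groups = [
    {"params": list(layer.parameters()), "lr": 0.1 / (i + 1)}  # placeholder schedule
    for i, layer in enumerate(model)
    if any(p.requires_grad for p in layer.parameters())
]
optimizer = torch.optim.SGD(param_groups, lr=0.1)  # per-group lr overrides this default

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()  # each layer's parameters move with their own step size
```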

A Mini-Block Fisher Method for Deep Neural Networks

no code implementations · 8 Feb 2022 · Achraf Bahamou, Donald Goldfarb, Yi Ren

Specifically, our method uses a block-diagonal approximation to the empirical Fisher matrix, where for each layer in the DNN, whether it is convolutional or feed-forward and fully connected, the associated diagonal block is itself block-diagonal and is composed of a large number of mini-blocks of modest size.

Second-order methods
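A rough sketch of the mini-block idea described in the abstract above, for a single layer whose per-sample gradients are split into fixed-size mini-blocks; the block size, damping, and step size are hypothetical choices, not the authors' construction.

```python
# Sketch: block-diagonal empirical Fisher built from small mini-blocks of one
# layer's per-sample gradients, then used to precondition the mini-batch gradient.
import numpy as np

rng = np.random.default_rng(0)
batch, dim, block = 64, 12, 4          # hypothetical sizes
G = rng.normal(size=(batch, dim))      # rows: per-sample gradients of one layer
g = G.mean(axis=0)                     # mini-batch gradient
damping = 1e-3

precond_g = np.empty_like(g)
for start in range(0, dim, block):
    sl = slice(start, start + block)
    Gb = G[:, sl]                                            # one mini-block
    Fb = Gb.T @ Gb / batch + damping * np.eye(Gb.shape[1])   # empirical Fisher block
    precond_g[sl] = np.linalg.solve(Fb, g[sl])               # precondition the block

step = -0.01 * precond_g   # descent direction from the mini-block Fisher
```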

Tensor Normal Training for Deep Learning Models

1 code implementation · NeurIPS 2021 · Yi Ren, Donald Goldfarb

Based on the so-called tensor normal (TN) distribution, we propose and analyze a brand new approximate natural gradient method, Tensor Normal Training (TNT), which like Shampoo, only requires knowledge of the shape of the training parameters.

Second-order methods
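A loose illustration of shape-aware preconditioning of a matrix-shaped gradient with one Kronecker factor per mode, in the spirit of the description above; the factor estimates and damping below are simplistic stand-ins, not the TNT update.

```python
# Sketch: estimate one factor per mode of a weight matrix's gradient and apply
# their inverses along the corresponding modes (stand-in estimates, not TNT's).
import numpy as np

rng = np.random.default_rng(0)
batch, m, n = 32, 8, 5
G_samples = rng.normal(size=(batch, m, n))   # per-sample gradients of an m x n weight
G = G_samples.mean(axis=0)
damping = 1e-2

# Mode factors obtained by contracting the per-sample gradients over the other mode.
A_row = np.einsum('bij,bkj->ik', G_samples, G_samples) / (batch * n) + damping * np.eye(m)
A_col = np.einsum('bij,bik->jk', G_samples, G_samples) / (batch * m) + damping * np.eye(n)

# Apply the inverse of each factor along its mode: update = A_row^{-1} G A_col^{-1}.
update = np.linalg.solve(A_row, G) @ np.linalg.inv(A_col)
```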

Kronecker-factored Quasi-Newton Methods for Deep Learning

no code implementations · 12 Feb 2021 · Yi Ren, Achraf Bahamou, Donald Goldfarb

Several improvements to the methods in Goldfarb et al. (2020) are also proposed that can be applied to both MLPs and CNNs.

Second-order methods

Practical Quasi-Newton Methods for Training Deep Neural Networks

1 code implementation · NeurIPS 2020 · Donald Goldfarb, Yi Ren, Achraf Bahamou

We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).
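For orientation, the textbook BFGS update of an inverse Hessian approximation that these methods build on, applied to a small toy block; the paper's Kronecker factorization, damping safeguards, and limited-memory variant are not shown.

```python
# Textbook BFGS update of an inverse Hessian approximation H from a step s and
# gradient difference y (kept per layer/block in the quasi-Newton DNN setting).
import numpy as np

def bfgs_update(H, s, y):
    """Return the BFGS-updated inverse Hessian approximation."""
    rho = 1.0 / (y @ s)                      # assumes the curvature condition y's > 0
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Toy usage on a 3-dimensional block.
H = np.eye(3)
s = np.array([0.10, -0.20, 0.05])            # parameter step
y = np.array([0.30, -0.10, 0.02])            # gradient difference
H = bfgs_update(H, s, y)
direction = -H @ np.array([0.5, 0.1, -0.3])  # quasi-Newton search direction
```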

A Dynamic Sampling Adaptive-SGD Method for Machine Learning

no code implementations · 31 Dec 2019 · Achraf Bahamou, Donald Goldfarb

We also propose an adaptive version of ADAM that eliminates the need to tune the base learning rate and compares favorably to fine-tuned ADAM on training DNNs.

BIG-bench Machine Learning · Stochastic Optimization
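For reference only, the standard ADAM update whose base learning rate alpha is the quantity the proposed adaptive variant aims to stop tuning by hand; the adaptive rule itself is not reproduced here.

```python
# Standard ADAM update (the base learning rate alpha is ordinarily hand-tuned).
import numpy as np

def adam_step(w, g, m, v, t, alpha=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g                  # first-moment estimate
    v = b2 * v + (1 - b2) * g * g              # second-moment estimate
    m_hat = m / (1 - b1 ** t)                  # bias corrections
    v_hat = v / (1 - b2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Run a few steps on a toy quadratic with minimizer (1, -2, 0.5).
w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):
    g = w - np.array([1.0, -2.0, 0.5])         # gradient of 0.5 * ||w - c||^2
    w, m, v = adam_step(w, g, m, v, t, alpha=0.05)
```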

Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks

no code implementations · 5 Jun 2019 · Yi Ren, Donald Goldfarb

We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets.
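A minimal damped (Levenberg-Marquardt) Gauss-Newton step on a toy least-squares problem, to fix ideas; the subsampling and matrix-free machinery needed at DNN scale are outside this sketch, and the toy model and damping value are made up.

```python
# Levenberg-Marquardt-damped Gauss-Newton iterations on a toy nonlinear
# least-squares fit of y = exp(w0 * x) + w1.
import numpy as np

def residuals(w, x, y):
    return np.exp(x * w[0]) + w[1] - y

def jacobian(w, x):
    return np.stack([x * np.exp(x * w[0]), np.ones_like(x)], axis=1)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.5 * x) + 1.0 + 0.01 * rng.normal(size=x.size)

w, lam = np.array([0.0, 0.0]), 1e-2                    # initial guess, LM damping
for _ in range(20):
    r, J = residuals(w, x, y), jacobian(w, x)
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
    w = w + step
print(w)   # approaches roughly (0.5, 1.0)
```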

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension

1 code implementation · NeurIPS 2019 · Yunfei Teng, Wenbo Gao, Francois Chalus, Anna Choromanska, Donald Goldfarb, Adrian Weller

Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers).

Distributed Optimization
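A toy sketch of the two attractive forces described above: each worker combines its own gradient with a pull toward its group's local leader and a pull toward the global leader. The pull coefficients, group assignment, and quadratic stand-in loss are made-up choices for illustration.

```python
# Sketch: leader-style corrective update -- gradient plus pulls toward the
# local (per-group) and global best-performing workers.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim = 8, 10
workers = rng.normal(size=(n_workers, dim))      # one parameter vector per worker
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])      # two groups of four workers

def loss(w):                                     # toy stand-in for the training loss
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

lr, pull_local, pull_global = 0.1, 0.2, 0.1      # hypothetical coefficients
losses = np.array([loss(w) for w in workers])
global_leader = workers[np.argmin(losses)].copy()

for gid in np.unique(groups):
    idx = np.where(groups == gid)[0]
    local_leader = workers[idx[np.argmin(losses[idx])]].copy()
    for i in idx:
        direction = (grad(workers[i])
                     + pull_local * (workers[i] - local_leader)
                     + pull_global * (workers[i] - global_leader))
        workers[i] -= lr * direction
```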

Increasing Iterate Averaging for Solving Saddle-Point Problems

no code implementations · 26 Mar 2019 · Yuan Gao, Christian Kroer, Donald Goldfarb

In particular, the increasing averages consistently outperform the uniform averages in all test problems by orders of magnitude.

Image Denoising
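The contrast drawn above is between uniform and increasing weights when averaging the iterates z_1, ..., z_T of a saddle-point method; with linearly increasing weights (one common choice, shown here for concreteness) the two averages are

```latex
% Uniform vs. linearly increasing iterate averaging.
\bar{z}_T^{\mathrm{unif}} = \frac{1}{T}\sum_{t=1}^{T} z_t,
\qquad
\bar{z}_T^{\mathrm{inc}} = \frac{\sum_{t=1}^{T} t\, z_t}{\sum_{t=1}^{T} t}
= \frac{2}{T(T+1)}\sum_{t=1}^{T} t\, z_t .
```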

Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values

no code implementations · ICML 2017 · Chaoxu Zhou, Wenbo Gao, Donald Goldfarb

We propose a novel class of stochastic, adaptive methods for minimizing self-concordant functions which can be expressed as an expected value.

Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization

no code implementations · 5 Jul 2016 · Xiao Wang, Shiqian Ma, Donald Goldfarb, Wei Liu

In this paper we study stochastic quasi-Newton methods for nonconvex stochastic optimization, where we assume that noisy information about the gradients of the objective function is available via a stochastic first-order oracle (SFO).

Binary Classification · General Classification · +1

Scalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal Methods

no code implementations · 29 Mar 2014 · Cun Mu, Yuqian Zhang, John Wright, Donald Goldfarb

Recovering matrices from compressive and grossly corrupted observations is a fundamental problem in robust statistics, with rich applications in computer vision and machine learning.
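One common convex formulation of this recovery task (written without the compressive measurement operator, which the paper also considers) decomposes the observation M into a low-rank part L and a sparse corruption S:

```latex
% Nuclear norm for the low-rank part, l1 norm for the sparse corruptions;
% lambda balances the two terms.
\min_{L,\,S} \;\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad L + S = M .
```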

Robust Low-rank Tensor Recovery: Models and Algorithms

no code implementations · 24 Nov 2013 · Donald Goldfarb, Zhiwei Qin

Robust tensor recovery plays an instrumental role in robustifying tensor decompositions for multilinear data analysis against outliers, gross corruptions, and missing values, and it has a diverse array of applications.

Efficient Algorithms for Robust and Stable Principal Component Pursuit Problems

no code implementations · 26 Sep 2013 · Necdet Serhat Aybat, Donald Goldfarb, Shiqian Ma

Moreover, if the observed data matrix has also been corrupted by a dense noise matrix in addition to gross sparse error, then the stable principal component pursuit (SPCP) problem is solved to recover the low-rank matrix.

Optimization and Control
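A standard way to write the SPCP problem referred to above, with delta bounding the dense noise:

```latex
% Stable principal component pursuit: recover a low-rank L and a sparse S
% from the noisy observation M.
\min_{L,\,S} \;\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad \|M - L - S\|_{F} \le \delta .
```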

Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery

no code implementations · 22 Jul 2013 · Cun Mu, Bo Huang, John Wright, Donald Goldfarb

The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor.
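Written out, the relaxation described above, for a K-way tensor X with mode-i unfoldings X_(i) and linear measurements A(X) = b:

```latex
% Sum-of-nuclear-norms relaxation for low-rank tensor recovery.
\min_{\mathcal{X}} \;\; \sum_{i=1}^{K} \big\| X_{(i)} \big\|_{*}
\quad \text{subject to} \quad \mathcal{A}(\mathcal{X}) = b .
```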

Fast First-Order Methods for Stable Principal Component Pursuit

no code implementations · 11 May 2011 · Necdet Serhat Aybat, Donald Goldfarb, Garud Iyengar

The stable principal component pursuit (SPCP) problem is a non-smooth convex optimization problem, the solution of which has been shown both in theory and in practice to enable one to recover the low-rank and sparse components of a matrix whose elements have been corrupted by Gaussian noise.

Optimization and Control

Fast Alternating Linearization Methods for Minimizing the Sum of Two Convex Functions

no code implementations · 23 Dec 2009 · Donald Goldfarb, Shiqian Ma, Katya Scheinberg

We present in this paper first-order alternating linearization algorithms based on an alternating direction augmented Lagrangian approach for minimizing the sum of two convex functions.
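For orientation, the generic problem and the variable splitting that an alternating direction augmented Lagrangian approach works with: the two functions are decoupled through a copy constraint and then handled alternately.

```latex
% Splitting the sum of two convex functions via a copy constraint.
\min_{x} \; f(x) + g(x)
\;\;\Longleftrightarrow\;\;
\min_{x,\,y} \; f(x) + g(y)
\quad \text{subject to} \quad x = y .
```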

Fixed Point and Bregman Iterative Methods for Matrix Rank Minimization

1 code implementation · 11 May 2009 · Shiqian Ma, Donald Goldfarb, Lifeng Chen

The tightest convex relaxation of this problem is the linearly constrained nuclear norm minimization.

Optimization and Control · Information Theory
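Nuclear-norm-based fixed point and Bregman iterations of this kind are built around a singular value shrinkage step; a minimal sketch of that operator follows (the full algorithm from the paper is not reproduced, and the test matrix and threshold below are arbitrary).

```python
# Singular value shrinkage: soft-threshold the singular values of X by tau,
# i.e. the proximal operator of tau times the nuclear norm.
import numpy as np

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 30))    # rank-5 matrix
M_noisy = M + 0.1 * rng.normal(size=M.shape)
print(np.linalg.matrix_rank(svt(M_noisy, tau=2.0)))        # shrinkage lowers the rank
```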
