Search Results for author: Blake Woodworth

Found 26 papers, 6 papers with code

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

no code implementations • 7 Feb 2023 • Blake Woodworth, Konstantin Mishchenko, Francis Bach

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

1 code implementation • 15 Jun 2022 • Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay.

Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares

no code implementations • 11 Apr 2022 • Blake Woodworth, Francis Bach, Alessandro Rudi

We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.

A Stochastic Newton Algorithm for Distributed Convex Optimization

no code implementations • NeurIPS 2021 • Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth

We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations performed between rounds of communication.

regression

The Minimax Complexity of Distributed Optimization

no code implementations • 1 Sep 2021 • Blake Woodworth

In this setting, I analyze the theoretical properties of the popular Local Stochastic Gradient Descent (SGD) algorithm in the convex setting, for both homogeneous and heterogeneous objectives.

Distributed Optimization

A Field Guide to Federated Optimization

2 code implementations • 14 Jul 2021 • Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu

Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection.

Federated Learning

An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning

no code implementations • NeurIPS 2021 • Blake Woodworth, Nathan Srebro

We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates.

Stochastic Optimization

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

no code implementations • 19 Feb 2021 • Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.

Inductive Bias

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

no code implementations • 2 Feb 2021 • Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro

We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates.

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

no code implementations • NeurIPS 2020 • Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".

General Classification

Minibatch vs Local SGD for Heterogeneous Distributed Learning

no code implementations • NeurIPS 2020 • Blake Woodworth, Kumar Kshitij Patel, Nathan Srebro

the average objective; and machines can only communicate intermittently.

Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

no code implementations • 2 Apr 2020 • Suriya Gunasekar, Blake Woodworth, Nathan Srebro

We present a primal only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential.

Kernel and Rich Regimes in Overparametrized Models

1 code implementation • 20 Feb 2020 • Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.

Is Local SGD Better than Minibatch SGD?

no code implementations • ICML 2020 • Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.

Distributed Optimization

Lower Bounds for Non-Convex Stochastic Optimization

no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.

Stochastic Optimization

The gradient complexity of linear regression

no code implementations • 6 Nov 2019 • Mark Braverman, Elad Hazan, Max Simchowitz, Blake Woodworth

We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle.

regression

Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory

no code implementations • 1 Jul 2019 • Blake Woodworth, Nathan Srebro

We note that known methods achieving the optimal oracle complexity for first order convex optimization require quadratic memory, and ask whether this is necessary, and more broadly seek to characterize the minimax number of first order queries required to optimize a convex Lipschitz function subject to a memory constraint.

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

1 code implementation • 21 Jun 2019 • Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth

We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates.

valid

Kernel and Rich Regimes in Overparametrized Models

1 code implementation • 13 Jun 2019 • Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.

The Complexity of Making the Gradient Small in Stochastic Convex Optimization

no code implementations • 13 Feb 2019 • Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

Stochastic Optimization

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

1 code implementation • 29 Jun 2018 • Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals.

Fairness

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

no code implementations • NeurIPS 2018 • Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nathan Srebro

We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph.

Stochastic Optimization

The Everlasting Database: Statistical Validity at a Fair Price

no code implementations • NeurIPS 2018 • Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro

The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries.

Implicit Regularization in Matrix Factorization

no code implementations • NeurIPS 2017 • Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$.

Learning Non-Discriminatory Predictors

no code implementations • 20 Feb 2017 • Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro

We consider learning a predictor which is non-discriminatory with respect to a "protected attribute" according to the notion of "equalized odds" proposed by Hardt et al. [2016].

Attribute

Tight Complexity Bounds for Optimizing Composite Objectives

no code implementations • NeurIPS 2016 • Blake Woodworth, Nathan Srebro

We provide tight upper and lower bounds on the complexity of minimizing the average of $m$ convex functions using gradient and prox oracles of the component functions.
