Search Results for author: Jingzhao Zhang

Found 30 papers, 5 papers with code

Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions

no code implementations ICML 2020 Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie

Therefore, we introduce the notion of $(\delta, \epsilon)$-stationarity, a generalization that allows for a point to be within distance $\delta$ of an $\epsilon$-stationary point and reduces to $\epsilon$-stationarity for smooth functions.
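
Spelled out (with $\partial f$ standing in for the appropriate generalized subdifferential, which the listing does not specify):

$$
x \ \text{is } (\delta, \epsilon)\text{-stationary} \iff \exists\, y \ \text{such that} \ \|x - y\| \le \delta \ \text{and} \ \operatorname{dist}\big(0, \partial f(y)\big) \le \epsilon.
$$

With $\delta = 0$ and smooth $f$, this is the usual requirement $\|\nabla f(x)\| \le \epsilon$.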

Efficient Sampling on Riemannian Manifolds via Langevin MCMC

no code implementations15 Feb 2024 Xiang Cheng, Jingzhao Zhang, Suvrit Sra

We study the task of efficiently sampling from a Gibbs distribution $d\pi^* = e^{-h} \, d\mathrm{vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice.
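
As a minimal sketch of one step of the sampler described above (the grad_h and exp_map helpers are placeholders for the Riemannian gradient of $h$ and the manifold's exponential map; the step-size scaling is the standard Langevin one, not necessarily the paper's):

```python
import numpy as np

def langevin_step(x, grad_h, exp_map, step_size, rng):
    """One geometric Langevin step: follow the exponential map at x along the
    drift direction -grad h(x) plus a scaled random Gaussian direction."""
    xi = rng.standard_normal(x.shape)                        # Gaussian direction
    v = -step_size * grad_h(x) + np.sqrt(2.0 * step_size) * xi
    return exp_map(x, v)
```

In the Euclidean special case exp_map(x, v) = x + v, this reduces to the familiar unadjusted Langevin algorithm.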

A Quadratic Synchronization Rule for Distributed Deep Learning

1 code implementation22 Oct 2023 Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.
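
The rule itself is not spelled out in this listing; purely as a hypothetical illustration of the idea suggested by the title, lengthening the interval between synchronizations quadratically as the learning rate decays, a local-SGD-style loop might look as follows (sync_interval, growth, h_base, and the worker interface are all made-up names):

```python
def sync_interval(lr, growth=0.1, h_base=2):
    """Hypothetical quadratic rule: more local steps between syncs as lr shrinks."""
    return max(h_base, int((growth / lr) ** 2))

def local_sgd(workers, lr_schedule, num_rounds, average):
    """Data-parallel training that synchronizes (averages) the model replicas
    only once every H local steps instead of at every step."""
    step = 0
    for _ in range(num_rounds):
        h = sync_interval(lr_schedule(step))
        for _ in range(h):                   # local updates, no communication
            for w in workers:
                w.sgd_step(lr_schedule(step))
            step += 1
        average(workers)                     # one synchronization per round
```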

Two Phases of Scaling Laws for Nearest Neighbor Classifiers

no code implementations16 Aug 2023 Pengkun Yang, Jingzhao Zhang

We show that a scaling law can have two phases: in the first phase, the generalization error depends polynomially on the data dimension and decreases fast; whereas in the second phase, the error depends exponentially on the data dimension and decreases slowly.
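
Schematically (this is a reading of the sentence above with placeholder exponents, not the paper's precise statement), the two phases look like

$$
\mathrm{err}(n) \approx \mathrm{poly}(d)\cdot n^{-\alpha} \ \ \text{(phase 1, fast decay)}, \qquad \mathrm{err}(n) \approx e^{\Theta(d)}\cdot n^{-\beta} \ \ \text{(phase 2, slow decay)},
$$

with $\beta \ll \alpha$, where $n$ is the sample size and $d$ the data dimension.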

Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

no code implementations26 Jun 2023 Lesi Chen, Yaohua Ma, Jingzhao Zhang

Designing efficient algorithms for bilevel optimization is challenging because the lower-level problem defines a feasibility set implicitly via another optimization problem.
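
For reference, the generic bilevel problem has the form

$$
\min_{x} \ f\big(x, y^*(x)\big) \qquad \text{s.t.} \qquad y^*(x) \in \arg\min_{y} \ g(x, y),
$$

where, in the nonconvex-strongly-convex setting of the title, $f$ may be nonconvex in $x$ while $g(x, \cdot)$ is strongly convex, so $y^*(x)$ is unique but only defined implicitly through the lower-level problem.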

Bilevel Optimization · Meta-Learning · +2

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions

no code implementations NeurIPS 2023 Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu

However, on the theory side, MCMC algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave.

Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization

no code implementations19 Mar 2023 Peiyuan Zhang, Jiaye Teng, Jingzhao Zhang

Our paper examines this observation by providing excess risk lower bounds for GD and SGD in two realizable settings: (1) $\eta T = O(n)$, and (2) $\eta T = \Omega(n)$, where $n$ is the size of the dataset.

Generalization Bounds · Learning Theory

On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis

no code implementations2 Jan 2023 Lesi Chen, Jing Xu, Jingzhao Zhang

Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning.

Bilevel Optimization · Meta-Learning · +1

Online Policy Optimization for Robust MDP

no code implementations28 Sep 2022 Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go.

Reinforcement Learning (RL)

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models

no code implementations1 Jun 2022 Kaiyue Wen, Jiaye Teng, Jingzhao Zhang

Studies on benign overfitting provide insights for the success of overparameterized deep learning models.

Understanding the unstable convergence of gradient descent

no code implementations3 Apr 2022 Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.
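
The $2/L$ threshold comes from the standard descent lemma: if $f$ is $L$-smooth, then a gradient step with step size $\eta$ satisfies

$$
f\big(x - \eta \nabla f(x)\big) \;\le\; f(x) - \eta\Big(1 - \tfrac{L\eta}{2}\Big)\|\nabla f(x)\|^2,
$$

so the objective is guaranteed to decrease whenever $\eta < 2/L$.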

Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm

no code implementations13 Feb 2022 Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra

Deciding whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable.
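
For context, the classical Euclidean form of Sion's minimax theorem (which the paper extends to geodesic metric spaces) states, roughly, that if $X$ and $Y$ are convex sets, one of them compact, and $f(\cdot, y)$ is quasiconvex and lower semicontinuous while $f(x, \cdot)$ is quasiconcave and upper semicontinuous, then

$$
\inf_{x \in X} \sup_{y \in Y} f(x, y) \;=\; \sup_{y \in Y} \inf_{x \in X} f(x, y).
$$

Outside such convex-concave-type regimes, as the sentence above notes, even deciding whether saddle points exist is generally intractable.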

EVBattery: A Large-Scale Electric Vehicle Dataset for Battery Health and Capacity Estimation

no code implementations28 Jan 2022 Haowei He, Jingzhao Zhang, Yanan Wang, Benben Jiang, Shaobo Huang, Chen Wang, Yang Zhang, Gengang Xiong, Xuebing Han, Dongxu Guo, Guannan He, Minggao Ouyang

In addition to demonstrating how existing deep learning algorithms can be applied to this task, we further develop an algorithm that exploits the data structure of battery systems.

Anomaly Detection · Capacity Estimation · +1

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

no code implementations12 Oct 2021 Jingzhao Zhang, Haochuan Li, Suvrit Sra, Ali Jadbabaie

This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks.

Fast Federated Learning in the Presence of Arbitrary Device Unavailability

1 code implementation NeurIPS 2021 Xinran Gu, Kaixuan Huang, Jingzhao Zhang, Longbo Huang

In this case, the convergence of popular FL algorithms such as FedAvg is severely influenced by the straggling devices.
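
To see why arbitrary unavailability hurts, here is a minimal FedAvg-style aggregation sketch in which only the currently available clients contribute to a round; clients that are persistently unavailable simply drop out of the average. This illustrates the problem setting only, not the algorithm proposed in the paper, and the local_update interface is hypothetical.

```python
import numpy as np

def fedavg_round(global_model, clients, available, local_update):
    """One FedAvg round that averages local models from available clients only.
    Persistently unavailable (straggling) clients never contribute, which can
    bias the average and slow convergence."""
    local_models = [local_update(global_model, c) for c in clients if c in available]
    if not local_models:
        return global_model               # no client participated this round
    return np.mean(local_models, axis=0)  # unweighted average of participants
```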

Federated Learning

Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization

no code implementations NeurIPS 2021 Haochuan Li, Yi Tian, Jingzhao Zhang, Ali Jadbabaie

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable.
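
The problem class here is

$$
\min_{x} \max_{y} \ f(x, y), \qquad f \ \text{smooth, nonconvex in } x, \ \text{strongly concave in } y,
$$

and the usual target (presumably the notion lower-bounded in the paper) is an $\epsilon$-stationary point of the primal function $\Phi(x) := \max_{y} f(x, y)$, i.e. a point with $\|\nabla \Phi(x)\| \le \epsilon$.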

Provably Efficient Algorithms for Multi-Objective Competitive RL

no code implementations5 Feb 2021 Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

To our knowledge, this work provides the first provably efficient algorithms for vector-valued Markov games and our theoretical guarantees are near-optimal.

Multi-Objective Reinforcement Learning

Stochastic Optimization with Non-stationary Noise: The Power of Moment Estimation

no code implementations1 Jan 2021 Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

In particular, standard results on optimal convergence rates for stochastic optimization assume either there exists a uniform bound on the moments of the gradient noise, or that the noise decays as the algorithm progresses.
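
The two classical assumptions mentioned above can be written (here for the second moment of the noise) as

$$
\mathbb{E}\,\|g_t - \nabla f(x_t)\|^2 \le \sigma^2 \ \ \text{for all } t \qquad \text{or} \qquad \mathbb{E}\,\|g_t - \nabla f(x_t)\|^2 \le \sigma_t^2 \ \ \text{with } \sigma_t \to 0,
$$

where $g_t$ is the stochastic gradient at iterate $x_t$.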

Stochastic Optimization

On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective

1 code implementation NeurIPS 2023 Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama

Weight decay is a simple yet powerful regularization technique that is very widely used in the training of deep neural networks (DNNs).
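
For reference, the basic weight decay update under discussion is

$$
\theta_{t+1} \;=\; \theta_t - \eta_t\big(\nabla \mathcal{L}(\theta_t) + \lambda\,\theta_t\big),
$$

where $\lambda$ is the weight decay coefficient; for plain SGD this coincides with adding an L2 penalty $\tfrac{\lambda}{2}\|\theta\|^2$ to the loss.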

Coping with Label Shift via Distributionally Robust Optimisation

1 code implementation ICLR 2021 Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.
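
Concretely, label shift assumes the class-conditional distributions are shared while the label marginals differ:

$$
p_{\mathrm{train}}(x \mid y) = p_{\mathrm{test}}(x \mid y), \qquad p_{\mathrm{train}}(y) \neq p_{\mathrm{test}}(y).
$$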

Quantifying Exposure Bias for Open-ended Language Generation

no code implementations28 Sep 2020 Tianxing He, Jingzhao Zhang, Zhiming Zhou, James R. Glass

The exposure bias problem refers to the incrementally distorted generation induced by the training-generation discrepancy in teacher-forcing training of auto-regressive neural network language models (LMs).
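
A minimal sketch of the training-generation discrepancy behind exposure bias, using a hypothetical lm(prefix) callable that samples the next token: during teacher forcing every prefix comes from the gold sequence, whereas at generation time each prefix contains the model's own earlier samples, so mistakes can compound.

```python
def teacher_forcing_prefixes(gold_tokens):
    """Prefixes seen during training: always taken from the gold sequence."""
    return [gold_tokens[:t] for t in range(1, len(gold_tokens))]

def free_running_prefixes(lm, bos_token, length):
    """Prefixes seen at generation time: built from the model's own samples,
    so early errors are fed back in (the exposure bias problem)."""
    seq = [bos_token]
    for _ in range(length):
        seq.append(lm(seq))                  # hypothetical: sample next token
    return [seq[:t] for t in range(1, len(seq))]
```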

Text Generation

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

no code implementations10 Feb 2020 Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.
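
For reference, $f$ is Hadamard directionally differentiable at $x$ in direction $d$ when the limit

$$
f'(x; d) \;=\; \lim_{t \downarrow 0,\ d' \to d} \frac{f(x + t\,d') - f(x)}{t}
$$

exists; letting the direction $d'$ vary along with the step size $t$ is what makes this notion strong enough to support a chain rule, unlike the plain one-sided directional derivative.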

Why are Adaptive Methods Good for Attention Models?

no code implementations NeurIPS 2020 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the \emph{de facto} algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.
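
For reference, the "clipped" update rescales the stochastic gradient whenever its norm exceeds a threshold; a minimal sketch (illustrative only, not the paper's analysis or hyperparameters):

```python
import numpy as np

def clipped_sgd_step(params, grad, lr, clip_threshold):
    """One SGD step with global-norm gradient clipping."""
    norm = np.linalg.norm(grad)
    if norm > clip_threshold:
        grad = grad * (clip_threshold / norm)   # rescale to the threshold norm
    return params - lr * grad
```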

Why ADAM Beats SGD for Attention Models

no code implementations25 Sep 2019 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

A Probe Towards Understanding GAN and VAE Models

no code implementations13 Dec 2018 Lu Mi, Macheng Shen, Jingzhao Zhang

This project report compares some known GAN and VAE models proposed prior to 2017.

Generative Adversarial Network

Direct Runge-Kutta Discretization Achieves Acceleration

no code implementations NeurIPS 2018 Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.
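
For context, the continuous limit of Nesterov's accelerated gradient method for a convex $f$ is the well-known second-order ODE

$$
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0,
$$

whose solutions decrease $f$ at the accelerated $O(1/t^2)$ rate; the paper discretizes a related ODE directly with Runge-Kutta schemes.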
