Search Results for author: Jingzhao Zhang

Found 30 papers, 5 papers with code

Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions

no code implementations ICML 2020 Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie

Therefore, we introduce the notion of $(\delta, \epsilon)$-stationarity, a generalization that allows for a point to be within distance $\delta$ of an $\epsilon$-stationary point and reduces to $\epsilon$-stationarity for smooth functions.
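
Spelled out (with $\partial f$ standing in for the appropriate generalized subdifferential, which the listing does not specify):

$$
x \ \text{is } (\delta, \epsilon)\text{-stationary} \iff \exists\, y \ \text{such that} \ \|x - y\| \le \delta \ \text{and} \ \operatorname{dist}\big(0, \partial f(y)\big) \le \epsilon.
$$

With $\delta = 0$ and smooth $f$, this is the usual requirement $\|\nabla f(x)\| \le \epsilon$.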

Efficient Sampling on Riemannian Manifolds via Langevin MCMC

no code implementations15 Feb 2024 Xiang Cheng, Jingzhao Zhang, Suvrit Sra

We study the task of efficiently sampling from a Gibbs distribution $d\pi^* = e^{-h} \, d\mathrm{vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice.
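
As a minimal sketch of one step of the sampler described above (the grad_h and exp_map helpers are placeholders for the Riemannian gradient of $h$ and the manifold's exponential map; the step-size scaling is the standard Langevin one, not necessarily the paper's):

```python
import numpy as np

def langevin_step(x, grad_h, exp_map, step_size, rng):
    """One geometric Langevin step: follow the exponential map at x along the
    drift direction -grad h(x) plus a scaled random Gaussian direction."""
    xi = rng.standard_normal(x.shape)                        # Gaussian direction
    v = -step_size * grad_h(x) + np.sqrt(2.0 * step_size) * xi
    return exp_map(x, v)
```

In the Euclidean special case exp_map(x, v) = x + v, this reduces to the familiar unadjusted Langevin algorithm.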

A Quadratic Synchronization Rule for Distributed Deep Learning

1 code implementation22 Oct 2023 Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.
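
The rule itself is not spelled out in this listing; purely as a hypothetical illustration of the idea suggested by the title, lengthening the interval between synchronizations quadratically as the learning rate decays, a local-SGD-style loop might look as follows (sync_interval, growth, h_base, and the worker interface are all made-up names):

```python
def sync_interval(lr, growth=0.1, h_base=2):
    """Hypothetical quadratic rule: more local steps between syncs as lr shrinks."""
    return max(h_base, int((growth / lr) ** 2))

def local_sgd(workers, lr_schedule, num_rounds, average):
    """Data-parallel training that synchronizes (averages) the model replicas
    only once every H local steps instead of at every step."""
    step = 0
    for _ in range(num_rounds):
        h = sync_interval(lr_schedule(step))
        for _ in range(h):                   # local updates, no communication
            for w in workers:
                w.sgd_step(lr_schedule(step))
            step += 1
        average(workers)                     # one synchronization per round
```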

Two Phases of Scaling Laws for Nearest Neighbor Classifiers

no code implementations16 Aug 2023 Pengkun Yang, Jingzhao Zhang

We show that a scaling law can have two phases: in the first phase, the generalization error depends polynomially on the data dimension and decreases fast; whereas in the second phase, the error depends exponentially on the data dimension and decreases slowly.
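
Schematically (this is a reading of the sentence above with placeholder exponents, not the paper's precise statement), the two phases look like

$$
\mathrm{err}(n) \approx \mathrm{poly}(d)\cdot n^{-\alpha} \ \ \text{(phase 1, fast decay)}, \qquad \mathrm{err}(n) \approx e^{\Theta(d)}\cdot n^{-\beta} \ \ \text{(phase 2, slow decay)},
$$

with $\beta \ll \alpha$, where $n$ is the sample size and $d$ the data dimension.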

Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

no code implementations26 Jun 2023 Lesi Chen, Yaohua Ma, Jingzhao Zhang

Designing efficient algorithms for bilevel optimization is challenging because the lower-level problem defines a feasibility set implicitly via another optimization problem.
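
For reference, the generic bilevel problem has the form

$$
\min_{x} \ f\big(x, y^*(x)\big) \qquad \text{s.t.} \qquad y^*(x) \in \arg\min_{y} \ g(x, y),
$$

where, in the nonconvex-strongly-convex setting of the title, $f$ may be nonconvex in $x$ while $g(x, \cdot)$ is strongly convex, so $y^*(x)$ is unique but only defined implicitly through the lower-level problem.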

Bilevel Optimization · Meta-Learning · +2

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions

no code implementations NeurIPS 2023 Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu

However, on the theory side, MCMC algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave.

Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization

no code implementations19 Mar 2023 Peiyuan Zhang, Jiaye Teng, Jingzhao Zhang

Our paper examines this observation by providing excess risk lower bounds for GD and SGD in two realizable settings: (1) $\eta T = O(n)$, and (2) $\eta T = \Omega(n)$, where $n$ is the size of the dataset.

Generalization Bounds · Learning Theory

On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis

no code implementations2 Jan 2023 Lesi Chen, Jing Xu, Jingzhao Zhang

Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning.

Bilevel Optimization · Meta-Learning · +1

Online Policy Optimization for Robust MDP

no code implementations28 Sep 2022 Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go.

Reinforcement Learning (RL)

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models

no code implementations1 Jun 2022 Kaiyue Wen, Jiaye Teng, Jingzhao Zhang

Studies on benign overfitting provide insights for the success of overparameterized deep learning models.

Understanding the unstable convergence of gradient descent

no code implementations3 Apr 2022 Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.
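
The $2/L$ threshold comes from the standard descent lemma: if $f$ is $L$-smooth, then a gradient step with step size $\eta$ satisfies

$$
f\big(x - \eta \nabla f(x)\big) \;\le\; f(x) - \eta\Big(1 - \tfrac{L\eta}{2}\Big)\|\nabla f(x)\|^2,
$$

so the objective is guaranteed to decrease whenever $\eta < 2/L$.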

Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm

no code implementations13 Feb 2022 Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra

Deciding whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable.
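
For context, the classical Euclidean form of Sion's minimax theorem (which the paper extends to geodesic metric spaces) states, roughly, that if $X$ and $Y$ are convex sets, one of them compact, and $f(\cdot, y)$ is quasiconvex and lower semicontinuous while $f(x, \cdot)$ is quasiconcave and upper semicontinuous, then

$$
\inf_{x \in X} \sup_{y \in Y} f(x, y) \;=\; \sup_{y \in Y} \inf_{x \in X} f(x, y).
$$

Outside such convex-concave-type regimes, as the sentence above notes, even deciding whether saddle points exist is generally intractable.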

EVBattery: A Large-Scale Electric Vehicle Dataset for Battery Health and Capacity Estimation

no code implementations28 Jan 2022 Haowei He, Jingzhao Zhang, Yanan Wang, Benben Jiang, Shaobo Huang, Chen Wang, Yang Zhang, Gengang Xiong, Xuebing Han, Dongxu Guo, Guannan He, Minggao Ouyang

In addition to demonstrating how existing deep learning algorithms can be applied to this task, we further develop an algorithm that exploits the data structure of battery systems.

Anomaly Detection · Capacity Estimation · +1

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

no code implementations12 Oct 2021 Jingzhao Zhang, Haochuan Li, Suvrit Sra, Ali Jadbabaie

This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks.

Fast Federated Learning in the Presence of Arbitrary Device Unavailability

1 code implementation NeurIPS 2021 Xinran Gu, Kaixuan Huang, Jingzhao Zhang, Longbo Huang

In this case, the convergence of popular FL algorithms such as FedAvg is severely influenced by the straggling devices.
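
To see why arbitrary unavailability hurts, here is a minimal FedAvg-style aggregation sketch in which only the currently available clients contribute to a round; clients that are persistently unavailable simply drop out of the average. This illustrates the problem setting only, not the algorithm proposed in the paper, and the local_update interface is hypothetical.

```python
import numpy as np

def fedavg_round(global_model, clients, available, local_update):
    """One FedAvg round that averages local models from available clients only.
    Persistently unavailable (straggling) clients never contribute, which can
    bias the average and slow convergence."""
    local_models = [local_update(global_model, c) for c in clients if c in available]
    if not local_models:
        return global_model               # no client participated this round
    return np.mean(local_models, axis=0)  # unweighted average of participants
```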

Federated Learning

Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization

no code implementations NeurIPS 2021 Haochuan Li, Yi Tian, Jingzhao Zhang, Ali Jadbabaie

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable.
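
The problem class here is

$$
\min_{x} \max_{y} \ f(x, y), \qquad f \ \text{smooth, nonconvex in } x, \ \text{strongly concave in } y,
$$

and the usual target (presumably the notion lower-bounded in the paper) is an $\epsilon$-stationary point of the primal function $\Phi(x) := \max_{y} f(x, y)$, i.e. a point with $\|\nabla \Phi(x)\| \le \epsilon$.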

Provably Efficient Algorithms for Multi-Objective Competitive RL

no code implementations5 Feb 2021 Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

To our knowledge, this work provides the first provably efficient algorithms for vector-valued Markov games and our theoretical guarantees are near-optimal.

Multi-Objective Reinforcement Learning

Stochastic Optimization with Non-stationary Noise: The Power of Moment Estimation

no code implementations1 Jan 2021 Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

In particular, standard results on optimal convergence rates for stochastic optimization assume either there exists a uniform bound on the moments of the gradient noise, or that the noise decays as the algorithm progresses.
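
The two classical assumptions mentioned above can be written (here for the second moment of the noise) as

$$
\mathbb{E}\,\|g_t - \nabla f(x_t)\|^2 \le \sigma^2 \ \ \text{for all } t \qquad \text{or} \qquad \mathbb{E}\,\|g_t - \nabla f(x_t)\|^2 \le \sigma_t^2 \ \ \text{with } \sigma_t \to 0,
$$

where $g_t$ is the stochastic gradient at iterate $x_t$.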

Stochastic Optimization

On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective

1 code implementation NeurIPS 2023 Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama

Weight decay is a simple yet powerful regularization technique that is very widely used in the training of deep neural networks (DNNs).
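
For reference, the basic weight decay update under discussion is

$$
\theta_{t+1} \;=\; \theta_t - \eta_t\big(\nabla \mathcal{L}(\theta_t) + \lambda\,\theta_t\big),
$$

where $\lambda$ is the weight decay coefficient; for plain SGD this coincides with adding an L2 penalty $\tfrac{\lambda}{2}\|\theta\|^2$ to the loss.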

Coping with Label Shift via Distributionally Robust Optimisation

1 code implementation ICLR 2021 Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.
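
Concretely, label shift assumes the class-conditional distributions are shared while the label marginals differ:

$$
p_{\mathrm{train}}(x \mid y) = p_{\mathrm{test}}(x \mid y), \qquad p_{\mathrm{train}}(y) \neq p_{\mathrm{test}}(y).
$$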

Quantifying Exposure Bias for Open-ended Language Generation

no code implementations28 Sep 2020 Tianxing He, Jingzhao Zhang, Zhiming Zhou, James R. Glass

The exposure bias problem refers to the incrementally distorted generation induced by the training-generation discrepancy in teacher-forcing training of auto-regressive neural network language models (LMs).
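
A minimal sketch of the training-generation discrepancy behind exposure bias, using a hypothetical lm(prefix) callable that samples the next token: during teacher forcing every prefix comes from the gold sequence, whereas at generation time each prefix contains the model's own earlier samples, so mistakes can compound.

```python
def teacher_forcing_prefixes(gold_tokens):
    """Prefixes seen during training: always taken from the gold sequence."""
    return [gold_tokens[:t] for t in range(1, len(gold_tokens))]

def free_running_prefixes(lm, bos_token, length):
    """Prefixes seen at generation time: built from the model's own samples,
    so early errors are fed back in (the exposure bias problem)."""
    seq = [bos_token]
    for _ in range(length):
        seq.append(lm(seq))                  # hypothetical: sample next token
    return [seq[:t] for t in range(1, len(seq))]
```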

Text Generation

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

no code implementations10 Feb 2020 Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.
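
For reference, $f$ is Hadamard directionally differentiable at $x$ in direction $d$ when the limit

$$
f'(x; d) \;=\; \lim_{t \downarrow 0,\ d' \to d} \frac{f(x + t\,d') - f(x)}{t}
$$

exists; letting the direction $d'$ vary along with the step size $t$ is what makes this notion strong enough to support a chain rule, unlike the plain one-sided directional derivative.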

Why are Adaptive Methods Good for Attention Models?

no code implementations NeurIPS 2020 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the \emph{de facto} algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.
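
For reference, the "clipped" update rescales the stochastic gradient whenever its norm exceeds a threshold; a minimal sketch (illustrative only, not the paper's analysis or hyperparameters):

```python
import numpy as np

def clipped_sgd_step(params, grad, lr, clip_threshold):
    """One SGD step with global-norm gradient clipping."""
    norm = np.linalg.norm(grad)
    if norm > clip_threshold:
        grad = grad * (clip_threshold / norm)   # rescale to the threshold norm
    return params - lr * grad
```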

Why ADAM Beats SGD for Attention Models

no code implementations25 Sep 2019 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

A Probe Towards Understanding GAN and VAE Models

no code implementations13 Dec 2018 Lu Mi, Macheng Shen, Jingzhao Zhang

This project report compares some known GAN and VAE models proposed prior to 2017.

Generative Adversarial Network

Direct Runge-Kutta Discretization Achieves Acceleration

no code implementations NeurIPS 2018 Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.
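
For context, the continuous limit of Nesterov's accelerated gradient method for a convex $f$ is the well-known second-order ODE

$$
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0,
$$

whose solutions decrease $f$ at the accelerated $O(1/t^2)$ rate; the paper discretizes a related ODE directly with Runge-Kutta schemes.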
