no code implementations • 11 Feb 2024 • Liu Ziyin, Mingze Wang, Lei Wu
For one class of symmetry, SGD naturally converges to solutions with balanced and aligned gradient noise.
no code implementations • 13 Jan 2024 • Yizhou Xu, Liu Ziyin
We identify and solve a hidden-layer model that is analytically tractable at any finite width and whose limits exhibit both the kernel phase and the feature learning phase.
no code implementations • 29 Sep 2023 • Liu Ziyin
Owing to common architecture designs, symmetries are ubiquitous in contemporary neural networks.
no code implementations • 13 Aug 2023 • Liu Ziyin, Hongchao Li, Masahito Ueda
Stochastic gradient descent (SGD) is the workhorse algorithm for training neural networks.
1 code implementation • 27 Mar 2023 • James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht
We present a simple picture of the training process of joint embedding self-supervised learning methods.
no code implementations • 23 Mar 2023 • Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda
Characterizing and understanding the stability of Stochastic Gradient Descent (SGD) remains an open problem in deep learning.
2 code implementations • 3 Oct 2022 • Liu Ziyin, ZiHao Wang
We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent.
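The abstract describes a reparametrization trick for $L_1$-constrained objectives. One classical identity behind such schemes, sketched below as an illustrative assumption (not the paper's reference implementation), is that writing each weight as a product $w = u \cdot v$ and applying an $L_2$ penalty to $(u, v)$ acts as an $L_1$ penalty on $w$, since $\min_{uv=w} \tfrac{1}{2}(u^2 + v^2) = |w|$. The toy problem, step sizes, and variable names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse regression: minimize (1/n)||X w - y||^2 + lam * ||w||_1.
n, d, lam, lr = 200, 20, 0.1, 0.01
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 nonzero coefficients
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Redundant reparametrization: w = u * v (elementwise).
u = rng.normal(scale=0.1, size=d)
v = rng.normal(scale=0.1, size=d)

for step in range(10000):
    w = u * v
    grad_w = 2 * X.T @ (X @ w - y) / n  # gradient of the smooth loss in w
    # Chain rule through w = u * v, plus plain L2 weight decay on u and v,
    # which together emulate the L1 penalty lam * |w|.
    gu = grad_w * v + lam * u
    gv = grad_w * u + lam * v
    u -= lr * gu
    v -= lr * gv

w = u * v  # entries off the true support are driven toward exactly zero
```

The appeal of this construction is that the reparametrized objective is differentiable everywhere, so ordinary (stochastic) gradient descent applies without proximal or subgradient machinery.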
no code implementations • 2 Oct 2022 • Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka
Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).
no code implementations • 25 May 2022 • Liu Ziyin, Masahito Ueda
This work reports deep-learning-unique first-order and second-order phase transitions, whose phenomenology closely follows that in statistical physics.
no code implementations • 9 May 2022 • ZiHao Wang, Liu Ziyin
This work identifies the existence and cause of a type of posterior collapse that frequently occurs in Bayesian deep learning practice.
no code implementations • 10 Feb 2022 • Liu Ziyin, Botao Li, Xiangming Meng
This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks.
no code implementations • 30 Jan 2022 • Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric Xing, Masahito Ueda
This work presents a theoretical study of stochastic neural networks, a major class of neural networks in practical use.
no code implementations • 29 Sep 2021 • Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise under the mean-square loss.
no code implementations • ICLR 2022 • Liu Ziyin, Botao Li, James B Simon, Masahito Ueda
Stochastic gradient descent (SGD) is widely used for the nonlinear, nonconvex problem of training deep neural networks, but its behavior remains poorly understood.
no code implementations • 25 Jul 2021 • Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda
Previous works on stochastic gradient descent (SGD) often focus on its success.
1 code implementation • 8 Jun 2021 • Liu Ziyin, Kentaro Minami, Kentaro Imajo
The task we consider is portfolio construction in a speculative market, a fundamental problem in modern finance.
no code implementations • 20 May 2021 • Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise under the mean-square loss.
no code implementations • 15 May 2021 • Zhang Zhiyi, Liu Ziyin
Adaptive gradient methods have achieved remarkable success in training deep neural networks on a wide variety of tasks.
no code implementations • ICLR 2022 • Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning.
no code implementations • 7 Dec 2020 • Kangqiao Liu, Liu Ziyin, Masahito Ueda
In the vanishing learning rate regime, stochastic gradient descent (SGD) is now relatively well understood.
1 code implementation • 4 Dec 2020 • Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov
In this work, we propose algorithms for cross-modal generalization: a learning paradigm to train a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality.
no code implementations • 23 Oct 2020 • Blair Chen, Liu Ziyin, ZiHao Wang, Paul Pu Liang
In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework to show how label smoothing helps control the generalization loss.
3 code implementations • NeurIPS 2020 • Liu Ziyin, Tilman Hartwig, Masahito Ueda
Previous literature offers limited clues on how to learn a periodic function using modern neural networks.
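The proposed fix in this line of work is, to the best of my recollection, a periodic activation function of the form $x + \sin^2(ax)/a$ (often called Snake); treat that exact form as an assumption rather than a statement of the paper's definitive method. The linear term keeps optimization easy, while the $\sin^2$ term injects a periodic component whose frequency is set by $a$:

```python
import numpy as np

def snake(x, a=1.0):
    """Periodic activation of the assumed form x + sin^2(a*x)/a.

    The identity part (x) preserves gradient flow like a residual
    connection; the bounded sin^2 part makes the deviation from the
    identity periodic with period pi/a, which lets the network
    extrapolate periodic structure beyond the training range.
    """
    return x + np.sin(a * x) ** 2 / a
```

By construction, the deviation `snake(x) - x` repeats with period `pi / a`, which is the property a plain ReLU or tanh network lacks when asked to extrapolate a periodic signal.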
no code implementations • 25 Mar 2020 • Liu Ziyin, ZiHao Wang, Makoto Yamada, Masahito Ueda
We propose a novel regularization method, called volumization, for neural networks.
no code implementations • 16 Feb 2020 • Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets.
1 code implementation • 12 Feb 2020 • Liu Ziyin, Zhikang T. Wang, Masahito Ueda
We also bound the regret of LaProp on a convex problem and show that our bound differs from Adam's by a key factor, which demonstrates its advantage.
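As I understand it, the distinguishing idea of LaProp is to divide the gradient by the adaptive RMS statistic *before* it enters the momentum buffer, whereas Adam accumulates raw-gradient momentum and normalizes afterwards. The sketch below is a hypothetical single-step update illustrating that ordering, not the paper's reference implementation; hyperparameter names and the bias-correction details are assumptions.

```python
import numpy as np

def laprop_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One assumed LaProp-style update.

    state = (m, v, t): momentum buffer, second-moment buffer, step count.
    The normalized gradient g / sqrt(v_hat) is what gets averaged into
    the momentum, decoupling momentum from adaptivity.
    """
    m, v, t = state
    t += 1
    v = b2 * v + (1 - b2) * grad**2
    v_hat = v / (1 - b2**t)                      # bias-corrected 2nd moment
    m = b1 * m + (1 - b1) * grad / (np.sqrt(v_hat) + eps)
    m_hat = m / (1 - b1**t)                      # bias-corrected momentum
    return theta - lr * m_hat, (m, v, t)

# Usage on a toy quadratic f(x) = x^2, whose gradient is 2x.
theta = np.array([1.0])
state = (np.zeros(1), np.zeros(1), 0)
for _ in range(800):
    theta, state = laprop_step(theta, 2 * theta, state, lr=0.005)
```

Because the normalized gradient has magnitude roughly 1 at steady state, each update moves parameters by about `lr`, which is the sign-SGD-like behavior this family of optimizers shares.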
4 code implementations • 6 Jan 2020 • Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency
To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices.
no code implementations • 25 Sep 2019 • Liu Ziyin, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
Learning in the presence of label noise is a challenging yet important task.
3 code implementations • NeurIPS 2019 • Liu Ziyin, Zhikang Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
We deal with the selective classification problem (a supervised-learning problem with a rejection option), where we want to achieve the best performance at a certain level of coverage of the data.