no code implementations • 11 Feb 2024 • Liu Ziyin, Mingze Wang, Lei Wu
For one class of symmetry, SGD naturally converges to solutions with balanced and aligned gradient noise.
no code implementations • 13 Jan 2024 • Yizhou Xu, Liu Ziyin
We identify and solve a hidden-layer model that is analytically tractable at any finite width and whose limits exhibit both the kernel phase and the feature learning phase.
no code implementations • 29 Sep 2023 • Liu Ziyin
Owing to common architecture designs, symmetries are ubiquitous in contemporary neural networks.
no code implementations • 13 Aug 2023 • Liu Ziyin, Hongchao Li, Masahito Ueda
Stochastic gradient descent (SGD) is the workhorse algorithm for training neural networks.
1 code implementation • 27 Mar 2023 • James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht
We present a simple picture of the training process of joint embedding self-supervised learning methods.
no code implementations • 23 Mar 2023 • Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda
Characterizing and understanding the stability of Stochastic Gradient Descent (SGD) remains an open problem in deep learning.
2 code implementations • 3 Oct 2022 • Liu Ziyin, ZiHao Wang
We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent.
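The abstract describes a reparametrization trick for $L_1$-constrained objectives. One classical identity behind such schemes, sketched below as an illustrative assumption (not the paper's reference implementation), is that writing each weight as a product $w = u \cdot v$ and applying an $L_2$ penalty to $(u, v)$ acts as an $L_1$ penalty on $w$, since $\min_{uv=w} \tfrac{1}{2}(u^2 + v^2) = |w|$. The toy problem, step sizes, and variable names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse regression: minimize (1/n)||X w - y||^2 + lam * ||w||_1.
n, d, lam, lr = 200, 20, 0.1, 0.01
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 nonzero coefficients
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Redundant reparametrization: w = u * v (elementwise).
u = rng.normal(scale=0.1, size=d)
v = rng.normal(scale=0.1, size=d)

for step in range(10000):
    w = u * v
    grad_w = 2 * X.T @ (X @ w - y) / n  # gradient of the smooth loss in w
    # Chain rule through w = u * v, plus plain L2 weight decay on u and v,
    # which together emulate the L1 penalty lam * |w|.
    gu = grad_w * v + lam * u
    gv = grad_w * u + lam * v
    u -= lr * gu
    v -= lr * gv

w = u * v  # entries off the true support are driven toward exactly zero
```

The appeal of this construction is that the reparametrized objective is differentiable everywhere, so ordinary (stochastic) gradient descent applies without proximal or subgradient machinery.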
no code implementations • 2 Oct 2022 • Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka
Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).
no code implementations • 25 May 2022 • Liu Ziyin, Masahito Ueda
This work reports deep-learning-unique first-order and second-order phase transitions, whose phenomenology closely follows that in statistical physics.
no code implementations • 9 May 2022 • ZiHao Wang, Liu Ziyin
This work identifies the existence and cause of a type of posterior collapse that frequently occurs in Bayesian deep learning practice.
no code implementations • 10 Feb 2022 • Liu Ziyin, Botao Li, Xiangming Meng
This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks.
no code implementations • 30 Jan 2022 • Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric Xing, Masahito Ueda
This work presents a theoretical study of stochastic neural networks, a major class of neural networks in practical use.
no code implementations • 29 Sep 2021 • Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise under the mean-square loss.
no code implementations • ICLR 2022 • Liu Ziyin, Botao Li, James B Simon, Masahito Ueda
Stochastic gradient descent (SGD) is widely used for the nonlinear, nonconvex problem of training deep neural networks, but its behavior remains poorly understood.
no code implementations • 25 Jul 2021 • Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda
Previous works on stochastic gradient descent (SGD) often focus on its success.
1 code implementation • 8 Jun 2021 • Liu Ziyin, Kentaro Minami, Kentaro Imajo
The task we consider is portfolio construction in a speculative market, a fundamental problem in modern finance.
no code implementations • 20 May 2021 • Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise under the mean-square loss.
no code implementations • 15 May 2021 • Zhang Zhiyi, Liu Ziyin
Adaptive gradient methods have achieved remarkable success in training deep neural networks on a wide variety of tasks.
no code implementations • ICLR 2022 • Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning.
no code implementations • 7 Dec 2020 • Kangqiao Liu, Liu Ziyin, Masahito Ueda
In the vanishing learning rate regime, stochastic gradient descent (SGD) is now relatively well understood.
1 code implementation • 4 Dec 2020 • Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov
In this work, we propose algorithms for cross-modal generalization: a learning paradigm to train a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality.
no code implementations • 23 Oct 2020 • Blair Chen, Liu Ziyin, ZiHao Wang, Paul Pu Liang
In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework to show how label smoothing helps control the generalization loss.
3 code implementations • NeurIPS 2020 • Liu Ziyin, Tilman Hartwig, Masahito Ueda
Previous literature offers limited clues on how to learn a periodic function using modern neural networks.
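The proposed fix in this line of work is, to the best of my recollection, a periodic activation function of the form $x + \sin^2(ax)/a$ (often called Snake); treat that exact form as an assumption rather than a statement of the paper's definitive method. The linear term keeps optimization easy, while the $\sin^2$ term injects a periodic component whose frequency is set by $a$:

```python
import numpy as np

def snake(x, a=1.0):
    """Periodic activation of the assumed form x + sin^2(a*x)/a.

    The identity part (x) preserves gradient flow like a residual
    connection; the bounded sin^2 part makes the deviation from the
    identity periodic with period pi/a, which lets the network
    extrapolate periodic structure beyond the training range.
    """
    return x + np.sin(a * x) ** 2 / a
```

By construction, the deviation `snake(x) - x` repeats with period `pi / a`, which is the property a plain ReLU or tanh network lacks when asked to extrapolate a periodic signal.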
no code implementations • 25 Mar 2020 • Liu Ziyin, ZiHao Wang, Makoto Yamada, Masahito Ueda
We propose a novel regularization method, called volumization, for neural networks.
no code implementations • 16 Feb 2020 • Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets.
1 code implementation • 12 Feb 2020 • Liu Ziyin, Zhikang T. Wang, Masahito Ueda
We also bound the regret of LaProp on a convex problem and show that our bound differs from Adam's by a key factor, which demonstrates its advantage.
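As I understand it, the distinguishing idea of LaProp is to divide the gradient by the adaptive RMS statistic *before* it enters the momentum buffer, whereas Adam accumulates raw-gradient momentum and normalizes afterwards. The sketch below is a hypothetical single-step update illustrating that ordering, not the paper's reference implementation; hyperparameter names and the bias-correction details are assumptions.

```python
import numpy as np

def laprop_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One assumed LaProp-style update.

    state = (m, v, t): momentum buffer, second-moment buffer, step count.
    The normalized gradient g / sqrt(v_hat) is what gets averaged into
    the momentum, decoupling momentum from adaptivity.
    """
    m, v, t = state
    t += 1
    v = b2 * v + (1 - b2) * grad**2
    v_hat = v / (1 - b2**t)                      # bias-corrected 2nd moment
    m = b1 * m + (1 - b1) * grad / (np.sqrt(v_hat) + eps)
    m_hat = m / (1 - b1**t)                      # bias-corrected momentum
    return theta - lr * m_hat, (m, v, t)

# Usage on a toy quadratic f(x) = x^2, whose gradient is 2x.
theta = np.array([1.0])
state = (np.zeros(1), np.zeros(1), 0)
for _ in range(800):
    theta, state = laprop_step(theta, 2 * theta, state, lr=0.005)
```

Because the normalized gradient has magnitude roughly 1 at steady state, each update moves parameters by about `lr`, which is the sign-SGD-like behavior this family of optimizers shares.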
4 code implementations • 6 Jan 2020 • Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency
To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices.
no code implementations • 25 Sep 2019 • Liu Ziyin, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
Learning in the presence of label noise is a challenging yet important task.
3 code implementations • NeurIPS 2019 • Liu Ziyin, Zhikang Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
We deal with the selective classification problem (a supervised-learning problem with a rejection option), where we want to achieve the best performance at a certain level of coverage of the data.