no code implementations • 2 Feb 2024 • Kwangjun Ahn, ZhiYu Zhang, Yunbum Kook, Yan Dai
In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components.
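For reference, a minimal sketch of the standard Adam update (Kingma & Ba, 2015), whose components, momentum and coordinate-wise adaptive step sizes, the paper reinterprets through online learning. The hyperparameter defaults below are common conventions, not values taken from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum (m) and second-moment (v) accumulators
    with bias correction, then a coordinate-wise scaled step. t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # second moment (per-coordinate scale)
    m_hat = m / (1 - beta1**t)               # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```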
1 code implementation • 2 Oct 2023 • Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
Transformer training is notoriously difficult, requiring a careful design of optimizers and the use of various heuristics.
no code implementations • 24 Jun 2023 • Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan
However, existing analyses of the implicit regularization of different algorithms are confined to either a specific geometry or a particular class of learning problems, indicating the lack of a general approach for controlling implicit regularization.
no code implementations • 25 May 2023 • Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra
Under this notion, we then analyze algorithms that find approximate flat minima efficiently.
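The paper's formal notion of flatness and its algorithms are not reproduced here; as a hedged illustration of one standard route to flat minima, the sketch below estimates the gradient of a Gaussian-smoothed loss, whose minimizers favor flat regions. The smoothing radius `sigma` and the sample count are illustrative assumptions.

```python
import numpy as np

def smoothed_grad(f_grad, x, sigma=0.1, n_samples=8, rng=None):
    """Monte Carlo estimate of grad E_u[f(x + sigma*u)], u ~ N(0, I).
    Averaging gradients at perturbed points biases descent toward flat minima."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += f_grad(x + sigma * u)
    return g / n_samples
```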
no code implementations • 14 Dec 2022 • Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang
For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias).
no code implementations • 17 Oct 2022 • Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali Jadbabaie
Recent approaches to data-driven MPC have used behavior cloning, the simplest form of imitation learning, to learn controllers that mimic MPC by sampling trajectories of the closed-loop MPC system online.
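As a hedged sketch of the behavior-cloning baseline described here (not the paper's improved method): query an MPC "expert" for actions along a rollout and fit a regressor from states to actions. The `mpc_expert` and `dynamics` callables are placeholders, and the linear policy class is an illustrative choice.

```python
import numpy as np

def collect_demos(mpc_expert, dynamics, x0, horizon):
    """Roll out the closed-loop MPC system, logging (state, action) pairs."""
    states, actions = [], []
    x = x0
    for _ in range(horizon):
        u = mpc_expert(x)            # expert action from the MPC solver
        states.append(x)
        actions.append(u)
        x = dynamics(x, u)           # advance the true system
    return np.array(states), np.array(actions)

def behavior_clone(states, actions):
    """Least-squares fit of a linear policy u ~ K x to the demonstrations."""
    K, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return lambda x: x @ K
```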
no code implementations • 28 Jul 2022 • Youngjae Min, Kwangjun Ahn, Navid Azizan
While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset.
no code implementations • 25 May 2022 • Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan
Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question.
no code implementations • 3 Apr 2022 • Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra
Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.
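A minimal numerical check of that classical threshold on the quadratic f(x) = (L/2)x^2, whose smoothness constant is exactly L: gradient descent contracts if and only if the step size is below 2/L. The specific values are illustrative.

```python
def gd_on_quadratic(L=1.0, eta=1.9, steps=50, x0=1.0):
    """GD on f(x) = (L/2) x^2: x_{k+1} = (1 - eta*L) x_k.
    |1 - eta*L| < 1 iff eta < 2/L, so eta = 1.9/L converges, 2.1/L diverges."""
    x = x0
    for _ in range(steps):
        x -= eta * L * x
    return x

print(gd_on_quadratic(eta=1.9))  # near 0: converges
print(gd_on_quadratic(eta=2.1))  # huge: diverges
```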
no code implementations • 9 Feb 2022 • Kwangjun Ahn, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir
We initiate a formal study of reproducibility in optimization.
no code implementations • 31 Jan 2022 • Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp
In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021).
no code implementations • 1 Feb 2021 • Kwangjun Ahn, Felipe Suarez
We study the non-convex matrix factorization approach to matrix completion via Riemannian geometry.
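For context, a minimal sketch of plain Euclidean gradient descent on the Burer-Monteiro factorization that the paper studies through a Riemannian lens; the rank, step size, and initialization scale are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def mf_completion(M, mask, rank=5, lr=0.01, steps=2000, rng=None):
    """Fit M ~ U @ V.T on observed entries (mask == 1) by gradient descent
    on the factors; the factorization itself supplies the low-rank bias."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = M.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(steps):
        R = mask * (U @ V.T - M)     # residual on observed entries only
        U, V = U - lr * R @ V, V - lr * R.T @ U
    return U @ V.T
```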
no code implementations • 23 Dec 2020 • Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, Philippe Rigollet
Conventional wisdom in the sampling literature, backed by a popular diffusion scaling limit, suggests that the mixing time of the Metropolis-Adjusted Langevin Algorithm (MALA) scales as $O(d^{1/3})$, where $d$ is the dimension.
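For reference, a minimal implementation of MALA itself, the algorithm whose dimension dependence the paper revisits: a Langevin proposal corrected by a Metropolis accept/reject step. The step size h is the quantity whose scaling with d is at issue.

```python
import numpy as np

def mala(V, grad_V, x0, h, n_iters, rng=None):
    """Metropolis-Adjusted Langevin Algorithm; target density is exp(-V)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        y = x - h * grad_V(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)
        # log q(x|y) - log q(y|x) for the asymmetric Langevin proposal
        fwd = -np.sum((y - x + h * grad_V(x))**2) / (4 * h)
        bwd = -np.sum((x - y + h * grad_V(y))**2) / (4 * h)
        log_alpha = -V(y) + V(x) + bwd - fwd
        if np.log(rng.uniform()) < log_alpha:
            x = y                    # accept the proposal
    return x
```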
no code implementations • NeurIPS 2021 • Kwangjun Ahn, Sinho Chewi
We propose a new discretization of the mirror-Langevin diffusion and give a crisp proof of its convergence.
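The paper's discretization is not reproduced here; as a hedged sketch of the setting only, the code below is a plain Euler-Maruyama step for the mirror-Langevin diffusion, with the mirror map phi, its convex conjugate, and a Hessian square root supplied by the caller as placeholders.

```python
import numpy as np

def mirror_langevin_em(grad_V, grad_phi, grad_phi_star, hess_phi_sqrt,
                       x0, h, n_iters, rng=None):
    """Euler-Maruyama sketch of the mirror-Langevin diffusion (NOT the
    paper's discretization). The dynamics run in the dual space y = grad_phi(x)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        xi = rng.standard_normal(x.shape)
        y = grad_phi(x) - h * grad_V(x) + np.sqrt(2 * h) * hess_phi_sqrt(x) @ xi
        x = grad_phi_star(y)         # map back to the primal space
    return x
```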
no code implementations • NeurIPS 2020 • Kwangjun Ahn, Chulhee Yun, Suvrit Sra
We study without-replacement SGD for solving finite-sum optimization problems.
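For reference, a minimal sketch of the scheme being analyzed, often called random reshuffling, for finite-sum problems min_x (1/n) sum_i f_i(x); `grad_f(i, x)` returns the gradient of the i-th component.

```python
import numpy as np

def sgd_without_replacement(grad_f, x, n, lr, epochs, rng=None):
    """Random reshuffling: each epoch visits every component f_i exactly
    once, in a fresh random order (vs. i.i.d. sampling with replacement)."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = x - lr * grad_f(i, x)
    return x
```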
no code implementations • 17 May 2020 • Kwangjun Ahn, Suvrit Sra
The proximal point method (PPM) is a fundamental method in optimization that is often used as a building block for designing optimization algorithms.
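For reference, the PPM iterate is x_{k+1} = argmin_x f(x) + (1/(2*lam)) ||x - x_k||^2. The sketch below instantiates it for a quadratic f(x) = 0.5 x^T A x - b^T x, where each prox step reduces to a linear solve; this is an illustrative instance, not a construction from the paper.

```python
import numpy as np

def ppm_quadratic(A, b, x0, lam=1.0, n_iters=100):
    """Proximal point method on f(x) = 0.5 x^T A x - b^T x.
    Setting the prox gradient to zero gives (A + I/lam) x_{k+1} = b + x_k/lam."""
    n = len(b)
    x = np.array(x0, dtype=float)
    M = A + np.eye(n) / lam
    for _ in range(n_iters):
        x = np.linalg.solve(M, b + x / lam)
    return x
```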
no code implementations • 18 Apr 2020 • Kwangjun Ahn, Suvrit Sra
For solving finite-sum optimization problems, SGD with without-replacement sampling is empirically shown to outperform its with-replacement counterpart.
no code implementations • 24 Jan 2020 • Kwangjun Ahn, Suvrit Sra
We control this distortion by developing a novel geometric inequality, which permits us to propose and analyze a Riemannian counterpart to Nesterov's accelerated gradient method.
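The accelerated method and the geometric inequality are not reproduced here; as a minimal illustration of the Riemannian setting only, the sketch below runs plain (unaccelerated) Riemannian gradient descent on the unit sphere, using the tangent-space projection and the exponential map.

```python
import numpy as np

def rgd_sphere(grad_f, x0, lr=0.1, n_iters=100):
    """Riemannian GD on the unit sphere: project the Euclidean gradient
    onto the tangent space at x, then move along a geodesic (exp map)."""
    x = np.array(x0, dtype=float)
    x /= np.linalg.norm(x)
    for _ in range(n_iters):
        g = grad_f(x)
        g = g - np.dot(g, x) * x     # tangent-space projection
        t = np.linalg.norm(g)
        if t < 1e-12:
            break
        # exp map: geodesic step of length lr*t in the descent direction -g/t
        x = np.cos(lr * t) * x - np.sin(lr * t) * (g / t)
    return x
```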
no code implementations • NeurIPS 2018 • Kwangjun Ahn, Kangwook Lee, Hyunseung Cha, Changho Suh
Considering a simple model of the correlation between a rating matrix and a graph, we characterize the sharp threshold on the number of observed entries required to recover the rating matrix, i.e., the optimal sample complexity, as a function of the quality of the graph side information.
no code implementations • 23 May 2018 • Kwangjun Ahn, Kangwook Lee, Changho Suh
Our main contribution is a performance analysis of the polynomial-time algorithms under a random hypergraph model, which we call the weighted stochastic block model, in which objects and multi-way measurements are modeled as nodes and hyperedge weights, respectively.
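As a hedged sketch of the model class, restricted to the pairwise (graph) case rather than the paper's general hypergraph: community labels are drawn uniformly, and each edge weight is drawn from one distribution when its endpoints agree and another when they differ. The Gaussian weight distributions are an illustrative assumption.

```python
import numpy as np

def weighted_sbm(n, k, mu_in=1.0, mu_out=0.0, sigma=1.0, rng=None):
    """Pairwise weighted SBM: weight W[i, j] ~ N(mu_in, sigma^2) if nodes
    i and j share a community, else N(mu_out, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.integers(k, size=n)
    same = labels[:, None] == labels[None, :]
    W = rng.normal(np.where(same, mu_in, mu_out), sigma)
    W = np.triu(W, 1)
    W = W + W.T                      # symmetric weights, zero diagonal
    return W, labels
```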
no code implementations • 12 Sep 2017 • Kwangjun Ahn, Kangwook Lee, Changho Suh
The objective of the problem is to cluster data points into distinct communities based on a set of measurements, each of which is associated with the values of a certain number of data points.