Search Results for author: Kwangjun Ahn

Found 22 papers, 1 paper with code

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

no code implementations • 2 Feb 2024 Kwangjun Ahn, ZhiYu Zhang, Yunbum Kook, Yan Dai

In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components.
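For reference, the standard Adam recursion that the paper reinterprets through online learning (notation is ours, not the paper's); the title's claim is that each coordinate of this update can be read as an instance of follow-the-regularized-leader (FTRL):

```latex
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad g_t = \nabla f(x_t),\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,\\
x_{t+1} &= x_t - \eta\, \frac{m_t/(1-\beta_1^t)}{\sqrt{v_t/(1-\beta_2^t)} + \epsilon}.
\end{align*}
```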

Linear attention is (maybe) all you need (to understand transformer optimization)

1 code implementation • 2 Oct 2023 Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics.
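A minimal sketch of the kind of simplified proxy the title refers to: a single linear self-attention layer, i.e., attention with the softmax removed. The function and variable names are ours, not the paper's code.

```python
import numpy as np

def linear_attention(Z, WQ, WK, WV):
    """One linear self-attention layer with a residual connection.

    Z is an (n, d) token matrix; WQ, WK, WV are (d, d) weights.
    Computes Z + (Z WQ)(Z WK)^T (Z WV) / n -- attention without softmax.
    """
    n = Z.shape[0]
    Q, K, V = Z @ WQ, Z @ WK, Z @ WV
    return Z + (Q @ K.T) @ V / n  # residual + linear attention
```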

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

no code implementations • 24 Jun 2023 Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

However, existing characterizations of the implicit regularization of different algorithms are confined to either a specific geometry or a particular class of learning problems, indicating a gap in a general approach for controlling implicit regularization.

Classification · regression
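For context, the mirror descent recursion in question, with a strictly convex potential $\psi$ (our notation); the choice of $\psi$ determines the geometry in which the implicit bias lives:

```latex
% Mirror descent with potential \psi on loss L (our notation):
\nabla \psi(x_{t+1}) = \nabla \psi(x_t) - \eta\, \nabla L(x_t),
% e.g., \psi(x) = \tfrac{1}{2}\|x\|_2^2 recovers plain gradient descent,
% while other potentials steer the iterates toward different geometries.
```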

How to escape sharp minima with random perturbations

no code implementations • 25 May 2023 Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra

Under this notion, we then analyze algorithms that find approximate flat minima efficiently.
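A generic sketch of the perturbed-gradient-descent template such analyses study: take gradient steps, and inject a random perturbation near stationary points so the iterate can leave a sharp region. This is a standard pattern, not the paper's exact algorithm; all names are ours.

```python
import numpy as np

def perturbed_gd(grad, x, eta=1e-2, radius=1e-1, tol=1e-3, steps=10_000, seed=0):
    """Gradient descent plus random kicks near stationary points.
    A generic template, not the paper's exact method."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:                        # near-stationary
            x = x + radius * rng.standard_normal(x.shape)  # random perturbation
        else:
            x = x - eta * g                                # plain gradient step
    return x
```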

Learning threshold neurons via the "edge of stability"

no code implementations • 14 Dec 2022 Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang

For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias).

Inductive Bias
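A toy illustration of the step-size threshold behind the "edge of stability": on a quadratic with curvature $a$, gradient descent is stable exactly when the step size is below $2/a$. This demo is ours, not the paper's two-layer model.

```python
a = 4.0                          # curvature of f(x) = a * x**2 / 2
for eta in (0.4, 0.6):           # stability threshold is 2/a = 0.5
    x = 1.0
    for _ in range(50):
        x -= eta * a * x         # gradient step; multiplier is (1 - eta*a)
    print(f"eta={eta}: |x| = {abs(x):.2e}")  # shrinks below 2/a, blows up above
```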

Model Predictive Control via On-Policy Imitation Learning

no code implementations • 17 Oct 2022 Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali Jadbabaie

Recent approaches to data-driven MPC have used behavior cloning, the simplest form of imitation learning, to learn controllers that mimic the performance of MPC by sampling trajectories of the closed-loop MPC system online.

Imitation Learning · Model Predictive Control · +1
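A schematic of the on-policy imitation loop described above, in the style of DAgger: roll out the current learner, label the states it visits with the MPC expert, and refit. The interfaces (`expert_mpc`, `policy`, `env`) are hypothetical placeholders, not the paper's code.

```python
def on_policy_imitation(expert_mpc, policy, env, rounds=10, horizon=100):
    """DAgger-style on-policy imitation of an MPC expert (schematic only;
    expert_mpc, policy, and env are hypothetical interfaces)."""
    data = []
    for _ in range(rounds):
        state = env.reset()
        for _ in range(horizon):
            data.append((state, expert_mpc(state)))  # expert label at learner's state
            state = env.step(policy(state))          # ...but follow the learner
        policy.fit(data)                             # supervised refit on all pairs
    return policy
```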

One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

no code implementations • 28 Jul 2022 Youngjae Min, Kwangjun Ahn, Navid Azizan

While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset.
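For context, the textbook recursive least-squares (RLS) update that the paper bridges with orthogonal gradient descent: a rank-one Sherman-Morrison update keeps the per-sample cost at $O(d^2)$. A minimal sketch in our notation, not the paper's algorithm.

```python
import numpy as np

def rls_step(w, P, x, y):
    """One recursive least-squares update for a linear model y ~ w @ x.
    P tracks the inverse data covariance via a Sherman-Morrison downdate."""
    Px = P @ x
    k = Px / (1.0 + x @ Px)      # gain vector
    w = w + k * (y - w @ x)      # correct the prediction error
    P = P - np.outer(k, Px)      # rank-one downdate of the inverse covariance
    return w, P
```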

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

no code implementations • 25 May 2022 Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan

Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question.

Open-Ended Question Answering

Understanding the unstable convergence of gradient descent

no code implementations • 3 Apr 2022 Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.
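The $2/L$ condition comes from the standard descent lemma; one line of the calculation shows why that threshold is special:

```latex
% For L-smooth f, smoothness applied to a gradient step gives
f\big(x - \eta \nabla f(x)\big) \;\le\; f(x) - \eta\Big(1 - \tfrac{L\eta}{2}\Big)\,\|\nabla f(x)\|^2,
% so monotone decrease is guaranteed only for \eta < 2/L; the paper examines
% the "unstable" regime where this guarantee fails yet GD still converges.
```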

Agnostic Learnability of Halfspaces via Logistic Loss

no code implementations • 31 Jan 2022 Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021).

regression

Riemannian Perspective on Matrix Factorization

no code implementations • 1 Feb 2021 Kwangjun Ahn, Felipe Suarez

We study the non-convex matrix factorization approach to matrix completion via Riemannian geometry.

Matrix Completion
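The non-convex approach in question is the standard factorization formulation of matrix completion (our notation): with $\Omega$ the set of observed entries of a matrix $M$,

```latex
\min_{U \in \mathbb{R}^{n \times r},\; V \in \mathbb{R}^{m \times r}}
\;\tfrac{1}{2}\,\big\| P_\Omega\big(U V^\top - M\big) \big\|_F^2,
% where P_\Omega zeroes out the unobserved entries.
```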

Optimal dimension dependence of the Metropolis-Adjusted Langevin Algorithm

no code implementations • 23 Dec 2020 Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, Philippe Rigollet

Conventional wisdom in the sampling literature, backed by a popular diffusion scaling limit, suggests that the mixing time of the Metropolis-Adjusted Langevin Algorithm (MALA) scales as $O(d^{1/3})$, where $d$ is the dimension.
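For reference, one textbook MALA step targeting $\pi \propto e^{-U}$: a Langevin proposal followed by a Metropolis accept/reject. A standard sketch in our notation, not the paper's analysis.

```python
import numpy as np

def mala_step(x, U, grad_U, eta, rng):
    """One MALA step: Langevin proposal + Metropolis correction."""
    y = x - eta * grad_U(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)

    def log_q(b, a):  # log density (up to constants) of proposing b from a
        return -np.sum((b - a + eta * grad_U(a)) ** 2) / (4 * eta)

    log_acc = (U(x) - U(y)) + log_q(x, y) - log_q(y, x)
    return y if np.log(rng.uniform()) < log_acc else x
```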

Efficient constrained sampling via the mirror-Langevin algorithm

no code implementations NeurIPS 2021 Kwangjun Ahn, Sinho Chewi

We propose a new discretization of the mirror-Langevin diffusion and give a crisp proof of its convergence.
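For context, the diffusion being discretized, in the notation common to this literature: with a mirror map $\phi$ and target $\pi \propto e^{-V}$,

```latex
X_t = \nabla \phi^*(Y_t), \qquad
\mathrm{d}Y_t = -\nabla V(X_t)\,\mathrm{d}t
  + \sqrt{2}\,\big[\nabla^2 \phi(X_t)\big]^{1/2}\,\mathrm{d}B_t.
```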

Understanding Nesterov's Acceleration via Proximal Point Method

no code implementations • 17 May 2020 Kwangjun Ahn, Suvrit Sra

The proximal point method (PPM) is a fundamental method in optimization that is often used as a building block for designing optimization algorithms.
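The PPM recursion itself, for context (our notation): each iterate minimizes the objective plus a quadratic penalty for moving away from the current point,

```latex
x_{k+1} = \arg\min_{x}\;\Big\{ f(x) + \tfrac{1}{2\lambda}\,\|x - x_k\|^2 \Big\},
% with step size \lambda > 0; many first-order methods, including gradient
% descent, can be viewed as approximations to this subproblem.
```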

On Tight Convergence Rates of Without-replacement SGD

no code implementations • 18 Apr 2020 Kwangjun Ahn, Suvrit Sra

For solving finite-sum optimization problems, without-replacement SGD is empirically shown to outperform with-replacement SGD.
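The two sampling schemes being compared, as a minimal sketch (names are ours; `grad_i(i, x)` is assumed to return the gradient of the i-th component at x):

```python
import numpy as np

def sgd_epochs(grad_i, x, n, eta, epochs, rng, replacement=False):
    """SGD on f = (1/n) * sum_i f_i. replacement=False is without-replacement
    sampling ("random reshuffling"): each epoch visits every component once."""
    for _ in range(epochs):
        order = rng.integers(0, n, size=n) if replacement else rng.permutation(n)
        for i in order:
            x = x - eta * grad_i(i, x)
    return x
```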

From Nesterov's Estimate Sequence to Riemannian Acceleration

no code implementations • 24 Jan 2020 Kwangjun Ahn, Suvrit Sra

We control this distortion by developing a novel geometric inequality, which permits us to propose and analyze a Riemannian counterpart to Nesterov's accelerated gradient method.

Binary Rating Estimation with Graph Side Information

no code implementations NeurIPS 2018 Kwangjun Ahn, Kangwook Lee, Hyunseung Cha, Changho Suh

Considering a simple correlation model between a rating matrix and a graph, we characterize the sharp threshold on the number of observed entries required to recover the rating matrix (called the optimal sample complexity) as a function of the quality of graph side information (to be detailed).

Hypergraph Spectral Clustering in the Weighted Stochastic Block Model

no code implementations • 23 May 2018 Kwangjun Ahn, Kangwook Lee, Changho Suh

Our main contribution lies in performance analysis of the poly-time algorithms under a random hypergraph model, which we name the weighted stochastic block model, in which objects and multi-way measures are modeled as nodes and weights of hyperedges, respectively.

Clustering · Stochastic Block Model
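A sketch of the standard spectral clustering recipe such analyses build on: embed nodes with the top eigenvectors of a weighted adjacency matrix, then cluster the embedding. A generic recipe with a crude Lloyd loop, not the paper's exact poly-time algorithm.

```python
import numpy as np

def spectral_cluster(W, k, rng, iters=20):
    """Generic spectral clustering of a symmetric weighted adjacency W (n x n)."""
    _, vecs = np.linalg.eigh(W)
    emb = vecs[:, -k:]                        # top-k eigenvector embedding
    centers = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(iters):                    # plain Lloyd (k-means) iterations
        labels = np.argmin(((emb[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([emb[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```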

Community Recovery in Hypergraphs

no code implementations • 12 Sep 2017 Kwangjun Ahn, Kangwook Lee, Changho Suh

The objective of the problem is to cluster data points into distinct communities based on a set of measurements, each of which is associated with the values of a certain number of data points.

Clustering · Face Clustering · +1
