Search Results for author: Kwangjun Ahn

Found 22 papers, 1 paper with code

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

no code implementations • 2 Feb 2024 Kwangjun Ahn, ZhiYu Zhang, Yunbum Kook, Yan Dai

In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components.
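For reference, the standard Adam recursion that the paper reinterprets through online learning (notation is ours, not the paper's); the title's claim is that each coordinate of this update can be read as an instance of follow-the-regularized-leader (FTRL):

```latex
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad g_t = \nabla f(x_t),\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,\\
x_{t+1} &= x_t - \eta\, \frac{m_t/(1-\beta_1^t)}{\sqrt{v_t/(1-\beta_2^t)} + \epsilon}.
\end{align*}
```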

Linear attention is (maybe) all you need (to understand transformer optimization)

1 code implementation • 2 Oct 2023 Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics.
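A minimal sketch of the kind of simplified proxy the title refers to: a single linear self-attention layer, i.e., attention with the softmax removed. The function and variable names are ours, not the paper's code.

```python
import numpy as np

def linear_attention(Z, WQ, WK, WV):
    """One linear self-attention layer with a residual connection.

    Z is an (n, d) token matrix; WQ, WK, WV are (d, d) weights.
    Computes Z + (Z WQ)(Z WK)^T (Z WV) / n -- attention without softmax.
    """
    n = Z.shape[0]
    Q, K, V = Z @ WQ, Z @ WK, Z @ WV
    return Z + (Q @ K.T) @ V / n  # residual + linear attention
```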

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

no code implementations • 24 Jun 2023 Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

However, existing characterizations of the implicit regularization of different algorithms are confined to either a specific geometry or a particular class of learning problems, indicating a gap in a general approach for controlling implicit regularization.

Classification · regression
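For context, the mirror descent recursion in question, with a strictly convex potential $\psi$ (our notation); the choice of $\psi$ determines the geometry in which the implicit bias lives:

```latex
% Mirror descent with potential \psi on loss L (our notation):
\nabla \psi(x_{t+1}) = \nabla \psi(x_t) - \eta\, \nabla L(x_t),
% e.g., \psi(x) = \tfrac{1}{2}\|x\|_2^2 recovers plain gradient descent,
% while other potentials steer the iterates toward different geometries.
```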

How to escape sharp minima with random perturbations

no code implementations • 25 May 2023 Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra

Under this notion, we then analyze algorithms that find approximate flat minima efficiently.
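A generic sketch of the perturbed-gradient-descent template such analyses study: take gradient steps, and inject a random perturbation near stationary points so the iterate can leave a sharp region. This is a standard pattern, not the paper's exact algorithm; all names are ours.

```python
import numpy as np

def perturbed_gd(grad, x, eta=1e-2, radius=1e-1, tol=1e-3, steps=10_000, seed=0):
    """Gradient descent plus random kicks near stationary points.
    A generic template, not the paper's exact method."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:                        # near-stationary
            x = x + radius * rng.standard_normal(x.shape)  # random perturbation
        else:
            x = x - eta * g                                # plain gradient step
    return x
```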

Learning threshold neurons via the "edge of stability"

no code implementations • 14 Dec 2022 Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang

For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias).

Inductive Bias
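A toy illustration of the step-size threshold behind the "edge of stability": on a quadratic with curvature $a$, gradient descent is stable exactly when the step size is below $2/a$. This demo is ours, not the paper's two-layer model.

```python
a = 4.0                          # curvature of f(x) = a * x**2 / 2
for eta in (0.4, 0.6):           # stability threshold is 2/a = 0.5
    x = 1.0
    for _ in range(50):
        x -= eta * a * x         # gradient step; multiplier is (1 - eta*a)
    print(f"eta={eta}: |x| = {abs(x):.2e}")  # shrinks below 2/a, blows up above
```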

Model Predictive Control via On-Policy Imitation Learning

no code implementations • 17 Oct 2022 Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali Jadbabaie

Recent approaches to data-driven MPC have used behavior cloning, the simplest form of imitation learning, to learn controllers that mimic the performance of MPC by sampling trajectories of the closed-loop MPC system online.

Imitation Learning · Model Predictive Control · +1
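A schematic of the on-policy imitation loop described above, in the style of DAgger: roll out the current learner, label the states it visits with the MPC expert, and refit. The interfaces (`expert_mpc`, `policy`, `env`) are hypothetical placeholders, not the paper's code.

```python
def on_policy_imitation(expert_mpc, policy, env, rounds=10, horizon=100):
    """DAgger-style on-policy imitation of an MPC expert (schematic only;
    expert_mpc, policy, and env are hypothetical interfaces)."""
    data = []
    for _ in range(rounds):
        state = env.reset()
        for _ in range(horizon):
            data.append((state, expert_mpc(state)))  # expert label at learner's state
            state = env.step(policy(state))          # ...but follow the learner
        policy.fit(data)                             # supervised refit on all pairs
    return policy
```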

One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

no code implementations • 28 Jul 2022 Youngjae Min, Kwangjun Ahn, Navid Azizan

While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset.
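For context, the textbook recursive least-squares (RLS) update that the paper bridges with orthogonal gradient descent: a rank-one Sherman-Morrison update keeps the per-sample cost at $O(d^2)$. A minimal sketch in our notation, not the paper's algorithm.

```python
import numpy as np

def rls_step(w, P, x, y):
    """One recursive least-squares update for a linear model y ~ w @ x.
    P tracks the inverse data covariance via a Sherman-Morrison downdate."""
    Px = P @ x
    k = Px / (1.0 + x @ Px)      # gain vector
    w = w + k * (y - w @ x)      # correct the prediction error
    P = P - np.outer(k, Px)      # rank-one downdate of the inverse covariance
    return w, P
```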

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

no code implementations • 25 May 2022 Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan

Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question.

Open-Ended Question Answering

Understanding the unstable convergence of gradient descent

no code implementations • 3 Apr 2022 Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.
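The $2/L$ condition comes from the standard descent lemma; one line of the calculation shows why that threshold is special:

```latex
% For L-smooth f, smoothness applied to a gradient step gives
f\big(x - \eta \nabla f(x)\big) \;\le\; f(x) - \eta\Big(1 - \tfrac{L\eta}{2}\Big)\,\|\nabla f(x)\|^2,
% so monotone decrease is guaranteed only for \eta < 2/L; the paper examines
% the "unstable" regime where this guarantee fails yet GD still converges.
```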

Agnostic Learnability of Halfspaces via Logistic Loss

no code implementations • 31 Jan 2022 Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021).

regression

Riemannian Perspective on Matrix Factorization

no code implementations • 1 Feb 2021 Kwangjun Ahn, Felipe Suarez

We study the non-convex matrix factorization approach to matrix completion via Riemannian geometry.

Matrix Completion
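The non-convex approach in question is the standard factorization formulation of matrix completion (our notation): with $\Omega$ the set of observed entries of a matrix $M$,

```latex
\min_{U \in \mathbb{R}^{n \times r},\; V \in \mathbb{R}^{m \times r}}
\;\tfrac{1}{2}\,\big\| P_\Omega\big(U V^\top - M\big) \big\|_F^2,
% where P_\Omega zeroes out the unobserved entries.
```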

Optimal dimension dependence of the Metropolis-Adjusted Langevin Algorithm

no code implementations • 23 Dec 2020 Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, Philippe Rigollet

Conventional wisdom in the sampling literature, backed by a popular diffusion scaling limit, suggests that the mixing time of the Metropolis-Adjusted Langevin Algorithm (MALA) scales as $O(d^{1/3})$, where $d$ is the dimension.
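For reference, one textbook MALA step targeting $\pi \propto e^{-U}$: a Langevin proposal followed by a Metropolis accept/reject. A standard sketch in our notation, not the paper's analysis.

```python
import numpy as np

def mala_step(x, U, grad_U, eta, rng):
    """One MALA step: Langevin proposal + Metropolis correction."""
    y = x - eta * grad_U(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)

    def log_q(b, a):  # log density (up to constants) of proposing b from a
        return -np.sum((b - a + eta * grad_U(a)) ** 2) / (4 * eta)

    log_acc = (U(x) - U(y)) + log_q(x, y) - log_q(y, x)
    return y if np.log(rng.uniform()) < log_acc else x
```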

Efficient constrained sampling via the mirror-Langevin algorithm

no code implementations NeurIPS 2021 Kwangjun Ahn, Sinho Chewi

We propose a new discretization of the mirror-Langevin diffusion and give a crisp proof of its convergence.
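For context, the diffusion being discretized, in the notation common to this literature: with a mirror map $\phi$ and target $\pi \propto e^{-V}$,

```latex
X_t = \nabla \phi^*(Y_t), \qquad
\mathrm{d}Y_t = -\nabla V(X_t)\,\mathrm{d}t
  + \sqrt{2}\,\big[\nabla^2 \phi(X_t)\big]^{1/2}\,\mathrm{d}B_t.
```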

Understanding Nesterov's Acceleration via Proximal Point Method

no code implementations • 17 May 2020 Kwangjun Ahn, Suvrit Sra

The proximal point method (PPM) is a fundamental method in optimization that is often used as a building block for designing optimization algorithms.
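The PPM recursion itself, for context (our notation): each iterate minimizes the objective plus a quadratic penalty for moving away from the current point,

```latex
x_{k+1} = \arg\min_{x}\;\Big\{ f(x) + \tfrac{1}{2\lambda}\,\|x - x_k\|^2 \Big\},
% with step size \lambda > 0; many first-order methods, including gradient
% descent, can be viewed as approximations to this subproblem.
```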

On Tight Convergence Rates of Without-replacement SGD

no code implementations • 18 Apr 2020 Kwangjun Ahn, Suvrit Sra

For solving finite-sum optimization problems, without-replacement SGD is empirically shown to outperform with-replacement SGD.
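The two sampling schemes being compared, as a minimal sketch (names are ours; `grad_i(i, x)` is assumed to return the gradient of the i-th component at x):

```python
import numpy as np

def sgd_epochs(grad_i, x, n, eta, epochs, rng, replacement=False):
    """SGD on f = (1/n) * sum_i f_i. replacement=False is without-replacement
    sampling ("random reshuffling"): each epoch visits every component once."""
    for _ in range(epochs):
        order = rng.integers(0, n, size=n) if replacement else rng.permutation(n)
        for i in order:
            x = x - eta * grad_i(i, x)
    return x
```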

From Nesterov's Estimate Sequence to Riemannian Acceleration

no code implementations • 24 Jan 2020 Kwangjun Ahn, Suvrit Sra

We control this distortion by developing a novel geometric inequality, which permits us to propose and analyze a Riemannian counterpart to Nesterov's accelerated gradient method.

Binary Rating Estimation with Graph Side Information

no code implementations NeurIPS 2018 Kwangjun Ahn, Kangwook Lee, Hyunseung Cha, Changho Suh

Considering a simple correlation model between a rating matrix and a graph, we characterize the sharp threshold on the number of observed entries required to recover the rating matrix (called the optimal sample complexity) as a function of the quality of graph side information (to be detailed).

Hypergraph Spectral Clustering in the Weighted Stochastic Block Model

no code implementations • 23 May 2018 Kwangjun Ahn, Kangwook Lee, Changho Suh

Our main contribution lies in performance analysis of the poly-time algorithms under a random hypergraph model, which we name the weighted stochastic block model, in which objects and multi-way measures are modeled as nodes and weights of hyperedges, respectively.

Clustering · Stochastic Block Model
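A sketch of the standard spectral clustering recipe such analyses build on: embed nodes with the top eigenvectors of a weighted adjacency matrix, then cluster the embedding. A generic recipe with a crude Lloyd loop, not the paper's exact poly-time algorithm.

```python
import numpy as np

def spectral_cluster(W, k, rng, iters=20):
    """Generic spectral clustering of a symmetric weighted adjacency W (n x n)."""
    _, vecs = np.linalg.eigh(W)
    emb = vecs[:, -k:]                        # top-k eigenvector embedding
    centers = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(iters):                    # plain Lloyd (k-means) iterations
        labels = np.argmin(((emb[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([emb[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```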

Community Recovery in Hypergraphs

no code implementations • 12 Sep 2017 Kwangjun Ahn, Kangwook Lee, Changho Suh

The objective of the problem is to cluster data points into distinct communities based on a set of measurements, each of which is associated with the values of a certain number of data points.

Clustering · Face Clustering · +1
