Search Results for author: Juhan Bae

Found 12 papers, 7 papers with code

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

no code implementations • 5 Feb 2024 • Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani

Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.

Second-order methods

Using Large Language Models for Hyperparameter Optimization

no code implementations • 7 Dec 2023 • Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba

This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO).

Bayesian Optimization, Decision Making, +1

Studying Large Language Model Generalization with Influence Functions

2 code implementations • 7 Aug 2023 • Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior?

Counterfactual, Language Modelling, +2

Benchmarking Neural Network Training Algorithms

3 code implementations • 12 Jun 2023 • George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark.

Benchmarking

Efficient Parametric Approximations of Neural Network Function Space Distance

no code implementations • 7 Feb 2023 • Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.

Continual Learning

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

no code implementations • 7 Dec 2022 • Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications.

If Influence Functions are the Answer, Then What is the Question?

2 code implementations • 12 Sep 2022 • Juhan Bae, Nathan Ng, Alston Lo, Marzyeh Ghassemi, Roger Grosse

Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.

Amortized Proximal Optimization

no code implementations • 28 Feb 2022 • Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.

Image Classification, Image Reconstruction, +2

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

1 code implementation • 22 Apr 2021 • James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians

1 code implementation • NeurIPS 2020 • Juhan Bae, Roger Grosse

Hyperparameter optimization of neural networks can be elegantly formulated as a bilevel optimization problem.

Bilevel Optimization, Hyperparameter Optimization, +2

Eigenvalue Corrected Noisy Natural Gradient

3 code implementations • 30 Nov 2018 • Juhan Bae, Guodong Zhang, Roger Grosse

A recently proposed method, noisy natural gradient, is a surprisingly simple method to fit expressive posteriors by adding weight noise to regular natural gradient updates.

Learnable Pooling Methods for Video Classification

1 code implementation • 1 Oct 2018 • Sebastian Kmiec, Juhan Bae, Ruijian An

We demonstrate our solutions in "The 2nd YouTube-8M Video Understanding Challenge" using frame-level video and audio descriptors.

Classification, General Classification, +2
