Search Results for author: Juhan Bae

Found 12 papers, 7 papers with code

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

no code implementations · 5 Feb 2024 · Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani

Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.

Second-order methods

Using Large Language Models for Hyperparameter Optimization

no code implementations · 7 Dec 2023 · Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba

This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO).

Bayesian Optimization, Decision Making, +1

Studying Large Language Model Generalization with Influence Functions

2 code implementations · 7 Aug 2023 · Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior?

counterfactual, Language Modelling, +2

Efficient Parametric Approximations of Neural Network Function Space Distance

no code implementations · 7 Feb 2023 · Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.

Continual Learning

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

no code implementations · 7 Dec 2022 · Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications.

If Influence Functions are the Answer, Then What is the Question?

2 code implementations · 12 Sep 2022 · Juhan Bae, Nathan Ng, Alston Lo, Marzyeh Ghassemi, Roger Grosse

Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.

Amortized Proximal Optimization

no code implementations · 28 Feb 2022 · Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.

Image Classification, Image Reconstruction, +2

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

1 code implementation · 22 Apr 2021 · James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.

Eigenvalue Corrected Noisy Natural Gradient

3 code implementations · 30 Nov 2018 · Juhan Bae, Guodong Zhang, Roger Grosse

A recently proposed method, noisy natural gradient, is a surprisingly simple method to fit expressive posteriors by adding weight noise to regular natural gradient updates.

Learnable Pooling Methods for Video Classification

1 code implementation · 1 Oct 2018 · Sebastian Kmiec, Juhan Bae, Ruijian An

We demonstrate our solutions in "The 2nd YouTube-8M Video Understanding Challenge" using frame-level video and audio descriptors.

Classification, General Classification, +2
