Search Results for author: Jared Davis

Found 7 papers, 4 papers with code

Debiasing a First-order Heuristic for Approximate Bi-level Optimization

1 code implementation • 4 Jun 2021 • Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems, involving numerical (inner-level) optimization loops.
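
The outer/inner structure in the snippet can be made concrete with a toy instance. The sketch below is hypothetical: the quadratic losses, step size `alpha`, loop length `T` and target `c` are invented for illustration. It also assumes the "first-order heuristic" means dropping the second-derivative (Hessian) terms of the unrolled inner loop, as in first-order MAML. It is not the paper's debiased estimator. It compares the exact hypergradient, a finite-difference check, and the first-order approximation.

```python
# Toy ABLO instance (hypothetical, for illustration only):
#   inner loss f(w) = 0.5 * a * w**2, minimised by T gradient steps starting from w0 = theta
#   outer loss g(w) = 0.5 * (w - c)**2, evaluated at the final inner iterate w_T
a, alpha, T, c = 2.0, 0.1, 10, 1.0


def inner_loop(theta):
    """Run T steps of gradient descent on f, starting from theta."""
    w = theta
    for _ in range(T):
        w = w - alpha * a * w  # grad f(w) = a * w
    return w


def outer_loss(theta):
    return 0.5 * (inner_loop(theta) - c) ** 2


def exact_hypergrad(theta):
    # Chain rule through the unrolled loop:
    # dw_T/dtheta = prod_t (1 - alpha * f''(w_t)) = (1 - alpha * a) ** T
    w_T = inner_loop(theta)
    return (w_T - c) * (1.0 - alpha * a) ** T


def first_order_hypergrad(theta):
    # First-order heuristic (assumed): drop the second-derivative terms,
    # i.e. treat dw_T/dtheta as 1
    w_T = inner_loop(theta)
    return w_T - c


theta = 3.0
eps = 1e-6
fd = (outer_loss(theta + eps) - outer_loss(theta - eps)) / (2 * eps)  # central difference
print(exact_hypergrad(theta), fd, first_order_hypergrad(theta))
```

On this toy problem the first-order estimate is larger than the exact hypergradient by a factor of $1/(1 - \alpha a)^T \approx 9.3$, which illustrates the kind of bias the title refers to.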

Sub-Linear Memory: How to Make Performers SLiM

2 code implementations • NeurIPS 2021 • Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

Recent works proposed various linear self-attention mechanisms, scaling only as $O(L)$ in the input length $L$ for serial computation.
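
A minimal sketch of why linear attention runs in $O(L)$ serial time: in the causal case each position only needs running sums over earlier keys and values, so no $L \times L$ matrix is ever formed. The `elu(x) + 1` feature map is an assumption borrowed from other linear-attention work, and the function names are hypothetical. This shows plain causal linear attention, not the SLiM memory-saving algorithm itself.

```python
import numpy as np


def feature_map(x):
    # Hypothetical positive feature map, elu(x) + 1; Performers use random features instead.
    return np.where(x > 0, x + 1.0, np.exp(x))


def causal_linear_attention_serial(Q, K, V):
    """O(L) serial pass: keep running sums instead of an L x L attention matrix."""
    Qf, Kf = feature_map(Q), feature_map(K)
    L, m = Qf.shape
    d_v = V.shape[1]
    S = np.zeros((m, d_v))  # running sum of outer(phi(k_j), v_j) over j <= i
    z = np.zeros(m)         # running sum of phi(k_j) over j <= i (normaliser)
    out = np.empty((L, d_v))
    for i in range(L):
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        out[i] = (Qf[i] @ S) / (Qf[i] @ z)
    return out


def causal_linear_attention_quadratic(Q, K, V):
    """Reference O(L^2) version with an explicit lower-triangular (causal) mask."""
    A = np.tril(feature_map(Q) @ feature_map(K).T)
    return (A @ V) / A.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 6, 4))  # L = 6 positions, dimension 4
out = causal_linear_attention_serial(Q, K, V)
```

Both functions compute the same outputs; the serial one keeps only an $m \times d_v$ state per step, which is what makes the cost linear in $L$.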

Rethinking Attention with Performers

12 code implementations • ICLR 2021 • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.

Ranked #7 on Offline RL on D4RL

D4RL, Image Generation, and 2 other tasks
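
The estimator behind the abstract can be sketched with positive random features: for $w \sim \mathcal{N}(0, I_d)$, $\mathbb{E}[\exp(w^\top q - \|q\|^2/2)\exp(w^\top k - \|k\|^2/2)] = \exp(q^\top k)$, so the softmax kernel factorises and $K'^\top V$ can be computed before multiplying by $Q'$. This is a sketch under simplifying assumptions (i.i.d. Gaussian rather than orthogonal random features, no causal masking, function names invented here), not the official Performer implementation.

```python
import numpy as np


def positive_random_features(X, W):
    """phi(x) = exp(W x - |x|^2 / 2) / sqrt(m), so E[phi(q) . phi(k)] = exp(q . k)."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=1, keepdims=True)) / np.sqrt(m)


def linear_attention(Q, K, V, W):
    """Approximate softmax attention without forming the L x L matrix."""
    d = Q.shape[1]
    scale = d ** -0.25                     # splits the usual 1/sqrt(d) between q and k
    Qp = positive_random_features(Q * scale, W)
    Kp = positive_random_features(K * scale, W)
    KV = Kp.T @ V                          # (m, d_v): computed first, cost linear in L
    normaliser = Qp @ Kp.sum(axis=0)       # (L,) row sums of the implicit attention matrix
    return (Qp @ KV) / normaliser[:, None]


def exact_softmax_attention(Q, K, V):
    """Reference quadratic-cost softmax attention."""
    d = Q.shape[1]
    A = np.exp(Q @ K.T / np.sqrt(d))
    return (A @ V) / A.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
L, d, m = 8, 4, 20000                      # sequence length, head dim, number of features
Q, K, V = 0.5 * rng.standard_normal((3, L, d))
W = rng.standard_normal((m, d))            # i.i.d. Gaussian projection (assumption)
approx = linear_attention(Q, K, V, W)
exact = exact_softmax_attention(Q, K, V)
```

The design choice is purely the order of multiplication: $(Q'K'^\top)V$ costs $O(L^2)$, while $Q'(K'^\top V)$ costs $O(Lmd)$; the estimate sharpens as `m` grows.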

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

1 code implementation • 5 Jun 2020 • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, David Belanger, Lucy Colwell, Adrian Weller

In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed.

Language Modelling, Masked Language Modeling

UFO-BLO: Unbiased First-Order Bilevel Optimization

no code implementations • 5 Jun 2020 • Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning.

Adversarial Robustness, Bilevel Optimization, and 4 other tasks

CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices

no code implementations • 18 Apr 2020 • Valerii Likhosherstov, Jared Davis, Krzysztof Choromanski, Adrian Weller

We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs.

Machine Translation, Translation, and 1 other task
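
The CWY (compact WY) idea can be sketched with the standard identity for a product of Householder reflections: with unit vectors $u_1, \dots, u_L$ stacked as the columns of $U$, $H(u_1)\cdots H(u_L) = I - U S^{-1} U^\top$, where $S = \tfrac{1}{2} I + \operatorname{striu}(U^\top U)$ and striu keeps the strictly upper triangle. The function names below are hypothetical and the paper's exact formulation may differ in details. The point is that $L$ sequential reflections are replaced by a few matrix products and one triangular system, which suits GPUs and TPUs.

```python
import numpy as np


def cwy_orthogonal(V):
    """Q = H(v_1) H(v_2) ... H(v_L) via the compact WY form Q = I - U S^{-1} U^T."""
    d, L = V.shape
    U = V / np.linalg.norm(V, axis=0, keepdims=True)  # unit Householder vectors as columns
    S = 0.5 * np.eye(L) + np.triu(U.T @ U, k=1)        # strictly upper triangle of U^T U
    # S is upper triangular; a general solve is used here for brevity.
    return np.eye(d) - U @ np.linalg.solve(S, U.T)


def householder_product(V):
    """Reference: multiply the L reflections H(u) = I - 2 u u^T one after another."""
    d, L = V.shape
    Q = np.eye(d)
    for i in range(L):
        u = V[:, i] / np.linalg.norm(V[:, i])
        Q = Q @ (np.eye(d) - 2.0 * np.outer(u, u))
    return Q


rng = np.random.default_rng(0)
V = rng.standard_normal((6, 4))  # d = 6, L = 4 Householder vectors (unnormalised parameters)
Q = cwy_orthogonal(V)
```

Because `Q` is an explicit function of the unconstrained parameters `V`, any gradient-based optimiser can update `V` while `Q` stays exactly orthogonal.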

Stochastic Flows and Geometric Optimization on the Orthogonal Group

no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.

Metric Learning, Stochastic Optimization
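
As a generic illustration of geometric optimization on $O(d)$, here is a deterministic descent step using the Cayley retraction, which maps a skew-symmetric direction to an orthogonal update so the iterates never leave the group. This is a swapped-in textbook technique, not the stochastic flows proposed in the paper; the Procrustes-style objective and all names are hypothetical.

```python
import numpy as np


def cayley(A):
    """Cayley transform (I - A/2)^{-1} (I + A/2): orthogonal whenever A is skew-symmetric."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - 0.5 * A, I + 0.5 * A)


def riemannian_step(X, euclid_grad, lr):
    """One descent step on O(d) that stays exactly on the group."""
    G = euclid_grad @ X.T
    A = G - G.T                    # skew-symmetric descent direction
    return cayley(-lr * A) @ X     # orthogonal factor times orthogonal X stays orthogonal


def loss(X, M):
    # Hypothetical objective: distance to a fixed target matrix M
    return 0.5 * np.sum((X - M) ** 2)


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
X = np.eye(4)                      # start at the identity, inside SO(4)
initial = loss(X, M)
for _ in range(100):
    X = riemannian_step(X, X - M, lr=0.05)  # Euclidean gradient of loss is X - M
```

Starting from the identity, every iterate has determinant $+1$, so the sketch stays in the rotation group $SO(d)$, the connected component of $O(d)$ that contains its starting point.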
