Search Results for author: Valerii Likhosherstov

Found 25 papers, 12 papers with code

Scalable Neural Network Kernels

1 code implementation • 20 Oct 2023 • Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov

We introduce the concept of scalable neural network kernels (SNNKs), replacements for regular feedforward layers (FFLs) capable of approximating the latter, but with favorable computational properties.
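
As a toy illustration of the disentanglement this enables (not the paper's actual SNNK construction), a layer whose units are softmax-kernel evaluations exp(x^T w_i) factors into two random-feature maps, one over inputs and one over parameters:

```python
import numpy as np

# Toy instance of disentangling inputs from parameters: a layer whose i-th
# unit is the softmax kernel exp(x^T w_i) factors as Psi(W) @ Phi(x) for
# positive random feature maps (illustrative only, not the SNNK construction).
rng = np.random.default_rng(0)
d, l, m = 16, 32, 8192          # input dim, layer width, number of features
x = rng.normal(size=d) * 0.2
W = rng.normal(size=(l, d)) * 0.2
Z = rng.normal(size=(m, d))

def phi(U):
    return np.exp(U @ Z.T - np.sum(U**2, axis=-1, keepdims=True) / 2) / np.sqrt(m)

exact = np.exp(W @ x)                    # the "FFL": l kernel evaluations
approx = phi(W) @ phi(x[None, :])[0]     # Psi(W) @ Phi(x)
print(np.max(np.abs(exact - approx)))
```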

FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

no code implementations • 1 Feb 2023 • Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result.
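
A minimal numpy sketch of the positive random features underlying this line of work (FAVOR+-style; the paper's FAVOR# variants are sharper refinements of this basic estimator):

```python
import numpy as np

# Positive random features: phi(u) = exp(z^T u - ||u||^2 / 2) / sqrt(m) gives
# an unbiased estimate of the softmax kernel exp(x^T y).
rng = np.random.default_rng(0)
d, m = 16, 4096
x = rng.normal(size=d) / np.sqrt(d)
y = rng.normal(size=d) / np.sqrt(d)
Z = rng.normal(size=(m, d))

def phi(u):
    return np.exp(Z @ u - u @ u / 2) / np.sqrt(m)

print(np.exp(x @ y), phi(x) @ phi(y))    # exact vs unbiased RF estimate
```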

Simplex Random Features

1 code implementation • 31 Jan 2023 • Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian Weller

We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors.
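
A sketch of the geometric coupling described in the abstract: d unit vectors forming a regular simplex, randomly rotated and given chi-distributed norms (the full SimRF estimator then feeds these rows into a positive-feature map as above):

```python
import numpy as np

# One SimRF-style projection block: d unit vectors with pairwise dot products
# -1/(d-1) (a regular simplex), randomly rotated, with chi_d-distributed norms
# (a sketch of the geometric coupling only).
rng = np.random.default_rng(0)
d = 8
V = np.eye(d) - np.ones((d, d)) / d
S = V / np.linalg.norm(V, axis=1, keepdims=True)
print(np.round(S @ S.T, 3))              # 1 on the diagonal, -1/(d-1) off it

Q = np.linalg.qr(rng.normal(size=(d, d)))[0]      # random rotation
norms = np.sqrt(rng.chisquare(df=d, size=d))
W = (norms[:, None] * S) @ Q.T           # rows: correlated projection vectors
```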

Adaptive Computation with Elastic Input Sequence

1 code implementation • 30 Jan 2023 • Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty.

Inductive Bias

Karyotype AI for Precision Oncology

no code implementations • 20 Nov 2022 • Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations.

Few-Shot Learning • Inductive Bias

Chefs' Random Tables: Non-Trigonometric Random Features

1 code implementation • 30 May 2022 • Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels.
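
For contrast, a minimal sketch of the classical trigonometric random Fourier features that non-trigonometric CRTs replace:

```python
import numpy as np

# Trigonometric random Fourier features: sqrt(2/m) * cos(Z u + b) gives an
# unbiased estimate of the Gaussian kernel, but the features can be negative,
# which is problematic for softmax-kernel estimation in attention.
rng = np.random.default_rng(0)
d, m = 16, 4096
x = rng.normal(size=d) / np.sqrt(d)
y = rng.normal(size=d) / np.sqrt(d)
Z = rng.normal(size=(m, d))
b = rng.uniform(0, 2 * np.pi, size=m)

def trig_rf(u):
    return np.sqrt(2.0 / m) * np.cos(Z @ u + b)

print(np.exp(-np.sum((x - y) ** 2) / 2), trig_rf(x) @ trig_rf(y))
```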

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations • 25 Nov 2021 • Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification
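
A hypothetical miniature of the co-training setup, with made-up task names and a toy encoder standing in for the shared ViT body:

```python
import numpy as np

# Hypothetical miniature of co-training: one shared encoder matrix, one linear
# head per task, round-robin task sampling; each step updates the shared
# weights plus only the sampled task's head.
rng = np.random.default_rng(0)
d, lr = 32, 0.1
tasks = {"image_cls": 10, "video_cls": 5, "audio_cls": 3}   # classes per task
S = rng.normal(size=(d, d)) * 0.05
heads = {t: rng.normal(size=(d, c)) * 0.05 for t, c in tasks.items()}

for step in range(300):
    t = list(tasks)[step % len(tasks)]              # sample a task
    x = rng.normal(size=d)                          # stand-in input
    y = np.eye(tasks[t])[rng.integers(tasks[t])]    # random one-hot label
    h = np.tanh(S @ x)
    g = heads[t].T @ h - y                          # grad of 0.5*||p - y||^2
    gS = np.outer((1 - h**2) * (heads[t] @ g), x)   # backprop into encoder
    heads[t] -= lr * np.outer(h, g)                 # task-specific update
    S -= lr * gS                                    # shared update
```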

Hybrid Random Features

1 code implementation • ICLR 2022 • Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.

Benchmarking
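
A fixed-weight illustration of the hybrid idea (the paper's mechanism adapts the mixture by region, which this sketch does not attempt): any convex combination of two unbiased RF estimators is itself unbiased.

```python
import numpy as np

# Two unbiased RF estimators of the Gaussian kernel exp(-||x - y||^2 / 2);
# a fixed convex combination of them stays unbiased.
rng = np.random.default_rng(0)
d, m = 16, 2048
x = rng.normal(size=d) * 0.3
y = rng.normal(size=d) * 0.3
Z = rng.normal(size=(m, d))
b = rng.uniform(0, 2 * np.pi, size=m)

trig = (2.0 / m) * (np.cos(Z @ x + b) @ np.cos(Z @ y + b))     # trigonometric
pos = (np.exp(Z @ x - x @ x) @ np.exp(Z @ y - y @ y)) / m      # positive
lam = 0.5
print(np.exp(-np.sum((x - y) ** 2) / 2), lam * trig + (1 - lam) * pos)
```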

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

1 code implementation • 16 Jul 2021 • Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way.

Graph Attention

On the Expressive Power of Self-Attention Matrices

no code implementations • 7 Jun 2021 • Valerii Likhosherstov, Krzysztof Choromanski, Adrian Weller

Our proof is constructive, enabling us to propose an algorithm for finding adaptive inputs and fixed self-attention parameters in order to approximate a given matrix.

LEMMA
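
A toy finite-difference rendition of that constructive procedure, approximating a given row-stochastic target with fixed parameters and adaptive inputs:

```python
import numpy as np

# Keep the attention parameters A fixed and descend on the *inputs* X until
# the row-stochastic self-attention matrix matches a given target M.
def attn(X, A):
    S = X @ A @ X.T
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, eps = 4, 6, 1e-4
A = rng.normal(size=(d, d))                  # fixed self-attention parameters
M = rng.dirichlet(np.ones(n), size=n)        # target row-stochastic matrix
X = rng.normal(size=(n, d))

for _ in range(500):                         # crude finite-difference descent
    base = np.sum((attn(X, A) - M) ** 2)
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(d):
            X[i, j] += eps
            G[i, j] = (np.sum((attn(X, A) - M) ** 2) - base) / eps
            X[i, j] -= eps
    X -= 0.5 * G
print(np.max(np.abs(attn(X, A) - M)))
```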

Debiasing a First-order Heuristic for Approximate Bi-level Optimization

1 code implementation • 4 Jun 2021 • Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems, involving numerical (inner-level) optimization loops.
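
A worked toy example of the bias this paper corrects, using a quadratic inner loss so the true hypergradient is available in closed form:

```python
import numpy as np

# Inner loss g(w) = 0.5*||w - c||^2 run for K gradient steps from the outer
# parameter theta (a MAML-style initialization); outer loss
# f(w_K) = 0.5*||w_K - t||^2. Here dw_K/dtheta = (1 - lr)^K * I, so the
# first-order heuristic (which pretends the Jacobian is the identity) is
# biased by exactly that factor.
rng = np.random.default_rng(0)
d, K, lr = 5, 10, 0.3
theta = rng.normal(size=d)
c, t = rng.normal(size=d), rng.normal(size=d)

w = theta.copy()
for _ in range(K):
    w -= lr * (w - c)                   # inner optimization loop

fo_grad = w - t                         # first-order heuristic
exact_grad = (1 - lr) ** K * (w - t)    # true hypergradient
print(np.linalg.norm(fo_grad - exact_grad))
```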

Sub-Linear Memory: How to Make Performers SLiM

2 code implementations • NeurIPS 2021 • Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

Recent works have proposed various linear self-attention mechanisms, scaling only as $O(L)$ for serial computation.
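
A sketch of the recurrence behind such $O(L)$ mechanisms, with a hypothetical elu+1 feature map:

```python
import numpy as np

# Causal linear attention as a recurrence: a d x d running state replaces the
# L x L attention matrix, giving O(L) serial compute with a state whose size
# is independent of sequence length.
rng = np.random.default_rng(0)
L, d = 128, 16
Q, K, V = rng.normal(size=(3, L, d))
phi = lambda u: np.where(u > 0, u + 1.0, np.exp(u))   # elu(u) + 1, positive

S = np.zeros((d, d))                    # running sum of phi(k_j) v_j^T
z = np.zeros(d)                         # running normalizer
out = np.empty((L, d))
for i in range(L):
    S += np.outer(phi(K[i]), V[i])
    z += phi(K[i])
    out[i] = phi(Q[i]) @ S / (phi(Q[i]) @ z)
```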

Rethinking Attention with Performers

12 code implementations • ICLR 2021 • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.

D4RL • Image Generation • +2
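
A minimal sketch of where the linear complexity comes from (a simplified FAVOR+-style estimator, not the full Performer implementation):

```python
import numpy as np

# With a feature map phi, attention re-associates as phi(Q) @ (phi(K)^T @ V),
# never forming the L x L matrix: O(L*m*d) instead of O(L^2). The 1/sqrt(d)
# logit scaling is omitted for brevity.
rng = np.random.default_rng(0)
L, d, m = 256, 16, 64
Q, K, V = rng.normal(size=(3, L, d))
W = rng.normal(size=(m, d))

def phi(U):   # positive features for the softmax kernel
    return np.exp(U @ W.T - np.sum(U**2, axis=-1, keepdims=True) / 2) / np.sqrt(m)

qp, kp = phi(Q), phi(K)
out = (qp @ (kp.T @ V)) / (qp @ kp.sum(axis=0))[:, None]   # approx. attention
```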

An Ode to an ODE

no code implementations • NeurIPS 2020 • Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
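
A sketch of the invariant that makes such flows well defined (not the paper's exact integrator):

```python
import numpy as np

# If dW/dt = W A(t) with A(t) skew-symmetric, W stays on O(d); a
# Cayley-transform step preserves orthogonality exactly in discrete time.
rng = np.random.default_rng(0)
d, dt = 6, 0.1
W = np.linalg.qr(rng.normal(size=(d, d)))[0]      # initialize on O(d)
I = np.eye(d)

for _ in range(50):
    B = rng.normal(size=(d, d))
    A = (B - B.T) / 2                             # skew-symmetric generator
    W = W @ np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)

print(np.max(np.abs(W.T @ W - I)))                # ~1e-15: still orthogonal
```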

UFO-BLO: Unbiased First-Order Bilevel Optimization

no code implementations • 5 Jun 2020 • Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning.

Adversarial Robustness • Bilevel Optimization • +4

CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices

no code implementations • 18 Apr 2020 • Valerii Likhosherstov, Jared Davis, Krzysztof Choromanski, Adrian Weller

We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs.

Machine Translation • Translation • +1
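
A sketch of the Householder building block behind WY-style parametrizations (the compact WY trick batches these reflections into matrix products for GPUs/TPUs; here they are applied one by one):

```python
import numpy as np

# A product of Householder reflections maps unconstrained vectors v_1..v_k to
# an exactly orthogonal Q, so ordinary unconstrained optimizers can be used.
rng = np.random.default_rng(0)
d, k = 8, 8
V = rng.normal(size=(k, d))                       # free parameters

Q = np.eye(d)
for v in V:
    Q = Q @ (np.eye(d) - 2 * np.outer(v, v) / (v @ v))   # reflection

print(np.max(np.abs(Q.T @ Q - np.eye(d))))        # ~1e-15: orthogonal
```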

Stochastic Flows and Geometric Optimization on the Orthogonal Group

no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.

Metric Learning • Stochastic Optimization

Tractable Minor-free Generalization of Planar Zero-field Ising Models

no code implementations • 22 Oct 2019 • Valerii Likhosherstov, Yury Maximov, Michael Chertkov

We present a new family of zero-field Ising models over $N$ binary variables/spins obtained by consecutive "gluing" of planar and $O(1)$-sized components and subsets of at most three vertices into a tree.

A New Family of Tractable Ising Models

no code implementations • 14 Jun 2019 • Valerii Likhosherstov, Yury Maximov, Michael Chertkov

To illustrate the utility of the new family of tractable graphical models, we first build an $O(N^{3/2})$ algorithm for inference and sampling of the $K_5$-minor-free zero-field Ising models - an extension of the planar zero-field Ising models - which is neither genus- nor treewidth-bounded.

Data Structures and Algorithms • Statistical Mechanics • Data Analysis, Statistics and Probability • Computation

Inference and Sampling of $K_{33}$-free Ising Models

2 code implementations • 22 Dec 2018 • Valerii Likhosherstov, Yury Maximov, Michael Chertkov

We call an Ising model tractable when it is possible to compute its partition function value (statistical inference) in polynomial time.
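
To make "tractable" concrete, the partition function in question, computed by brute force for a tiny $N$:

```python
import numpy as np
from itertools import product

# The quantity at stake: the zero-field partition function
#   Z = sum over s in {-1,+1}^N of exp( sum_{i<j} J_ij s_i s_j ),
# computed here by brute force in O(2^N); the papers above obtain polynomial
# time when the interaction graph avoids certain minors.
rng = np.random.default_rng(0)
N = 10
J = np.triu(rng.normal(size=(N, N)) * 0.2, k=1)   # couplings, i < j

Z = 0.0
for s in product([-1.0, 1.0], repeat=N):
    s = np.asarray(s)
    Z += np.exp(s @ J @ s)
print(Z)
```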
