Search Results for author: Valerii Likhosherstov

Found 25 papers, 12 papers with code

Scalable Neural Network Kernels

1 code implementation • 20 Oct 2023 • Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov

We introduce the concept of scalable neural network kernels (SNNKs), replacements for regular feedforward layers (FFLs) capable of approximating the latter, but with favorable computational properties.
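
As a toy illustration of the disentanglement this enables (not the paper's actual SNNK construction), a layer whose units are softmax-kernel evaluations exp(x^T w_i) factors into two random-feature maps, one over inputs and one over parameters:

```python
import numpy as np

# Toy instance of disentangling inputs from parameters: a layer whose i-th
# unit is the softmax kernel exp(x^T w_i) factors as Psi(W) @ Phi(x) for
# positive random feature maps (illustrative only, not the SNNK construction).
rng = np.random.default_rng(0)
d, l, m = 16, 32, 8192          # input dim, layer width, number of features
x = rng.normal(size=d) * 0.2
W = rng.normal(size=(l, d)) * 0.2
Z = rng.normal(size=(m, d))

def phi(U):
    return np.exp(U @ Z.T - np.sum(U**2, axis=-1, keepdims=True) / 2) / np.sqrt(m)

exact = np.exp(W @ x)                    # the "FFL": l kernel evaluations
approx = phi(W) @ phi(x[None, :])[0]     # Psi(W) @ Phi(x)
print(np.max(np.abs(exact - approx)))
```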

FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

no code implementations • 1 Feb 2023 • Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result.
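
A minimal numpy sketch of the positive random features underlying this line of work (FAVOR+-style; the paper's FAVOR# variants are sharper refinements of this basic estimator):

```python
import numpy as np

# Positive random features: phi(u) = exp(z^T u - ||u||^2 / 2) / sqrt(m) gives
# an unbiased estimate of the softmax kernel exp(x^T y).
rng = np.random.default_rng(0)
d, m = 16, 4096
x = rng.normal(size=d) / np.sqrt(d)
y = rng.normal(size=d) / np.sqrt(d)
Z = rng.normal(size=(m, d))

def phi(u):
    return np.exp(Z @ u - u @ u / 2) / np.sqrt(m)

print(np.exp(x @ y), phi(x) @ phi(y))    # exact vs unbiased RF estimate
```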

Simplex Random Features

1 code implementation • 31 Jan 2023 • Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian Weller

We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors.
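
A sketch of the geometric coupling described in the abstract: d unit vectors forming a regular simplex, randomly rotated and given chi-distributed norms (the full SimRF estimator then feeds these rows into a positive-feature map as above):

```python
import numpy as np

# One SimRF-style projection block: d unit vectors with pairwise dot products
# -1/(d-1) (a regular simplex), randomly rotated, with chi_d-distributed norms
# (a sketch of the geometric coupling only).
rng = np.random.default_rng(0)
d = 8
V = np.eye(d) - np.ones((d, d)) / d
S = V / np.linalg.norm(V, axis=1, keepdims=True)
print(np.round(S @ S.T, 3))              # 1 on the diagonal, -1/(d-1) off it

Q = np.linalg.qr(rng.normal(size=(d, d)))[0]      # random rotation
norms = np.sqrt(rng.chisquare(df=d, size=d))
W = (norms[:, None] * S) @ Q.T           # rows: correlated projection vectors
```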

Adaptive Computation with Elastic Input Sequence

1 code implementation • 30 Jan 2023 • Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty.

Inductive Bias

Karyotype AI for Precision Oncology

no code implementations • 20 Nov 2022 • Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations.

Few-Shot Learning • Inductive Bias

Chefs' Random Tables: Non-Trigonometric Random Features

1 code implementation • 30 May 2022 • Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels.
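
For contrast, a minimal sketch of the classical trigonometric random Fourier features that non-trigonometric CRTs replace:

```python
import numpy as np

# Trigonometric random Fourier features: sqrt(2/m) * cos(Z u + b) gives an
# unbiased estimate of the Gaussian kernel, but the features can be negative,
# which is problematic for softmax-kernel estimation in attention.
rng = np.random.default_rng(0)
d, m = 16, 4096
x = rng.normal(size=d) / np.sqrt(d)
y = rng.normal(size=d) / np.sqrt(d)
Z = rng.normal(size=(m, d))
b = rng.uniform(0, 2 * np.pi, size=m)

def trig_rf(u):
    return np.sqrt(2.0 / m) * np.cos(Z @ u + b)

print(np.exp(-np.sum((x - y) ** 2) / 2), trig_rf(x) @ trig_rf(y))
```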

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations • 25 Nov 2021 • Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification
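
A hypothetical miniature of the co-training setup, with made-up task names and a toy encoder standing in for the shared ViT body:

```python
import numpy as np

# Hypothetical miniature of co-training: one shared encoder matrix, one linear
# head per task, round-robin task sampling; each step updates the shared
# weights plus only the sampled task's head.
rng = np.random.default_rng(0)
d, lr = 32, 0.1
tasks = {"image_cls": 10, "video_cls": 5, "audio_cls": 3}   # classes per task
S = rng.normal(size=(d, d)) * 0.05
heads = {t: rng.normal(size=(d, c)) * 0.05 for t, c in tasks.items()}

for step in range(300):
    t = list(tasks)[step % len(tasks)]              # sample a task
    x = rng.normal(size=d)                          # stand-in input
    y = np.eye(tasks[t])[rng.integers(tasks[t])]    # random one-hot label
    h = np.tanh(S @ x)
    g = heads[t].T @ h - y                          # grad of 0.5*||p - y||^2
    gS = np.outer((1 - h**2) * (heads[t] @ g), x)   # backprop into encoder
    heads[t] -= lr * np.outer(h, g)                 # task-specific update
    S -= lr * gS                                    # shared update
```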

Hybrid Random Features

1 code implementation • ICLR 2022 • Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.

Benchmarking
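
A fixed-weight illustration of the hybrid idea (the paper's mechanism adapts the mixture by region, which this sketch does not attempt): any convex combination of two unbiased RF estimators is itself unbiased.

```python
import numpy as np

# Two unbiased RF estimators of the Gaussian kernel exp(-||x - y||^2 / 2);
# a fixed convex combination of them stays unbiased.
rng = np.random.default_rng(0)
d, m = 16, 2048
x = rng.normal(size=d) * 0.3
y = rng.normal(size=d) * 0.3
Z = rng.normal(size=(m, d))
b = rng.uniform(0, 2 * np.pi, size=m)

trig = (2.0 / m) * (np.cos(Z @ x + b) @ np.cos(Z @ y + b))     # trigonometric
pos = (np.exp(Z @ x - x @ x) @ np.exp(Z @ y - y @ y)) / m      # positive
lam = 0.5
print(np.exp(-np.sum((x - y) ** 2) / 2), lam * trig + (1 - lam) * pos)
```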

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

1 code implementation • 16 Jul 2021 • Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way.

Graph Attention

On the Expressive Power of Self-Attention Matrices

no code implementations • 7 Jun 2021 • Valerii Likhosherstov, Krzysztof Choromanski, Adrian Weller

Our proof is constructive, enabling us to propose an algorithm for finding adaptive inputs and fixed self-attention parameters in order to approximate a given matrix.

LEMMA
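
A toy finite-difference rendition of that constructive procedure, approximating a given row-stochastic target with fixed parameters and adaptive inputs:

```python
import numpy as np

# Keep the attention parameters A fixed and descend on the *inputs* X until
# the row-stochastic self-attention matrix matches a given target M.
def attn(X, A):
    S = X @ A @ X.T
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, eps = 4, 6, 1e-4
A = rng.normal(size=(d, d))                  # fixed self-attention parameters
M = rng.dirichlet(np.ones(n), size=n)        # target row-stochastic matrix
X = rng.normal(size=(n, d))

for _ in range(500):                         # crude finite-difference descent
    base = np.sum((attn(X, A) - M) ** 2)
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(d):
            X[i, j] += eps
            G[i, j] = (np.sum((attn(X, A) - M) ** 2) - base) / eps
            X[i, j] -= eps
    X -= 0.5 * G
print(np.max(np.abs(attn(X, A) - M)))
```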

Debiasing a First-order Heuristic for Approximate Bi-level Optimization

1 code implementation • 4 Jun 2021 • Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems, involving numerical (inner-level) optimization loops.
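
A worked toy example of the bias this paper corrects, using a quadratic inner loss so the true hypergradient is available in closed form:

```python
import numpy as np

# Inner loss g(w) = 0.5*||w - c||^2 run for K gradient steps from the outer
# parameter theta (a MAML-style initialization); outer loss
# f(w_K) = 0.5*||w_K - t||^2. Here dw_K/dtheta = (1 - lr)^K * I, so the
# first-order heuristic (which pretends the Jacobian is the identity) is
# biased by exactly that factor.
rng = np.random.default_rng(0)
d, K, lr = 5, 10, 0.3
theta = rng.normal(size=d)
c, t = rng.normal(size=d), rng.normal(size=d)

w = theta.copy()
for _ in range(K):
    w -= lr * (w - c)                   # inner optimization loop

fo_grad = w - t                         # first-order heuristic
exact_grad = (1 - lr) ** K * (w - t)    # true hypergradient
print(np.linalg.norm(fo_grad - exact_grad))
```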

Sub-Linear Memory: How to Make Performers SLiM

2 code implementations • NeurIPS 2021 • Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

Recent works have proposed various linear self-attention mechanisms, scaling only as $O(L)$ for serial computation.
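
A sketch of the recurrence behind such $O(L)$ mechanisms, with a hypothetical elu+1 feature map:

```python
import numpy as np

# Causal linear attention as a recurrence: a d x d running state replaces the
# L x L attention matrix, giving O(L) serial compute with a state whose size
# is independent of sequence length.
rng = np.random.default_rng(0)
L, d = 128, 16
Q, K, V = rng.normal(size=(3, L, d))
phi = lambda u: np.where(u > 0, u + 1.0, np.exp(u))   # elu(u) + 1, positive

S = np.zeros((d, d))                    # running sum of phi(k_j) v_j^T
z = np.zeros(d)                         # running normalizer
out = np.empty((L, d))
for i in range(L):
    S += np.outer(phi(K[i]), V[i])
    z += phi(K[i])
    out[i] = phi(Q[i]) @ S / (phi(Q[i]) @ z)
```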

Rethinking Attention with Performers

12 code implementations • ICLR 2021 • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.

D4RL • Image Generation • +2
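
A minimal sketch of where the linear complexity comes from (a simplified FAVOR+-style estimator, not the full Performer implementation):

```python
import numpy as np

# With a feature map phi, attention re-associates as phi(Q) @ (phi(K)^T @ V),
# never forming the L x L matrix: O(L*m*d) instead of O(L^2). The 1/sqrt(d)
# logit scaling is omitted for brevity.
rng = np.random.default_rng(0)
L, d, m = 256, 16, 64
Q, K, V = rng.normal(size=(3, L, d))
W = rng.normal(size=(m, d))

def phi(U):   # positive features for the softmax kernel
    return np.exp(U @ W.T - np.sum(U**2, axis=-1, keepdims=True) / 2) / np.sqrt(m)

qp, kp = phi(Q), phi(K)
out = (qp @ (kp.T @ V)) / (qp @ kp.sum(axis=0))[:, None]   # approx. attention
```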

An Ode to an ODE

no code implementations • NeurIPS 2020 • Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
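
A sketch of the invariant that makes such flows well defined (not the paper's exact integrator):

```python
import numpy as np

# If dW/dt = W A(t) with A(t) skew-symmetric, W stays on O(d); a
# Cayley-transform step preserves orthogonality exactly in discrete time.
rng = np.random.default_rng(0)
d, dt = 6, 0.1
W = np.linalg.qr(rng.normal(size=(d, d)))[0]      # initialize on O(d)
I = np.eye(d)

for _ in range(50):
    B = rng.normal(size=(d, d))
    A = (B - B.T) / 2                             # skew-symmetric generator
    W = W @ np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)

print(np.max(np.abs(W.T @ W - I)))                # ~1e-15: still orthogonal
```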

UFO-BLO: Unbiased First-Order Bilevel Optimization

no code implementations • 5 Jun 2020 • Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning.

Adversarial Robustness • Bilevel Optimization • +4

CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices

no code implementations • 18 Apr 2020 • Valerii Likhosherstov, Jared Davis, Krzysztof Choromanski, Adrian Weller

We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs.

Machine Translation • Translation • +1
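
A sketch of the Householder building block behind WY-style parametrizations (the compact WY trick batches these reflections into matrix products for GPUs/TPUs; here they are applied one by one):

```python
import numpy as np

# A product of Householder reflections maps unconstrained vectors v_1..v_k to
# an exactly orthogonal Q, so ordinary unconstrained optimizers can be used.
rng = np.random.default_rng(0)
d, k = 8, 8
V = rng.normal(size=(k, d))                       # free parameters

Q = np.eye(d)
for v in V:
    Q = Q @ (np.eye(d) - 2 * np.outer(v, v) / (v @ v))   # reflection

print(np.max(np.abs(Q.T @ Q - np.eye(d))))        # ~1e-15: orthogonal
```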

Stochastic Flows and Geometric Optimization on the Orthogonal Group

no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.

Metric Learning • Stochastic Optimization

Tractable Minor-free Generalization of Planar Zero-field Ising Models

no code implementations • 22 Oct 2019 • Valerii Likhosherstov, Yury Maximov, Michael Chertkov

We present a new family of zero-field Ising models over $N$ binary variables/spins obtained by consecutive "gluing" of planar and $O(1)$-sized components and subsets of at most three vertices into a tree.

A New Family of Tractable Ising Models

no code implementations • 14 Jun 2019 • Valerii Likhosherstov, Yury Maximov, Michael Chertkov

To illustrate the utility of the new family of tractable graphical models, we first build an $O(N^{3/2})$ algorithm for inference and sampling of the $K_5$-minor-free zero-field Ising models - an extension of the planar zero-field Ising models - which is neither genus- nor treewidth-bounded.

Data Structures and Algorithms • Statistical Mechanics • Data Analysis, Statistics and Probability • Computation

Inference and Sampling of $K_{33}$-free Ising Models

2 code implementations • 22 Dec 2018 • Valerii Likhosherstov, Yury Maximov, Michael Chertkov

We call an Ising model tractable when it is possible to compute its partition function value (statistical inference) in polynomial time.
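
To make "tractable" concrete, the partition function in question, computed by brute force for a tiny $N$:

```python
import numpy as np
from itertools import product

# The quantity at stake: the zero-field partition function
#   Z = sum over s in {-1,+1}^N of exp( sum_{i<j} J_ij s_i s_j ),
# computed here by brute force in O(2^N); the papers above obtain polynomial
# time when the interaction graph avoids certain minors.
rng = np.random.default_rng(0)
N = 10
J = np.triu(rng.normal(size=(N, N)) * 0.2, k=1)   # couplings, i < j

Z = 0.0
for s in product([-1.0, 1.0], repeat=N):
    s = np.asarray(s)
    Z += np.exp(s @ J @ s)
print(Z)
```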
