no code implementations • 2 Feb 2024 • Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets
In this paper we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture.
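For context, below is a minimal sketch of a standard LoRA layer in PyTorch; the class name, rank, and scaling are illustrative defaults and do not reflect the paper's generalization of the idea.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer augmented with a trainable low-rank update W + (alpha/r) * B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scaling * x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T


# Usage: replace a projection inside a Transformer block and train only the LoRA factors.
layer = LoRALinear(768, 768, rank=8)
x = torch.randn(2, 16, 768)
print(layer(x).shape)  # torch.Size([2, 16, 768])
```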
no code implementations • 5 Feb 2023 • Albert Sayapin, Gleb Balitskiy, Daniel Bershatsky, Aleksandr Katrutsa, Evgeny Frolov, Alexey Frolov, Ivan Oseledets, Vitaliy Kharin
Since the data about user experience are distributed across devices, a federated learning setup is used to train the proposed sequential matrix factorization model.
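As a rough illustration of the setup, the sketch below trains a plain (non-sequential) matrix factorization model in a simulated federated fashion, with FedAvg-style averaging of item-factor updates; the rank, learning rate, and synthetic interactions are made up for the example and this is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, rank = 100, 8
global_items = rng.normal(scale=0.1, size=(num_items, rank))  # shared item factors

def local_update(item_factors, interactions, lr=0.05, epochs=5):
    """One device: fit a private user vector and return an item-factor delta."""
    items = item_factors.copy()
    user = rng.normal(scale=0.1, size=rank)
    for _ in range(epochs):
        for item_id, rating in interactions:
            err = rating - user @ items[item_id]
            user_grad = err * items[item_id]
            item_grad = err * user
            user += lr * user_grad
            items[item_id] += lr * item_grad
    return items - item_factors  # only the update, not the raw data, leaves the device

# One server round: average deltas from a few simulated devices (FedAvg-style).
device_data = [[(rng.integers(num_items), rng.uniform(1, 5)) for _ in range(20)] for _ in range(4)]
deltas = [local_update(global_items, data) for data in device_data]
global_items += np.mean(deltas, axis=0)
```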
2 code implementations • 29 Sep 2022 • Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Olga Tsymboi, Ivan Oseledets
Classical machine learning models such as deep neural networks are usually trained with Stochastic Gradient Descent (SGD)-based algorithms.
no code implementations • 21 Feb 2022 • Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont
Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.
2 code implementations • 1 Feb 2022 • Georgii Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Dimitrov, Ivan Oseledets
Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantizing the gradients.
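A minimal PyTorch sketch of the general idea: a pointwise nonlinearity that saves a few-bit quantized derivative for the backward pass instead of the full-precision input. The 2-bit codebook and bin boundaries below are ad hoc placeholders, not the optimized levels from the paper.

```python
import math
import torch

class FewBitGELU(torch.autograd.Function):
    """GELU whose saved-for-backward derivative is snapped to a small codebook."""

    # Ad hoc 2-bit codebook: the derivative is replaced by one of four representative values.
    BOUNDARIES = torch.tensor([-0.1, 0.3, 0.9])
    LEVELS = torch.tensor([-0.05, 0.1, 0.6, 1.0])

    @staticmethod
    def forward(ctx, x):
        y = torch.nn.functional.gelu(x)
        # Exact derivative of GELU (erf form): Phi(x) + x * phi(x), then quantized for storage.
        cdf = 0.5 * (1 + torch.erf(x / math.sqrt(2.0)))
        pdf = torch.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)
        deriv = cdf + x * pdf
        codes = torch.bucketize(deriv, FewBitGELU.BOUNDARIES)  # int64 here; packed bits in practice
        ctx.save_for_backward(codes)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (codes,) = ctx.saved_tensors
        return grad_out * FewBitGELU.LEVELS[codes]

# Usage: drop-in replacement for the activation; only the codes are kept between passes.
x = torch.randn(1024, requires_grad=True)
FewBitGELU.apply(x).sum().backward()
print(x.grad.shape)
```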
2 code implementations • 31 Jan 2022 • Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets
We also investigate the variance of the gradient estimate induced by the randomized matrix multiplication.
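For context, here is a minimal NumPy sketch of randomized matrix multiplication by column-row sampling, the kind of estimator whose variance such an analysis concerns; norm-proportional sampling probabilities are a standard choice and not necessarily the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_matmul(A, B, num_samples):
    """Unbiased estimate of A @ B from a sample of column-row outer products.

    Columns of A (and matching rows of B) are drawn with probability proportional
    to the product of their norms; the estimator's variance shrinks as num_samples grows.
    """
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=num_samples, p=p)
    scale = 1.0 / (num_samples * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

A = rng.normal(size=(64, 256))
B = rng.normal(size=(256, 32))
exact = A @ B
for s in (16, 64, 256):
    approx = randomized_matmul(A, B, s)
    print(s, np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # error decreases with s
```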