Search Results for author: Daniel Bershatsky

Found 6 papers, 3 papers with code

LoTR: Low Tensor Rank Weight Adaptation

no code implementations • 2 Feb 2024 • Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets

In this paper we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture. A sketch of the underlying LoRA-style parametrization follows this entry.

Tensor Decomposition
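As a rough illustration of the per-layer LoRA parametrization that LoTR generalizes, here is a minimal PyTorch sketch of a low-rank adapter on a linear layer. The class name, rank, and scaling are illustrative assumptions; LoTR itself replaces the independent per-layer factor pairs with a shared low tensor rank factorization (see the paper for the exact construction).

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """LoRA-style adapter: y = W x + (alpha / r) * A B x with small rank r.

    Illustrative sketch only, not the LoTR parametrization.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(base.out_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.in_features))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.B.T @ self.A.T)
```

Only A and B are trained, so the number of trainable parameters per layer drops from in_features * out_features to rank * (in_features + out_features).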

Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction

no code implementations • 5 Feb 2023 • Albert Sayapin, Gleb Balitskiy, Daniel Bershatsky, Aleksandr Katrutsa, Evgeny Frolov, Alexey Frolov, Ivan Oseledets, Vitaliy Kharin

Since the data about user experience are distributed among devices, a federated learning setup is used to train the proposed sequential matrix factorization model. A generic sketch of such a setup follows this entry.

Collaborative Filtering • Federated Learning +1
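The snippet gives the setting but not the training loop; the general pattern of federated matrix factorization is easy to sketch, though. Each device refines a copy of the shared item embeddings on its private interaction history while its user embedding never leaves the device, and the server averages the returned copies. All names, the plain SGD update, and uniform averaging below are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def local_update(item_emb, user_emb, interactions, lr=0.05, reg=0.01):
    """One on-device pass of matrix factorization.

    interactions: list of (item_index, rating) pairs private to the device.
    Returns a locally updated copy of the shared item embeddings; the
    user embedding is updated in place and stays on the device.
    """
    item_emb = item_emb.copy()
    for j, r in interactions:
        err = r - user_emb @ item_emb[j]
        user_emb += lr * (err * item_emb[j] - reg * user_emb)
        item_emb[j] += lr * (err * user_emb - reg * item_emb[j])
    return item_emb

def federated_round(item_emb, devices):
    """Server step: average the item-embedding copies from all devices."""
    updates = [local_update(item_emb, u, data) for u, data in devices]
    return np.mean(updates, axis=0)
```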

NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer

2 code implementations • 29 Sep 2022 • Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Olga Tsymboi, Ivan Oseledets

Classical machine learning models such as deep neural networks are usually trained with Stochastic Gradient Descent (SGD)-based algorithms.
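For reference, here is the classical explicit Nesterov accelerated gradient (NAG) step that NAG-GS builds on; the paper's contribution, a semi-implicit (Gauss-Seidel type) discretization of the underlying dynamics, is not reproduced in this sketch, and the names and hyperparameters are illustrative.

```python
import numpy as np

def nag_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """One classical explicit Nesterov accelerated gradient step.

    NAG-GS instead discretizes the accelerated dynamics semi-implicitly,
    which the paper argues makes training more robust to the step size.
    """
    g = grad_fn(w + momentum * v)   # gradient at the look-ahead point
    v = momentum * v - lr * g       # velocity update
    return w + v, v

# Toy usage: minimize f(w) = ||w||^2.
w, v = np.array([5.0]), np.zeros(1)
for _ in range(100):
    w, v = nag_step(w, v, lambda u: 2.0 * u)
```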

Survey on Large Scale Neural Network Training

no code implementations • 21 Feb 2022 • Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont

Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.

Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction

2 code implementations • 1 Feb 2022 • Georgii Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Dimitrov, Ivan Oseledets

Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantization of the gradients. A minimal sketch of the mechanism follows this entry.

Neural Network Compression • Quantization
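The mechanism is that the backward pass of a pointwise nonlinearity needs only its derivative at the saved input, and that derivative can be stored in a few bits instead of a full-precision tensor. Below is a minimal PyTorch sketch for GELU with uniform 2-bit quantization; the paper derives optimized quantization levels, so the fixed range, uniform grid, and uint8 storage here are illustrative assumptions.

```python
import math
import torch

class FewBitGELU(torch.autograd.Function):
    """GELU whose saved state for backward is a few-bit code of the derivative."""

    LEVELS = 4  # 2-bit quantization grid

    @staticmethod
    def forward(ctx, x):
        y = 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))  # exact GELU
        # Analytic derivative: Phi(x) + x * phi(x).
        phi = torch.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        deriv = 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0))) + x * phi
        # Keep only a few-bit code of the derivative (its range is ~[-0.13, 1.13]).
        lo, hi = -0.2, 1.2
        code = ((deriv - lo) / (hi - lo) * (FewBitGELU.LEVELS - 1)).round()
        ctx.save_for_backward(code.clamp(0, FewBitGELU.LEVELS - 1).to(torch.uint8))
        ctx.bounds = (lo, hi)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (code,) = ctx.saved_tensors
        lo, hi = ctx.bounds
        deriv = code.float() / (FewBitGELU.LEVELS - 1) * (hi - lo) + lo
        return grad_out * deriv

# Usage: y = FewBitGELU.apply(x) in place of torch.nn.functional.gelu(x).
```

In a real implementation the 2-bit codes would be bit-packed; even the uint8 storage above already cuts the saved tensor from 4 bytes to 1 byte per element.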

Memory-Efficient Backpropagation through Large Linear Layers

2 code implementations • 31 Jan 2022 • Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets

We also investigate the variance of the gradient estimate induced by the randomized matrix multiplication; a generic version of such an estimator is sketched after this entry.

Model Compression
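One standard randomized product that fits this description is column-row sampling: estimate A @ B from k sampled inner indices, drawn with probabilities proportional to paired column/row norms so that the estimate stays unbiased. The NumPy sketch below shows that generic estimator; it illustrates the kind of randomized multiplication the paper analyzes (e.g. the weight gradient dW = dY^T X with only a subsample of the batch retained), not the authors' exact implementation.

```python
import numpy as np

def randomized_matmul(A, B, k, rng=None):
    """Unbiased estimate of A @ B from k of the n inner (summation) indices.

    Sampling probabilities proportional to ||A[:, i]|| * ||B[i, :]||
    minimize the variance of this estimator over index distributions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(n, size=k, p=p)
    # Scale each sampled outer product by 1 / (k * p_i) for unbiasedness.
    return (A[:, idx] / (k * p[idx])) @ B[idx, :]
```

When the inner dimension is the batch, only the k sampled rows of the layer input have to be kept for the backward pass, which is where the memory saving comes from.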
