no code implementations • 2 Feb 2024 • Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets
In this paper we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture.
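For context, below is a minimal sketch of a standard LoRA layer in PyTorch; the class name, rank, and scaling are illustrative defaults and do not reflect the paper's generalization of the idea.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer augmented with a trainable low-rank update W + (alpha/r) * B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scaling * x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T


# Usage: replace a projection inside a Transformer block and train only the LoRA factors.
layer = LoRALinear(768, 768, rank=8)
x = torch.randn(2, 16, 768)
print(layer(x).shape)  # torch.Size([2, 16, 768])
```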
no code implementations • 5 Feb 2023 • Albert Sayapin, Gleb Balitskiy, Daniel Bershatsky, Aleksandr Katrutsa, Evgeny Frolov, Alexey Frolov, Ivan Oseledets, Vitaliy Kharin
Since the data about user experience are distributed across devices, a federated learning setup is used to train the proposed sequential matrix factorization model.
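As a rough illustration of the setup, the sketch below trains a plain (non-sequential) matrix factorization model in a simulated federated fashion, with FedAvg-style averaging of item-factor updates; the rank, learning rate, and synthetic interactions are made up for the example and this is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, rank = 100, 8
global_items = rng.normal(scale=0.1, size=(num_items, rank))  # shared item factors

def local_update(item_factors, interactions, lr=0.05, epochs=5):
    """One device: fit a private user vector and return an item-factor delta."""
    items = item_factors.copy()
    user = rng.normal(scale=0.1, size=rank)
    for _ in range(epochs):
        for item_id, rating in interactions:
            err = rating - user @ items[item_id]
            user_grad = err * items[item_id]
            item_grad = err * user
            user += lr * user_grad
            items[item_id] += lr * item_grad
    return items - item_factors  # only the update, not the raw data, leaves the device

# One server round: average deltas from a few simulated devices (FedAvg-style).
device_data = [[(rng.integers(num_items), rng.uniform(1, 5)) for _ in range(20)] for _ in range(4)]
deltas = [local_update(global_items, data) for data in device_data]
global_items += np.mean(deltas, axis=0)
```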
2 code implementations • 29 Sep 2022 • Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Olga Tsymboi, Ivan Oseledets
Classical machine learning models such as deep neural networks are usually trained with Stochastic Gradient Descent (SGD)-based algorithms.
no code implementations • 21 Feb 2022 • Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont
Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.
2 code implementations • 1 Feb 2022 • Georgii Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Dimitrov, Ivan Oseledets
Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantizing the gradients.
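A minimal PyTorch sketch of the general idea: a pointwise nonlinearity that saves a few-bit quantized derivative for the backward pass instead of the full-precision input. The 2-bit codebook and bin boundaries below are ad hoc placeholders, not the optimized levels from the paper.

```python
import math
import torch

class FewBitGELU(torch.autograd.Function):
    """GELU whose saved-for-backward derivative is snapped to a small codebook."""

    # Ad hoc 2-bit codebook: the derivative is replaced by one of four representative values.
    BOUNDARIES = torch.tensor([-0.1, 0.3, 0.9])
    LEVELS = torch.tensor([-0.05, 0.1, 0.6, 1.0])

    @staticmethod
    def forward(ctx, x):
        y = torch.nn.functional.gelu(x)
        # Exact derivative of GELU (erf form): Phi(x) + x * phi(x), then quantized for storage.
        cdf = 0.5 * (1 + torch.erf(x / math.sqrt(2.0)))
        pdf = torch.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)
        deriv = cdf + x * pdf
        codes = torch.bucketize(deriv, FewBitGELU.BOUNDARIES)  # int64 here; packed bits in practice
        ctx.save_for_backward(codes)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (codes,) = ctx.saved_tensors
        return grad_out * FewBitGELU.LEVELS[codes]

# Usage: drop-in replacement for the activation; only the codes are kept between passes.
x = torch.randn(1024, requires_grad=True)
FewBitGELU.apply(x).sum().backward()
print(x.grad.shape)
```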
2 code implementations • 31 Jan 2022 • Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets
We also investigate the variance of the gradient estimate induced by the randomized matrix multiplication.
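For context, here is a minimal NumPy sketch of randomized matrix multiplication by column-row sampling, the kind of estimator whose variance such an analysis concerns; norm-proportional sampling probabilities are a standard choice and not necessarily the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_matmul(A, B, num_samples):
    """Unbiased estimate of A @ B from a sample of column-row outer products.

    Columns of A (and matching rows of B) are drawn with probability proportional
    to the product of their norms; the estimator's variance shrinks as num_samples grows.
    """
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=num_samples, p=p)
    scale = 1.0 / (num_samples * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

A = rng.normal(size=(64, 256))
B = rng.normal(size=(256, 32))
exact = A @ B
for s in (16, 64, 256):
    approx = randomized_matmul(A, B, s)
    print(s, np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # error decreases with s
```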