no code implementations • NeurIPS 2023 • Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel
Large language models (LLMs) are useful in many NLP tasks and become more capable as they scale, with the best open-source models exceeding 50 billion parameters.
1 code implementation • 2 Sep 2022 • Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel
However, these techniques have inherent limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research that requires access to weights, attention, or logits.
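As a rough illustration of why offloading struggles with interactive use, consider how long it takes to stream a model's weights over the bus for every generated token. The model size, precision, and bandwidth below are illustrative assumptions, not figures from the paper:

```python
# Back-of-envelope estimate of per-token latency when offloading LLM
# weights from CPU RAM to the GPU over PCIe. All numbers are assumed
# for illustration.

params = 50e9            # assumed model size: 50B parameters
bytes_per_param = 2      # fp16 precision
pcie_bandwidth = 16e9    # assumed PCIe 4.0 x16 throughput, bytes/s

weight_bytes = params * bytes_per_param
# Autoregressive generation touches every layer once per token, so each
# token requires streaming all weights across the bus.
seconds_per_token = weight_bytes / pcie_bandwidth

print(f"~{seconds_per_token:.2f} s/token")  # ~6.25 s/token: far too slow for interactive use
```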
no code implementations • 14 Oct 2020 • Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin
We also propose a variant of Weight Squeezing called Gated Weight Squeezing, which combines fine-tuning a BERT-Medium model with learning a mapping from BERT-Base weights.
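A minimal sketch of what such a gated reparameterization might look like in PyTorch. The gating form, shapes, and names here are assumptions for illustration; the paper's exact formulation may differ:

```python
import torch
import torch.nn as nn

class GatedWeightSqueezing(nn.Module):
    """Sketch (assumed formulation): a student weight matrix expressed as a
    gated mix of (a) a learned linear mapping of frozen teacher weights and
    (b) freely fine-tuned student weights."""

    def __init__(self, teacher_weight: torch.Tensor, student_dim: int):
        super().__init__()
        teacher_out, teacher_in = teacher_weight.shape
        # Frozen teacher weights (e.g., from BERT-Base).
        self.register_buffer("teacher_weight", teacher_weight)
        # Learned mapping that "squeezes" teacher weights down to the
        # smaller student shape (e.g., BERT-Medium).
        self.map_out = nn.Linear(teacher_out, student_dim, bias=False)
        self.map_in = nn.Linear(teacher_in, student_dim, bias=False)
        # Freely trained student weights, updated by ordinary fine-tuning.
        self.free_weight = nn.Parameter(torch.empty(student_dim, student_dim))
        nn.init.xavier_uniform_(self.free_weight)
        # Per-element gate deciding how much of each weight comes from the
        # mapped teacher vs. the fine-tuned student.
        self.gate_logits = nn.Parameter(torch.zeros(student_dim, student_dim))

    def weight(self) -> torch.Tensor:
        # Squeeze teacher weights: (t_out, t_in) -> (s, s).
        squeezed = self.map_out.weight @ self.teacher_weight @ self.map_in.weight.T
        gate = torch.sigmoid(self.gate_logits)
        return gate * squeezed + (1.0 - gate) * self.free_weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight().T
```

The gate lets the model interpolate per weight between knowledge transferred from the larger teacher and capacity learned directly on the downstream task.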