Search Results for author: Yury Gorbachev

Found 2 papers, 2 papers with code

Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO

1 code implementation · 8 Nov 2023 · Haim Barad, Ekaterina Aidova, Yury Gorbachev

Inference optimizations are critical for improving user experience and reducing infrastructure costs and power consumption.

Quantization, Text Generation

Neural Network Compression Framework for fast model inference

2 code implementations · 20 Feb 2020 · Alexander Kozlov, Ivan Lazarevich, Vasily Shamporov, Nikolay Lyalyushkin, Yury Gorbachev

In this work, we present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF).

Binarization, Neural Network Compression +1
