no code implementations • 12 Mar 2024 • Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu
We observe a high degree of redundancy across attention heads in which tokens they attend to.
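As a rough illustration of this observation (not the paper's analysis code), the sketch below measures pairwise top-k token overlap between attention heads for a single query position; the overlap metric and the `topk` value are illustrative assumptions.

```python
import numpy as np

def head_redundancy(attn, topk=8):
    """Quantify how much attention heads agree on which tokens they attend to.

    attn: (num_heads, seq_len) attention weights for one query position.
    Returns an (H, H) matrix of top-k token overlap between every pair of heads.
    (Illustrative similarity measure; the paper may use a different one.)
    """
    H = attn.shape[0]
    top = [set(np.argsort(-attn[h])[:topk]) for h in range(H)]
    overlap = np.zeros((H, H))
    for i in range(H):
        for j in range(H):
            overlap[i, j] = len(top[i] & top[j]) / topk
    return overlap
```

A value close to 1.0 for a pair of heads means they concentrate their attention on essentially the same tokens, which is the redundancy the paper exploits.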
1 code implementation • 2 Feb 2024 • Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality.
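For context, speculative decoding has a small draft model propose several tokens that the large target model then verifies in a single forward pass. The sketch below is a minimal greedy-verification version with hypothetical `draft_logits`/`target_logits` interfaces; it is a simplified illustration, not the paper's implementation.

```python
import numpy as np

def speculative_decode_greedy(target_logits, draft_logits, prompt, max_new, k=4):
    """Greedy speculative decoding sketch (hypothetical model interfaces).

    target_logits(tokens) -> (len(tokens), vocab) logits for every position.
    draft_logits(tokens)  -> same interface for a small draft model.
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            nxt = int(np.argmax(draft_logits(ctx)[-1]))
            proposal.append(nxt)
            ctx.append(nxt)
        # 2) Target model scores prompt + proposal in ONE forward pass.
        scored = target_logits(tokens + proposal)
        # 3) Accept the longest prefix where the target's greedy choice agrees.
        n_accept = 0
        for i, tok in enumerate(proposal):
            pos = len(tokens) - 1 + i          # logits that predict proposal[i]
            if int(np.argmax(scored[pos])) == tok:
                n_accept += 1
            else:
                break
        tokens.extend(proposal[:n_accept])
        # 4) Always take one token from the target so decoding makes progress.
        tokens.append(int(np.argmax(scored[len(tokens) - 1])))
    return tokens[:len(prompt) + max_new]
```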
no code implementations • 30 Oct 2023 • Minghao Yan, Hongyi Wang, Shivaram Venkataraman
As neural networks (NNs) are deployed across diverse sectors, their energy demand grows correspondingly.
no code implementations • 6 Jan 2023 • Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman
Finally, we provide insights for future development of model parallelism compression algorithms.
no code implementations • 24 Feb 2022 • Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, Shivaram Venkataraman
Based on these insights, we develop Bagpipe, a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation.
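A minimal sketch of the overlap idea follows, assuming a hypothetical `fetch_remote(ids)` RPC for remote embedding lookups; Bagpipe's actual caching and prefetching policies are more involved.

```python
import threading
from queue import Queue

class EmbeddingPrefetcher:
    """Overlap remote embedding lookups with training compute (illustrative sketch).

    fetch_remote(ids) -> {id: vector} is assumed to be the expensive remote call.
    """
    def __init__(self, fetch_remote, lookahead=2):
        self.fetch_remote = fetch_remote
        self.cache = {}                      # id -> embedding vector (unbounded here)
        self.ready = Queue(maxsize=lookahead)

    def start(self, batches):
        def worker():
            for batch in batches:
                missing = [i for i in batch if i not in self.cache]
                if missing:
                    self.cache.update(self.fetch_remote(missing))
                self.ready.put(batch)        # embeddings for this batch are now local
            self.ready.put(None)
        threading.Thread(target=worker, daemon=True).start()

    def __iter__(self):
        while (batch := self.ready.get()) is not None:
            yield batch, [self.cache[i] for i in batch]
```

While the training loop consumes one batch from the iterator, the background thread is already fetching embeddings for the next `lookahead` batches, hiding the remote access latency behind computation.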
1 code implementation • 4 Feb 2022 • Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, Shivaram Venkataraman
We study training of Graph Neural Networks (GNNs) for large-scale graphs.
1 code implementation • 20 Nov 2021 • Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella
This simultaneously reduces network bandwidth usage, compute utilization, and memory footprint while preserving model quality.
3 code implementations • 4 Jul 2021 • J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang
Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models.
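As background on the memory issue, K-FAC approximates the curvature of a fully connected layer as a Kronecker product of two small factors, so each layer stores and inverts two modest matrices instead of a full Fisher block. Below is a minimal numpy sketch of the per-layer preconditioning step (damping value and interfaces are illustrative, not this paper's implementation).

```python
import numpy as np

def kfac_precondition(grad_W, acts, grads_out, damping=1e-3):
    """K-FAC preconditioning sketch for one fully connected layer.

    grad_W:    (d_out, d_in) gradient of the loss w.r.t. the layer weights
    acts:      (batch, d_in) layer inputs a
    grads_out: (batch, d_out) gradients w.r.t. the layer pre-activations g
    The Fisher block is approximated as A (x) G with A = E[a a^T], G = E[g g^T],
    so F^-1 grad_W ~= G^-1 grad_W A^-1, requiring only two small inverses.
    """
    n = acts.shape[0]
    A = acts.T @ acts / n + damping * np.eye(acts.shape[1])
    G = grads_out.T @ grads_out / n + damping * np.eye(grads_out.shape[1])
    return np.linalg.solve(G, grad_W) @ np.linalg.inv(A)
```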
1 code implementation • 28 Feb 2021 • Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos
A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training.
1 code implementation • 2 Feb 2021 • YuHan Liu, Saurabh Agarwal, Shivaram Venkataraman
With the rapid adoption of machine learning (ML), a number of domains now fine-tune models that were pre-trained on a large corpus of data.
1 code implementation • 20 Jan 2021 • Jason Mohoney, Roger Waleffe, Yiheng Xu, Theodoros Rekatsinas, Shivaram Venkataraman
We propose a new framework for computing the embeddings of large-scale graphs on a single machine.
no code implementations • 18 Jan 2021 • Arjun Balasubramanian, Adarsh Kumar, YuHan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella
We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference.
3 code implementations • 29 Oct 2020 • Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos
These techniques usually require choosing a static compression ratio, forcing users to balance the trade-off between model accuracy and per-iteration speedup.
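For reference, a static-ratio top-k sparsifier looks like the sketch below; the accompanying rule that switches between two ratios is a toy illustration of adapting compression over training, not the paper's actual criterion, and all constants are made-up values.

```python
import numpy as np

def topk_compress(grad, ratio):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    flat = grad.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

class AdaptiveRatio:
    """Toy adaptation rule: compress lightly when gradients change fast,
    aggressively when they are stable (illustrative thresholds only)."""
    def __init__(self, low=0.01, high=0.25, threshold=0.2):
        self.low, self.high, self.threshold = low, high, threshold
        self.prev_norm = None

    def ratio(self, grad):
        norm = float(np.linalg.norm(grad))
        rel_change = abs(norm - self.prev_norm) / self.prev_norm if self.prev_norm else 1.0
        self.prev_norm = norm
        return self.high if rel_change > self.threshold else self.low
```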
no code implementations • 7 Feb 2020 • Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella
In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests.
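A minimal sketch of the caching idea, assuming a hypothetical split of the network into `early_layers` and `late_layers` and a coarse quantized activation key; the learned caches in the paper are more sophisticated than this exact-match lookup.

```python
import numpy as np

def quantize_key(x, decimals=1):
    """Coarse key for an intermediate activation; maps similar inputs together."""
    return np.round(x, decimals).tobytes()

def cached_inference(x, early_layers, late_layers, cache):
    """Run the early layers, then skip the rest of the DNN on a cache hit."""
    h = early_layers(x)                 # always pay for the first few layers
    key = quantize_key(h)
    if key in cache:
        return cache[key]               # hit: remaining layers are skipped
    y = late_layers(h)                  # miss: run the full network
    cache[key] = y
    return y
```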
no code implementations • 11 Oct 2019 • Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica
Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale.
no code implementations • 2 May 2019 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman
To scale to high query rates, prediction serving systems run on many machines in cluster settings and are thus prone to slowdowns and failures that inflate tail latency and violate strict latency targets.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
1 code implementation • 17 Jan 2019 • Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang
With widespread advances in machine learning, a number of large enterprises are beginning to incorporate machine learning models across many of their products.
3 code implementations • 4 Jun 2018 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman
To the best of our knowledge, this work proposes the first learning-based approach for designing codes, and also presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation.
no code implementations • 20 Feb 2017 • Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez
Distributed optimization algorithms are widely used in many industrial machine learning applications.
no code implementations • 29 Oct 2016 • Evan R. Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J. Franklin, Benjamin Recht
Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements.
no code implementations • 17 Feb 2016 • Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman, Benjamin Recht
We demonstrate that distributed block coordinate descent can quickly solve kernel regression and classification problems with millions of data points.
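As a single-machine illustration of the underlying solver (the paper's contribution is the distributed version), the sketch below runs block coordinate descent on the kernel ridge regression system (K + lambda*I) alpha = y, solving one small block system at a time; the RBF kernel and hyperparameters are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Squared Euclidean distances -> RBF kernel matrix.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def block_cd_krr(X, y, lam=1e-2, block_size=256, epochs=10, gamma=0.1):
    """Block coordinate descent for kernel ridge regression: (K + lam*I) a = y."""
    n = X.shape[0]
    a = np.zeros(n)
    Ka = np.zeros(n)                                   # running product K @ a
    for _ in range(epochs):
        for start in range(0, n, block_size):
            B = slice(start, min(start + block_size, n))
            K_Bn = rbf_kernel(X[B], X, gamma)          # rows of K for this block
            K_BB = K_Bn[:, B]
            r_B = y[B] - Ka[B] - lam * a[B]            # residual restricted to block
            delta = np.linalg.solve(K_BB + lam * np.eye(K_BB.shape[0]), r_B)
            a[B] += delta
            Ka += K_Bn.T @ delta                       # update K @ a incrementally
    return a
```

Each pass only materializes one block of kernel rows at a time, which is what makes the approach amenable to distribution across machines.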
no code implementations • 26 May 2015 • Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, Ameet Talwalkar
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks.