Search Results for author: Minsoo Rhu

Found 20 papers, 3 papers with code

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training

no code implementations27 Nov 2023 Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, YongDeok Kim, Minsoo Rhu

As large language models (LLMs) become widespread in various application domains, a critical challenge facing the AI community is how to train these large AI models in a cost-effective manner.

Language Modelling Large Language Model
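To illustrate what such a simulator estimates, here is a back-of-the-envelope sketch (not vTrain itself) that costs out a training run using the common ~6 x parameters x tokens FLOPs approximation for transformer training; every input value below is an assumption, not a number from the paper.

    # Back-of-the-envelope LLM training cost estimate (NOT vTrain).
    # All inputs are illustrative assumptions.

    def training_cost_usd(params, tokens, num_gpus, peak_flops, mfu,
                          usd_per_gpu_hour):
        total_flops = 6.0 * params * tokens          # forward + backward
        cluster_flops = num_gpus * peak_flops * mfu  # sustained throughput
        hours = total_flops / cluster_flops / 3600.0
        return hours, hours * num_gpus * usd_per_gpu_hour

    hours, cost = training_cost_usd(
        params=7e9, tokens=1e12,           # 7B model, 1T tokens (assumed)
        num_gpus=512, peak_flops=312e12,   # A100 BF16 peak (public spec)
        mfu=0.4,                           # assumed model FLOPs utilization
        usd_per_gpu_hour=2.0)              # assumed price
    print(f"~{hours:,.0f} wall-clock hours, ~${cost:,.0f}")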

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

no code implementations23 Feb 2023 Yujeong Choi, John Kim, Minsoo Rhu

While providing low latency is a fundamental requirement in deploying recommendation services, achieving high resource utilization is also crucial in cost-effectively maintaining the datacenter.

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

1 code implementation26 Jan 2023 Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, G. Edward Suh

Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second -- a >100x throughput improvement over a CPU-based baseline -- while maintaining model accuracy.

Information Retrieval Language Modelling +1

DiVa: An Accelerator for Differentially Private Machine Learning

no code implementations26 Aug 2022 Beomsik Park, Ranggi Hwang, Dongho Yoon, Yoonhyuk Choi, Minsoo Rhu

The widespread deployment of machine learning (ML) is raising serious concerns about protecting the privacy of users who contributed to the collection of training data.
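DiVa accelerates differentially private training, whose core per-step computation in DP-SGD is per-example gradient clipping followed by calibrated Gaussian noise. Below is a minimal NumPy sketch of that step; the clip_norm and sigma values are illustrative, not from the paper.

    # Per-example clip-and-noise step at the heart of DP-SGD -- the kind
    # of computation DiVa targets. Hyperparameters are illustrative.
    import numpy as np

    def dp_sgd_step(per_example_grads, clip_norm=1.0, sigma=1.0,
                    rng=np.random.default_rng(0)):
        # Clip each example's gradient to bound its individual influence.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
        # Sum, then add Gaussian noise calibrated to the clipping bound.
        noisy = clipped.sum(axis=0) + rng.normal(0.0, sigma * clip_norm,
                                                 clipped.shape[1])
        return noisy / len(per_example_grads)

    grads = np.random.default_rng(1).normal(size=(32, 10))  # 32 examples, 10 params
    update = dp_sgd_step(grads)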

Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards

no code implementations10 May 2022 Youngeun Kwon, Minsoo Rhu

Prior work proposed to cache frequently accessed embeddings inside GPU memory as a means to filter down the embedding layer traffic to CPU memory, but this paper observes several limitations with such a cache design.

Recommendation Systems
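The cache design this paper critiques can be sketched generically: hot embedding rows live in a fixed-capacity LRU structure (standing in for GPU memory), and misses fall back to the full host-side table. Names and sizes below are illustrative.

    # Generic sketch of a GPU-side embedding cache: LRU over hot rows,
    # with misses generating "CPU memory" traffic (a plain array here).
    from collections import OrderedDict
    import numpy as np

    class EmbeddingCache:
        def __init__(self, table, capacity=1024):
            self.table = table              # full table in host memory
            self.cache = OrderedDict()      # row_id -> vector, LRU order
            self.capacity = capacity

        def lookup(self, row_id):
            if row_id in self.cache:
                self.cache.move_to_end(row_id)  # refresh LRU position
                return self.cache[row_id]
            vec = self.table[row_id]            # miss: host-memory traffic
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[row_id] = vec
            return vec

    table = np.random.rand(100_000, 64)
    cache = EmbeddingCache(table)
    emb = cache.lookup(42)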

SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures

no code implementations10 May 2022 Yunjae Lee, Jinha Chung, Minsoo Rhu

Our work demonstrates that an ISP-based large-scale GNN training system can achieve both high-capacity storage and high performance, opening up opportunities for ML practitioners to train GNNs on large datasets without being hampered by the physical limitations of main memory size.
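A toy software analogue of the capacity argument: keep node features on storage (np.memmap standing in for the SSD) and pull only each sampled minibatch's rows into DRAM, rather than loading the whole graph up front. The file name and shapes are made up for illustration.

    # Storage-resident features, DRAM-resident minibatches (toy analogue).
    import numpy as np

    N, D = 100_000, 64
    feats = np.memmap("node_features.bin", dtype=np.float32,
                      mode="w+", shape=(N, D))   # lives on storage

    rng = np.random.default_rng(0)
    batch_ids = rng.integers(0, N, size=1024)    # e.g. sampled neighbors
    batch = np.asarray(feats[batch_ids])         # only these rows hit DRAM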

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks

no code implementations1 Mar 2022 Ranggi Hwang, Minhoo Kang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu

Graph convolutional neural networks (GCNs) have emerged as a key technology in various application domains where the input data is relational.
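GROW architects its accelerator around a row-stationary (Gustavson-style) sparse-dense GEMM dataflow: each sparse row of A scales and accumulates rows of the dense B. The sketch below is a plain-Python software analogue over CSR arrays, not the accelerator's pipeline.

    # Row-stationary (Gustavson) sparse-dense GEMM over CSR arrays.
    import numpy as np

    def row_stationary_spmm(indptr, indices, data, B):
        out = np.zeros((len(indptr) - 1, B.shape[1]))
        for i in range(len(indptr) - 1):           # one sparse row at a time
            for k in range(indptr[i], indptr[i + 1]):
                out[i] += data[k] * B[indices[k]]  # accumulate scaled B rows
        return out

    # Tiny example: 2x3 sparse A (CSR) times 3x2 dense B.
    indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
    B = np.arange(6, dtype=float).reshape(3, 2)
    print(row_stationary_spmm(indptr, indices, data, B))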

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

no code implementations27 Feb 2022 Yunseong Kim, Yujeong Choi, Minsoo Rhu

However, maximizing server utilization and system throughput is also crucial for ML service providers as it helps lower the total-cost-of-ownership.

Scheduling

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training

no code implementations25 Oct 2020 Youngeun Kwon, Yunjae Lee, Minsoo Rhu

Personalized recommendations are one of the most widely deployed machine learning (ML) workloads serviced from cloud datacenters.

LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference

no code implementations25 Oct 2020 Yujeong Choi, Yunseong Kim, Minsoo Rhu

In cloud ML inference systems, batching is an essential technique to increase throughput, which helps optimize total-cost-of-ownership.

BIG-bench Machine Learning Scheduling
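A toy sketch of the general SLA-aware batching idea follows: hold requests to grow the batch, but dispatch as soon as the oldest queued request's slack is spent. This simplification batches whole requests, whereas the paper's system works at finer granularity; all parameters are illustrative.

    # Toy SLA-aware batching loop; `serve` is the caller's batched
    # inference function, `requests` a queue of incoming requests.
    import queue, time

    def batching_loop(requests: queue.Queue, serve, max_batch=8, sla_s=0.050):
        batch, t_oldest = [], None
        while True:
            timeout = None if not batch else max(
                0.0, sla_s - (time.time() - t_oldest))
            try:
                batch.append(requests.get(timeout=timeout))
                if len(batch) == 1:
                    t_oldest = time.time()   # clock starts with first request
            except queue.Empty:
                pass                         # slack exhausted; fall through
            if batch and (len(batch) == max_batch
                          or time.time() - t_oldest >= sla_s):
                serve(batch)                 # single batched inference call
                batch, t_oldest = [], None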

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

no code implementations12 May 2020 Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu

Personalized recommendations are the backbone machine learning (ML) algorithm that powers several important application domains (e.g., ads, e-commerce, etc.) serviced from cloud datacenters.

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

no code implementations15 Nov 2019 Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu

To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely utilized for accelerating deep learning algorithms.

Management Translation

PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

1 code implementation6 Sep 2019 Yujeong Choi, Minsoo Rhu

To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests.

Scheduling
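A hedged sketch of predictive, preemption-friendly task selection in this spirit: among ready DNN tasks, favor those that have waited long relative to their predicted runtime, weighted by priority. The scoring rule below is illustrative, not the paper's token mechanism.

    # Pick the next DNN task to run given predicted runtimes (sketch).
    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        priority: int               # larger = more important
        arrival_s: float            # when the task became ready
        predicted_runtime_s: float  # from a latency predictor

    def pick_next(ready, now_s):
        # Normalizing wait by predicted runtime lets short, long-waiting
        # jobs overtake, so low-priority tasks cannot starve indefinitely.
        return max(ready, key=lambda t: t.priority * (now_s - t.arrival_s)
                                         / t.predicted_runtime_s)

    ready = [Task("bert", 3, arrival_s=0.9, predicted_runtime_s=0.020),
             Task("resnet", 1, arrival_s=0.4, predicted_runtime_s=0.004)]
    nxt = pick_next(ready, now_s=1.0)       # -> resnet (short, waited long)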

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

no code implementations8 Aug 2019 Youngeun Kwon, Yunjae Lee, Minsoo Rhu

Recent studies from several hyperscalers point to embedding layers as the most memory-intensive deep learning (DL) algorithm being deployed in today's datacenters.

Recommendation Systems
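The memory-bound primitive TensorDIMM moves near memory is the embedding gather-and-reduce: fetching sparse rows of a large table and pooling them into one vector. A NumPy stand-in (table shape and IDs are illustrative):

    # Gather embedding rows, then element-wise reduce into one vector.
    import numpy as np

    def embedding_gather_reduce(table, ids):
        return table[ids].sum(axis=0)

    table = np.random.rand(1_000_000, 64).astype(np.float32)  # 1M x 64 table
    pooled = embedding_gather_reduce(table, ids=[3, 17, 42, 99])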

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

no code implementations18 Feb 2019 Youngeun Kwon, Minsoo Rhu

As the models and the datasets to train deep learning (DL) models scale, system architects are faced with new challenges, one of which is the memory capacity bottleneck, where the limited physical memory inside the accelerator device constrains the algorithms that can be studied.

Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training

no code implementations1 Jun 2018 Maohua Zhu, Jason Clemons, Jeff Pool, Minsoo Rhu, Stephen W. Keckler, Yuan Xie

Further, we can enforce structured sparsity in the gate gradients to make the LSTM backward pass up to 45% faster than the state-of-the-art dense approach and 168% faster than the state-of-the-art sparsifying method on modern GPUs.
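A hedged sketch of inducing structured sparsity in the gate gradients: prune whole columns by magnitude so the surviving non-zeros form hardware-friendly blocks rather than scattered entries. The keep-ratio policy below is illustrative, not the paper's thresholding scheme.

    # Zero out low-energy columns of the LSTM gate-gradient matrix.
    import numpy as np

    def sparsify_gate_grads(dgates, keep_ratio=0.5):
        col_norms = np.linalg.norm(dgates, axis=0)
        k = max(1, int(keep_ratio * dgates.shape[1]))
        keep = np.argsort(col_norms)[-k:]       # highest-magnitude columns
        mask = np.zeros(dgates.shape[1], dtype=bool)
        mask[keep] = True
        return dgates * mask                    # structured (column) sparsity

    dgates = np.random.randn(128, 4 * 256)      # (batch, 4*hidden) gate grads
    sparse = sparsify_gate_grads(dgates)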

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

no code implementations3 May 2017 Minsoo Rhu, Mike O'Connor, Niladrish Chatterjee, Jeff Pool, Stephen W. Keckler

Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory.
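The style of compression such a DMA engine can apply to sparse (post-ReLU) activations is zero-value compression: a presence bitmap plus the packed non-zero values. A minimal NumPy sketch, assuming 1-D activations:

    # Zero-value compression/decompression for sparse activations.
    import numpy as np

    def zvc_compress(act):
        mask = act != 0
        return np.packbits(mask), act[mask]     # bitmap + packed nonzeros

    def zvc_decompress(bits, nonzeros, n):
        mask = np.unpackbits(bits)[:n].astype(bool)
        out = np.zeros(n, dtype=nonzeros.dtype)
        out[mask] = nonzeros
        return out

    act = np.maximum(np.random.randn(1024).astype(np.float32), 0)  # ReLU out
    bits, nz = zvc_compress(act)
    assert np.array_equal(zvc_decompress(bits, nz, act.size), act)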

vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design

4 code implementations25 Feb 2016 Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, Stephen W. Keckler

The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU.

BIG-bench Machine Learning Efficient Neural Network
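vDNN's core idea, sketched minimally below with a PyTorch tensor API: spill each layer's feature maps to host memory during the forward pass and fetch them back before that layer's backward pass. The real system overlaps these copies with compute on separate CUDA streams; this sketch shows only the data movement.

    # Offload/prefetch of per-layer feature maps (data movement only).
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    stash = {}

    def offload(layer_id, acts):
        # Forward: spill feature maps so the GPU copy can be freed.
        stash[layer_id] = acts.to("cpu", non_blocking=True)

    def prefetch(layer_id):
        # Backward: bring the feature maps back before they are needed.
        return stash.pop(layer_id).to(device, non_blocking=True)

    x = torch.randn(64, 256, 32, 32, device=device)
    offload("conv3", x)
    y = prefetch("conv3")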
