Search Results for author: John Kim

Found 6 papers, 2 papers with code

NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator

1 code implementation • 23 Apr 2024 • Kaustubh Shivdikar, Nicolas Bohm Agostini, Malith Jayaweera, Gilbert Jonatan, Jose L. Abellan, Ajay Joshi, John Kim, David Kaeli

We introduce a rolling eviction strategy to mitigate data idling in on-chip memory and to address the prevalent issue of memory bloat in sparse graph computations.
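For illustration only: the sketch below shows the general idea behind a "rolling" eviction in a hash-based accumulator, where an entry is written back as soon as it has received all of its partial updates instead of idling on chip. The class and field names (HashAccumulator, expected_updates) are hypothetical and do not reflect NeuraChip's actual hardware design.

```python
# Hypothetical illustration of a rolling-eviction accumulator for sparse
# products; names and structure are invented for this sketch.

class HashAccumulator:
    def __init__(self, expected_updates):
        # expected_updates[key] = number of partial products that will
        # eventually be accumulated into this output entry.
        self.expected = dict(expected_updates)
        self.table = {}        # on-chip hash table: key -> running sum
        self.seen = {}         # key -> updates received so far
        self.evicted = {}      # results written back ("off-chip") early

    def accumulate(self, key, value):
        self.table[key] = self.table.get(key, 0.0) + value
        self.seen[key] = self.seen.get(key, 0) + 1
        # Rolling eviction: once an entry has received every partial
        # product it will ever get, write it back immediately instead of
        # letting it idle in the on-chip table (avoids memory bloat).
        if self.seen[key] == self.expected[key]:
            self.evicted[key] = self.table.pop(key)


acc = HashAccumulator({("r0", "c3"): 2, ("r0", "c7"): 1})
acc.accumulate(("r0", "c7"), 1.5)   # complete -> evicted right away
acc.accumulate(("r0", "c3"), 0.5)   # still waiting for one more update
acc.accumulate(("r0", "c3"), 0.25)  # complete -> evicted
print(acc.evicted)                  # {('r0', 'c7'): 1.5, ('r0', 'c3'): 0.75}
```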

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

no code implementations • 23 Feb 2023 • Yujeong Choi, John Kim, Minsoo Rhu

While providing low latency is a fundamental requirement in deploying recommendation services, achieving high resource utilization is also crucial to operating the datacenter cost-effectively.
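For illustration only: a toy Python scheduler sketching the latency-vs-utilization trade-off described above. The device names, latency numbers, and policy (send a query to the slowest device that still meets its deadline) are invented for this sketch and are not Hera's actual algorithm.

```python
# Toy heterogeneity-aware dispatcher: prefer the slower, cheaper device
# whenever it can still meet the latency SLA, keeping the fast device
# free for tight deadlines and overall utilization high.

DEVICES = {
    "gpu": {"latency_ms": 4.0, "busy_until": 0.0},
    "cpu": {"latency_ms": 12.0, "busy_until": 0.0},
}

def schedule(arrival_ms, sla_ms):
    """Return the device chosen for a request, or None if no device can
    finish it within the SLA."""
    candidates = []
    for name, dev in DEVICES.items():
        start = max(arrival_ms, dev["busy_until"])
        finish = start + dev["latency_ms"]
        if finish - arrival_ms <= sla_ms:
            candidates.append((dev["latency_ms"], name, finish))
    if not candidates:
        return None  # reject or queue the request; SLA cannot be met
    # Pick the slowest device that still meets the deadline.
    _, name, finish = max(candidates)
    DEVICES[name]["busy_until"] = finish
    return name

print(schedule(0.0, sla_ms=20.0))  # 'cpu' is enough to meet the SLA
print(schedule(0.0, sla_ms=5.0))   # falls back to 'gpu'
```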

Answer Fast: Accelerating BERT on the Tensor Streaming Processor

no code implementations • 22 Jun 2022 • Ibrahim Ahmed, Sahil Parmar, Matthew Boyd, Michael Beidler, Kris Kang, Bill Liu, Kyle Roach, John Kim, Dennis Abts

Transformers have become a predominant machine learning workload: they are not only the de facto standard for natural language processing tasks, but they are also being deployed in other domains such as vision and speech recognition.

Machine Translation • Speech Recognition +1

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

no code implementations • 20 Mar 2020 • Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

Large-scale training is important to ensure high performance and accuracy of machine-learning models.

Distributed, Parallel, and Cluster Computing • MSC: 68T05, 68M10 • ACM: H.3.3; I.2.6; C.2.1

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

no code implementations • 15 Nov 2019 • Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu

To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are being widely used to accelerate deep learning algorithms.

Management • Translation
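For illustration only: a minimal Python sketch of what address translation means in this context, namely mapping a virtual address to a physical one through a TLB with a page-table fallback. The page size, table contents, and function names are invented and do not reflect the NeuMMU design.

```python
# Minimal virtual-to-physical address translation with a TLB cache.
# Page size and page-table contents are made up for this example.

PAGE_SIZE = 4096

page_table = {0x0: 0x40, 0x1: 0x91, 0x2: 0x7A}  # virtual page -> physical frame
tlb = {}                                         # cache of recent translations

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                 # TLB hit: fast path
        frame = tlb[vpn]
    else:                          # TLB miss: walk the page table (slow)
        frame = page_table[vpn]
        tlb[vpn] = frame
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1008)))      # page 0x1 -> frame 0x91, offset 0x8 -> 0x91008
```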
