Search Results for author: Zeke Wang

Found 4 papers, 2 papers with code

DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference

no code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Decoding using tree search can greatly enhance the inference quality for transformer-based Large Language Models (LLMs).

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems

no code implementations • 23 Jul 2023 • Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly.

Benchmarking High Bandwidth Memory on FPGAs

2 code implementations • 9 May 2020 • Zeke Wang, Hongjing Huang, Jie Zhang, Gustavo Alonso

FPGAs are starting to be enhanced with High Bandwidth Memory (HBM) as a way to reduce the memory bandwidth bottleneck encountered in some applications and to give the FPGA more capacity to deal with application state.

Hardware Architecture

Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report)

1 code implementation • 8 Mar 2019 • Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang

Learning from the data stored in a database is an important function increasingly available in relational engines.

Quantization, Retrieval
