Search Results for author: Zeke Wang

Found 4 papers, 2 papers with code

DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference

no code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Decoding using tree search can greatly enhance the inference quality for transformer-based Large Language Models (LLMs).

Benchmarking High Bandwidth Memory on FPGAs

2 code implementations • 9 May 2020 • Zeke Wang, Hongjing Huang, Jie Zhang, Gustavo Alonso

FPGAs are starting to be enhanced with High Bandwidth Memory (HBM) as a way to reduce the memory bandwidth bottleneck encountered in some applications and to give the FPGA more capacity to deal with application state.

Hardware Architecture
