Search Results for author: Anze Xie

Found 1 paper, 1 paper with code

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

1 code implementation • 5 Oct 2023 • Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

FlashAttention (Dao, 2023) effectively reduces attention's peak memory usage from quadratic to linear in sequence length when training transformer-based large language models (LLMs) on a single GPU.
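The linear-memory behavior the abstract refers to comes from tiling the attention computation and keeping running softmax statistics per query row, so the full n × n score matrix is never materialized. Below is a minimal NumPy sketch of that online-softmax idea; it illustrates the general single-GPU FlashAttention technique only, not the paper's distributed implementation, and the function name `tiled_attention` and the `block_size` parameter are illustrative choices, not API from the paper's code.

```python
import numpy as np

def tiled_attention(Q, K, V, block_size=128):
    """Attention via online softmax over key/value tiles.

    Peak memory is O(n * d) rather than O(n^2): only one (n, block_size)
    score tile exists at a time. This is a plain-NumPy sketch of the
    tiling idea behind FlashAttention, not the authors' implementation.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full(n, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(n)           # running sum of exp(score - row_max)

    for start in range(0, n, block_size):
        kb = K[start:start + block_size]     # (b, d) tile of keys
        vb = V[start:start + block_size]     # (b, d) tile of values
        scores = (Q @ kb.T) * scale          # (n, b) scores for this tile only

        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescale earlier partial sums
        p = np.exp(scores - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max

    return out / row_sum[:, None]

# Sanity check against the naive quadratic-memory implementation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((512, 64)) for _ in range(3))
S = Q @ K.T / np.sqrt(64)
P = np.exp(S - S.max(axis=1, keepdims=True))
naive = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), naive)
```

Because each tile's contribution is rescaled by `correction` when a larger running maximum appears, the loop reproduces the exact softmax result while touching only one key/value block at a time; DISTFLASHATTN extends this kind of blockwise computation across GPUs for long-context training.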
