Search Results for author: Yunshen Wei

Found 2 papers, 2 papers with code

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

2 code implementations • 27 Jul 2023 • Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.

Language Modelling, Large Language Model

217

cosFormer: Rethinking Softmax in Attention

3 code implementations • ICLR 2022 • Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

As one of the Transformer's core components, softmax attention helps capture long-range dependencies, yet its quadratic space and time complexity with respect to sequence length prohibits scaling up.

Ranked #4 on Offline RL on D4RL

D4RL, Language Modelling, +1

174
