no code implementations • 3 May 2024 • Shaoyuan Chen, Yutong Lin, Mingxing Zhang, Yongwei Wu
To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading.
no code implementations • CVPR 2021 • Mingxing Zhang, Yang Yang, Xinghan Chen, Yanli Ji, Xing Xu, Jingjing Li, Heng Tao Shen
Then, for each moment candidate, we concatenate the starting representation of its starting element, the middle representation of its middle element, and the ending representation of its ending element to form the final moment representation.
no code implementations • 10 Dec 2020 • Mingxing Zhang, Zhengchun Zhou, Lanping Li, Zilong Liu, Meng Yang, Yanghe Feng
Sequences play an important role in many engineering applications and systems.