1 code implementation • 15 Mar 2024 • Xuanlei Zhao, Shenggan Cheng, Zangwei Zheng, Zheming Yang, Ziming Liu, Yang You
Scaling large models with long sequences, in applications such as language generation, video generation, and multimodal tasks, requires efficient sequence parallelism.