Search Results for author: Zhenglei Zhou

Found 3 papers, 0 papers with code

AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes

no code implementations • 15 Apr 2024 • Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, ZhaoXin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou

Previous work addresses only some types of stragglers and cannot adaptively handle the variety of stragglers that arise in practice.

G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems

no code implementations • 9 Jan 2024 • Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, ZhaoXin Huan, Lin Ju, Xiaolu Zhang, Lin Wang, Jun Zhou

However, existing systems are not tailored to meta-learning-based DLRM models and suffer from critical efficiency problems in distributed training on GPU clusters.

Meta-Learning • Recommendation Systems

An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training

no code implementations • 19 Dec 2023 • Youshao Xiao, Weichang Wu, Zhenglei Zhou, Fagui Mao, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou

Furthermore, our framework provides a simple user interface and allows for the agile allocation of models across devices in a fine-grained manner for various training scenarios, involving models of varying sizes and devices of different scales.
