no code implementations • 17 Jan 2024 • Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo
In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures.
1 code implementation • 4 Oct 2023 • Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang
This observation enables us to extend a variety of HPO and NAS algorithms to solve the Auto-FP problem.
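For intuition, the framing treats each feature-preprocessing pipeline as one point in a search space that standard HPO methods can explore. A minimal sketch of the simplest such search, random search over scikit-learn preprocessor sequences (the operator set, search budget, and dataset here are illustrative, not the paper's setup):

```python
import random

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (MinMaxScaler, PowerTransformer,
                                   QuantileTransformer, StandardScaler)

X, y = load_breast_cancer(return_X_y=True)
ops = [StandardScaler, MinMaxScaler, PowerTransformer, QuantileTransformer]

best_score, best_pipe = -1.0, None
for _ in range(20):  # random search: the simplest HPO baseline
    steps = [op() for op in random.sample(ops, random.randint(1, 3))]
    pipe = make_pipeline(*steps, LogisticRegression(max_iter=1000))
    score = cross_val_score(pipe, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_pipe = score, pipe

print(f"best 3-fold accuracy: {best_score:.3f}")
```

More sophisticated HPO and NAS algorithms differ mainly in how the next candidate pipeline is proposed, not in this evaluate-and-keep-the-best loop.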
1 code implementation • 31 Aug 2023 • Qiang Huang, Jiawei Jiang, Xi Susie Rao, Ce Zhang, Zhichao Han, Zitao Zhang, Xin Wang, Yongjun He, Quanqing Xu, Yang Zhao, Chuang Hu, Shuo Shang, Bo Du
To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed.
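A toy sketch of the core idea behind memory-based TGNNs (this mirrors the general design, not this paper's specific system; all names and sizes are illustrative): each node keeps a memory vector that is updated whenever a timestamped edge event arrives, so both features and connectivity can evolve over time.

```python
import torch
import torch.nn as nn

class TinyTGNN(nn.Module):
    """Each node keeps a memory vector, updated per timestamped edge event."""
    def __init__(self, num_nodes, dim):
        super().__init__()
        self.memory = torch.zeros(num_nodes, dim)  # per-node state
        self.cell = nn.GRUCell(2 * dim + 1, dim)   # message -> new memory

    def update(self, src, dst, t):
        # message = [source memory, destination memory, event timestamp]
        msg = torch.cat([self.memory[src], self.memory[dst],
                         torch.tensor([t])]).unsqueeze(0)
        new_mem = self.cell(msg, self.memory[src].unsqueeze(0))
        self.memory[src] = new_mem[0].detach()  # forward-only demo

model = TinyTGNN(num_nodes=10, dim=8)
for src, dst, t in [(0, 1, 0.1), (1, 2, 0.5), (0, 2, 0.9)]:
    model.update(src, dst, t)
```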
1 code implementation • 2 Jun 2022 • Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang
Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks.
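The paper itself studies compressing activations in pipeline-parallel fine-tuning, with convergence guarantees; the snippet below does not reproduce that method. It only illustrates one standard compression primitive for intuition, top-k gradient sparsification with error feedback:

```python
import torch

def topk_compress(tensor, ratio=0.05):
    """Keep only the largest-magnitude entries; transmit (values, indices)."""
    k = max(1, int(tensor.numel() * ratio))
    _, idx = tensor.abs().topk(k)
    return tensor[idx], idx

error = torch.zeros(1000)              # residual kept locally per worker
for step in range(3):
    grad = torch.randn(1000)           # stand-in for a real gradient
    corrected = grad + error           # error feedback: re-add dropped mass
    vals, idx = topk_compress(corrected)
    sent = torch.zeros_like(corrected)
    sent[idx] = vals                   # what actually crosses the network
    error = corrected - sent           # remember what was dropped
```

Error feedback is what keeps aggressive compression from biasing training: mass dropped at one step is re-injected at the next.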
1 code implementation • 2 Jun 2022 • Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, Ce Zhang
Our key technical contribution is a scheduling algorithm that allocates different computational "tasklets" in the training of foundation models to a group of decentralized GPU devices connected by a slow heterogeneous network.
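As a toy illustration of the underlying cost model (communication time roughly equals data volume divided by link bandwidth), the sketch below exhaustively places pipeline-stage tasklets on devices so that adjacent stages sit on high-bandwidth pairs. The bandwidth numbers and device names are made up, and the paper's scheduler is far more sophisticated than brute force:

```python
import itertools

bandwidth = {  # Gbit/s between device pairs (symmetric, toy numbers)
    ("gpu0", "gpu1"): 10.0, ("gpu1", "gpu2"): 1.0, ("gpu0", "gpu2"): 0.5,
}

def bw(a, b):
    return bandwidth.get((a, b)) or bandwidth.get((b, a))

devices = ["gpu0", "gpu1", "gpu2"]
activation_gbits = 2.0  # data exchanged between adjacent pipeline stages

def pipeline_cost(order):
    # total transfer time across adjacent stage boundaries
    return sum(activation_gbits / bw(a, b) for a, b in zip(order, order[1:]))

best = min(itertools.permutations(devices), key=pipeline_cost)
print(best, f"{pipeline_cost(best):.2f}s per round of activations")
```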
1 code implementation • 10 Nov 2021 • Xiangru Lian, Binhang Yuan, XueFeng Zhu, Yulong Wang, Yongjun He, Honghuan Wu, Lei Sun, Haodong Lyu, Chengjun Liu, Xing Dong, Yiqiao Liao, Mingnan Luo, Congfei Zhang, Jingru Xie, Haonan Li, Lei Chen, Renjie Huang, Jianying Lin, Chengchun Shu, Xuezhong Qiu, Zhishan Liu, Dongying Kong, Lei Yuan, Hai Yu, Sen yang, Ce Zhang, Ji Liu
Specifically, to ensure both training efficiency and training accuracy, we design a novel hybrid training algorithm in which the embedding layer and the dense neural network are handled by different synchronization mechanisms. We then build a system called Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm.
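A highly simplified single-process sketch of that split: the huge embedding table tolerates asynchronous (possibly stale) updates, while the small dense network is kept synchronized across workers. All names and sizes are illustrative; Persia's real system distributes these two update paths across separate worker roles.

```python
import torch
import torch.nn as nn

embedding = nn.EmbeddingBag(10_000, 16)  # huge table: async path in Persia
dense = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))

emb_opt = torch.optim.SGD(embedding.parameters(), lr=0.1)
dense_opt = torch.optim.SGD(dense.parameters(), lr=0.01)

ids = torch.randint(0, 10_000, (4, 5))   # a batch of sparse feature IDs
target = torch.rand(4, 1)

loss = nn.functional.mse_loss(dense(embedding(ids)), target)
loss.backward()
emb_opt.step()    # in Persia: applied asynchronously on embedding workers
# (in a real multi-worker run, dense gradients would be all-reduced here)
dense_opt.step()  # in Persia: applied synchronously after the all-reduce
```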
1 code implementation • 29 Oct 2020 • Yongjun He, Jiacheng Lu, Tianzheng Wang
Lightweight coroutines ease the implementation of software prefetching, which hides data stalls by overlapping computation with asynchronous memory accesses.
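The pattern in miniature, using Python generators as stand-in coroutines (real engines such as this one do it in C++ with actual hardware prefetch instructions): each tree probe issues a prefetch for the node it is about to touch, then yields so a sibling probe can run while that access is notionally in flight.

```python
class Node:
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value, self.left, self.right = key, value, left, right

def prefetch(node):
    pass  # stand-in for a hardware prefetch hint (e.g. __builtin_prefetch)

def lookup(node, key):
    """One tree probe, written as a coroutine that suspends after each prefetch."""
    while node is not None:
        prefetch(node)   # start pulling the node toward the cache...
        yield            # ...and let a sibling lookup run meanwhile
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right

def run_interleaved(coros):
    """Round-robin scheduler hiding one probe's stall behind another's work."""
    results, pending = {}, dict(enumerate(coros))
    while pending:
        for i, c in list(pending.items()):
            try:
                next(c)
            except StopIteration as stop:
                results[i] = stop.value
                del pending[i]
    return results

root = Node(2, "b", Node(1, "a"), Node(3, "c"))
print(run_interleaved([lookup(root, k) for k in (1, 3)]))  # {0: 'a', 1: 'c'}
```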
no code implementations • 28 Jul 2020 • Baoyan Ma, Jian Zhang, Feng Cao, Yongjun He
We design a fixed proposal module that generates fixed-sized feature maps of nuclei, allowing the new nucleus-level information to be used for classification.
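A guess at the mechanics, for illustration only (the function name, window size, and center format below are assumptions, not the paper's code): crop a fixed-size window around each detected nucleus center from the backbone feature map, so every nucleus yields an identically shaped tensor for the classification head.

```python
import torch

def fixed_proposals(feature_map, centers, size=7):
    """feature_map: (C, H, W); centers: list of (y, x); returns (N, C, size, size)."""
    C, H, W = feature_map.shape
    half = size // 2
    crops = []
    for y, x in centers:
        y = min(max(y, half), H - half - 1)  # clamp so the crop stays in bounds
        x = min(max(x, half), W - half - 1)
        crops.append(feature_map[:, y - half:y + half + 1,
                                    x - half:x + half + 1])
    return torch.stack(crops)

fm = torch.randn(64, 128, 128)
patches = fixed_proposals(fm, [(10, 20), (64, 64)])
print(patches.shape)  # torch.Size([2, 64, 7, 7])
```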