Search Results for author: Zhiyu Mei

Found 2 papers, 1 paper with code

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).

Code Generation
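
The snippet above names DPO but does not spell out its reward-free objective. Below is a minimal sketch of the standard DPO loss, assuming per-sequence log-probabilities have already been computed under the trained policy and a frozen reference model; the function and variable names are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Reward-free preference loss: -log sigmoid(beta * (policy margin - reference margin))."""
    # Log-probability margin between the chosen and rejected responses.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Implicit reward difference; no separate reward model is trained.
    logits = beta * (policy_margin - ref_margin)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    lp = lambda: torch.randn(4)
    print(dpo_loss(lp(), lp(), lp(), lp()).item())
```
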

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

1 code implementation • 29 Jun 2023 • Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, Yi Wu

In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries.

Reinforcement Learning (RL)
