Search Results for author: Zhiyu Mei

Found 2 papers, 1 paper with code

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).

Code Generation
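
The snippet above names DPO but does not spell out its reward-free objective. Below is a minimal sketch of the standard DPO loss, assuming per-sequence log-probabilities have already been computed under the trained policy and a frozen reference model; the function and variable names are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Reward-free preference loss: -log sigmoid(beta * (policy margin - reference margin))."""
    # Log-probability margin between the chosen and rejected responses.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Implicit reward difference; no separate reward model is trained.
    logits = beta * (policy_margin - ref_margin)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    lp = lambda: torch.randn(4)
    print(dpo_loss(lp(), lp(), lp(), lp()).item())
```
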

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

1 code implementation • 29 Jun 2023 • Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, Yi Wu

In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries.

Reinforcement Learning (RL)
