Search Results for author: Zhaojin Wen

Found 1 paper, 0 papers with code

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

no code implementations • 30 Sep 2023 • Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.

reinforcement-learning, World Knowledge
