Search Results for author: Xing Huang

Found 2 papers, 1 paper with code

Preference as Reward, Maximum Preference Optimization with Importance Sampling

no code implementations · 27 Dec 2023 · Zaifan Jiang, Xing Huang, Chao Wei

Reinforcement Learning from Human Feedback (RLHF) is a model-based approach to preference learning: it first fits a reward model to preference scores and then optimizes the generating policy with an on-policy PPO algorithm to maximize that reward.
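The abstract describes the standard two-stage RLHF pipeline. Below is a minimal sketch of that pipeline, not the paper's implementation: stage 1 fits a reward model on preference pairs with a Bradley-Terry loss, and stage 2 updates a policy to maximize the learned reward. The toy data, model sizes, and the simplified policy-gradient update (standing in for full PPO with clipping and a KL penalty) are all assumptions made for brevity.

```python
# Illustrative two-stage RLHF sketch (assumed toy setup, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS = 8, 4

# --- Stage 1: fit a reward model to preference scores ------------------------
reward_model = nn.Sequential(nn.Linear(STATE_DIM + N_ACTIONS, 32), nn.ReLU(), nn.Linear(32, 1))
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def bradley_terry_loss(r_chosen, r_rejected):
    # Preference likelihood sigma(r_chosen - r_rejected); minimize its negative log.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy preference data: (prompt, chosen response, rejected response) as random features.
prompts = torch.randn(64, STATE_DIM)
chosen = torch.randn(64, N_ACTIONS)
rejected = torch.randn(64, N_ACTIONS)

for _ in range(200):
    r_c = reward_model(torch.cat([prompts, chosen], dim=-1))
    r_r = reward_model(torch.cat([prompts, rejected], dim=-1))
    loss = bradley_terry_loss(r_c, r_r)
    rm_opt.zero_grad(); loss.backward(); rm_opt.step()

# --- Stage 2: optimize the generating policy against the learned reward ------
policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(200):
    dist = torch.distributions.Categorical(logits=policy(prompts))
    actions = dist.sample()
    action_onehot = F.one_hot(actions, N_ACTIONS).float()
    with torch.no_grad():
        rewards = reward_model(torch.cat([prompts, action_onehot], dim=-1)).squeeze(-1)
    # Simple policy-gradient surrogate; PPO would clip an importance-weighted ratio instead.
    pg_loss = -(dist.log_prob(actions) * rewards).mean()
    pi_opt.zero_grad(); pg_loss.backward(); pi_opt.step()

print("mean learned reward after policy update:", rewards.mean().item())
```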
