1 code implementation • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.
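Since the snippet centers on PPO, here is a minimal sketch of the standard clipped surrogate objective that PPO optimizes (textbook PPO, not this paper's method; the log-probabilities, advantage estimates, and `eps` are assumed inputs):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    # Probability ratio between the current policy and the data-collecting policy.
    ratio = torch.exp(logp_new - logp_old)
    # Clipped surrogate: take the elementwise minimum of the unclipped and
    # clipped terms (the pessimistic choice), then negate to get a loss to minimize.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```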
2 code implementations • 12 Apr 2024 • Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun
Motivated by the fact that an offline preference dataset provides informative states (i.e., states that are preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.
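The dataset-reset idea in the snippet can be sketched as follows (a minimal illustration, assuming hypothetical helpers `offline_states` and `initial_state_sampler`; the mixing probability `reset_prob` is illustrative, not a value from the paper):

```python
import random

def sample_start_state(offline_states, initial_state_sampler, reset_prob=0.5):
    # Dataset reset: with some probability, start the rollout from a state
    # observed in the offline preference data (a labeler-preferred state)
    # instead of always sampling from the initial state distribution.
    if random.random() < reset_prob:
        return random.choice(offline_states)
    return initial_state_sampler()
```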
1 code implementation • 25 Mar 2024 • Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun
To overcome this limitation, consistency models were proposed: a new class of generative models that directly map noise to data, yielding a model that can generate an image in as few as one sampling iteration.
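One-step generation with a consistency model can be sketched as below (assuming a trained `consistency_fn(x, sigma)` that conditions on the noise level, following the common consistency-model convention; `sigma_max` is an illustrative maximum noise scale):

```python
import torch

@torch.no_grad()
def one_step_sample(consistency_fn, shape, sigma_max=80.0, device="cpu"):
    # A consistency model maps a noisy input directly to a clean sample,
    # so a single function evaluation on pure noise at the largest noise
    # level already yields an image.
    x_T = torch.randn(shape, device=device) * sigma_max
    sigma = torch.full((shape[0],), sigma_max, device=device)
    return consistency_fn(x_T, sigma)
```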
no code implementations • 11 Feb 2024 • Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun
Second-order bounds are instance-dependent bounds that scale with the variance of the return, which we prove are tighter than the previously known small-loss bounds for distributional RL.
Distributional Reinforcement Learning • Multi-Armed Bandits • +1
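To make "scale with the variance of the return" concrete, here is a schematic comparison of the two bound shapes (illustrative forms only, not the paper's exact theorems):

```latex
% Schematic regret shapes over K episodes (illustrative, not exact statements):
\[
  \underbrace{\mathrm{Regret}(K) \;\lesssim\;
    \sqrt{\textstyle\sum_{k=1}^{K} \operatorname{Var}\!\big(Z^{\pi_k}\big)}}_{\text{second-order}}
  \qquad\text{vs.}\qquad
  \underbrace{\mathrm{Regret}(K) \;\lesssim\; \sqrt{V^\star K}}_{\text{small-loss (first-order)}}
\]
% For returns Z normalized to [0,1], Var(Z) = E[Z^2] - (E[Z])^2 <= E[Z],
% so the per-episode variance never exceeds the corresponding value term;
% second-order bounds therefore recover small-loss bounds and become much
% tighter when returns are nearly deterministic (variance close to 0).
```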