Search Results for author: Jonathan D. Chang

Found 8 papers, 6 papers with code

REBEL: Reinforcement Learning via Regressing Relative Rewards

1 code implementation • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.

Continuous Control • Image Generation +3

Adversarial Imitation Learning via Boosting

no code implementations • 12 Apr 2024 • Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

In the weighted replay buffer, the contributions of data from older policies are properly discounted, with the weights computed based on the boosting framework.
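
The excerpt gestures at the mechanism without giving the actual weights. Below is a minimal sketch of a weighted replay buffer in which data from older policies is geometrically down-weighted; the geometric schedule (`alpha`) and the `Transition` container are illustrative assumptions, not the paper's boosting-derived weights.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Transition:
    state: list
    action: int


@dataclass
class WeightedReplayBuffer:
    """Stores one batch of transitions per policy iterate and down-weights
    older batches with a hypothetical geometric schedule."""
    alpha: float = 0.5                       # assumed per-round discount, not the paper's value
    batches: list = field(default_factory=list)

    def add_policy_data(self, transitions):
        # Each call corresponds to data collected by a new policy iterate.
        self.batches.append(list(transitions))

    def sample(self, n):
        # The k-th newest batch gets weight alpha**k (illustrative choice),
        # so newer policies contribute more samples on average.
        weights = [self.alpha ** k for k in range(len(self.batches))][::-1]
        batch_idx = random.choices(range(len(self.batches)), weights=weights, k=n)
        return [random.choice(self.batches[i]) for i in batch_idx]


if __name__ == "__main__":
    buf = WeightedReplayBuffer(alpha=0.5)
    for t in range(3):  # three policy iterates
        buf.add_policy_data([Transition(state=[t], action=t) for _ in range(100)])
    print(buf.sample(5))
```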

Imitation Learning

Dataset Reset Policy Optimization for RLHF

2 code implementations • 12 Apr 2024 • Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

Motivated by the fact that an offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset resets: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.
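
A minimal sketch of the dataset-reset idea in a generic toy interface: with some probability a rollout starts from a state drawn from the offline preference data rather than from the initial state distribution. The `ToyEnv`, its `reset_to` hook, and the mixing probability `p_reset` are assumptions made for illustration, not DR-PO's actual implementation.

```python
import random


class ToyEnv:
    """Toy environment used only to illustrate resetting to arbitrary states."""

    def reset(self):
        # Start from the usual initial state distribution.
        self.state = 0
        return self.state

    def reset_to(self, state):
        # Assumed hook: reset the simulator directly to a given state.
        self.state = state
        return self.state

    def step(self, action):
        self.state += action
        done = self.state >= 10
        return self.state, float(done), done


def collect_rollout(env, policy, offline_states, p_reset=0.5):
    """Roll out a policy, sometimes starting from a state in the offline dataset.

    With probability p_reset (an illustrative knob), the episode starts from a
    state preferred by the labelers instead of the initial state distribution."""
    if offline_states and random.random() < p_reset:
        state = env.reset_to(random.choice(offline_states))
    else:
        state = env.reset()
    trajectory, done = [], False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


if __name__ == "__main__":
    env = ToyEnv()
    preferred_states = [3, 5, 7]  # states appearing in the offline preference data
    print(collect_rollout(env, policy=lambda s: 1, offline_states=preferred_states))
```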

Reinforcement Learning (RL)

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

1 code implementation • 25 Mar 2024 • Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

To overcome this limitation, consistency models were proposed: a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration.
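
The one-shot generation property mentioned above is what makes consistency models attractive for reward-guided fine-tuning: a single network evaluation maps noise to an image. Here is a minimal sketch of one-step sampling with a stand-in network; `consistency_fn`, the noise scale `sigma_max`, and the image shape are placeholder assumptions, not the paper's trained model.

```python
import numpy as np


def consistency_fn(x_noisy, sigma):
    """Stand-in for a trained consistency model f_theta(x, sigma), which maps a noisy
    input at noise level sigma directly to a clean sample.  Here it only rescales
    the input so the sketch is runnable."""
    return x_noisy / (1.0 + sigma)


def one_step_sample(shape=(3, 64, 64), sigma_max=80.0, rng=None):
    # Single-step sampling: draw Gaussian noise at the maximum noise level and map it
    # to data with one evaluation of the consistency function (contrast with the many
    # denoising steps a diffusion sampler needs).
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(shape) * sigma_max
    return consistency_fn(z, sigma_max)


if __name__ == "__main__":
    img = one_step_sample()
    print(img.shape, img.mean())
```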

Instruction Following • reinforcement-learning +2

Policy-Gradient Training of Language Models for Ranking

no code implementations • 6 Oct 2023 • Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims

Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems.

Decision Making • Domain Generalization +3

Learning to Generate Better Than Your LLM

1 code implementation • 20 Jun 2023 • Jonathan D. Chang, Kiante Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun

In particular, we extend RL algorithms to allow them to interact with a dynamic black-box guide LLM and propose RL with guided feedback (RLGF), a suite of RL algorithms for LLM fine-tuning.
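
As a rough illustration of how a black-box guide might enter data collection, the sketch below has the guide generate a roll-in prefix that the learner policy then completes before a reward is assigned. The `guide`, `learner`, and `reward_fn` callables and the hand-over rule are assumptions; RLGF is a suite of algorithms, and this is not a faithful reproduction of any particular one.

```python
import random


def rollout_with_guide(prompt, guide, learner, reward_fn, max_len=20):
    """Mix a black-box guide LLM into data collection (illustrative sketch).

    The guide generates the first part of the completion (a roll-in), the learner
    policy finishes it, and the reward is assigned to the full text.  Only the
    learner's tokens would feed the policy-gradient update."""
    split = random.randint(0, max_len)          # assumed rule for where to hand over
    prefix = guide(prompt, num_tokens=split)    # black-box call; no gradients needed
    completion = learner(prompt + prefix, num_tokens=max_len - split)
    reward = reward_fn(prompt, prefix + completion)
    return prefix, completion, reward


if __name__ == "__main__":
    # Toy stand-ins for the guide LLM, the learner policy, and the reward model.
    guide = lambda text, num_tokens: " guide" * num_tokens
    learner = lambda text, num_tokens: " learner" * num_tokens
    reward_fn = lambda prompt, response: float(len(response.split()))
    print(rollout_with_guide("Write a haiku:", guide, learner, reward_fn))
```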

Conditional Text Generation • reinforcement-learning +1

Learning Bellman Complete Representations for Offline Policy Evaluation

1 code implementation • 12 Jul 2022 • Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun

We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE).

Continuous Control • Reinforcement Learning (RL) +1

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

1 code implementation • NeurIPS 2021 • Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.
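
The excerpt describes the problem input rather than the algorithm. As a concrete illustration of that input, here is a toy offline dataset of state-action-next-state triples from a simple linear system, together with one generic use such data supports (a least-squares dynamics fit); the system, the behavior policy, and the model fit are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Offline dataset of (state, action, next_state) triples collected by a behavior
# policy; here a random policy on a toy 2-D linear system, purely for illustration.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.5]])
states = rng.standard_normal((500, 2))
actions = rng.standard_normal((500, 1))
next_states = states @ A_true.T + actions @ B_true.T + 0.01 * rng.standard_normal((500, 2))

# One generic thing such triples support is fitting a dynamics model by least squares;
# whether and how the paper uses the data this way is not stated in the excerpt.
X = np.hstack([states, actions])                  # regress s' on (s, a)
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
A_hat, B_hat = W[:2].T, W[2:].T
print(np.round(A_hat, 2), np.round(B_hat, 2))
```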

Continuous Control • Imitation Learning
