Search Results for author: W. Bradley Knox

Found 5 papers, 3 papers with code

Contrastive Preference Learning: Learning from Human Feedback without RL

1 code implementation • 20 Oct 2023 • Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase.

reinforcement-learning Reinforcement Learning (RL)

133

Paper
Code

Learning Optimal Advantage from Preferences and Mistaking it for Reward

1 code implementation • 3 Oct 2023 • W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum

Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return.

Paper
Code

Models of human preference for learning reward functions

no code implementations • 5 Jun 2022 • W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting.

Decision Making reinforcement-learning

Paper
Add Code

Reward (Mis)design for Autonomous Driving

no code implementations • 28 Apr 2021 • W. Bradley Knox, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone

This article considers the problem of diagnosing certain common errors in reward design.

Autonomous Driving reinforcement-learning +1

Paper
Add Code

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

1 code implementation • 28 Sep 2020 • Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.

Human-Computer Interaction Robotics

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.