no code implementations • 6 Jul 2023 • Andrew Levy, Sreehari Rammohan, Alessandro Allievi, Scott Niekum, George Konidaris
Our framework makes two specific contributions.
no code implementations • 5 Jun 2022 • W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi
We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting.
no code implementations • 28 Apr 2021 • W. Bradley Knox, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone
This article considers the problem of diagnosing certain common errors in reward design.
1 code implementation • 28 Sep 2020 • Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox
We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.
Human-Computer Interaction Robotics