25 Jul 2020 • kaichiuwong/rlhps
Current methods for reward learning from human preferences can solve complex reinforcement learning (RL) tasks without access to a reward function by defining a single fixed preference between pairs of trajectory segments.
MUJOCO GAMES