Greatest papers with code

Weak Human Preference Supervision For Deep Reinforcement Learning

25 Jul 2020 · kaichiuwong/rlhps

Current methods for learning rewards from human preferences can solve complex reinforcement learning (RL) tasks without access to a reward function by defining a single fixed preference between pairs of trajectory segments.

MUJOCO GAMES
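
To make the idea of a preference between pairs of trajectory segments concrete, here is a minimal sketch of the commonly used Bradley-Terry-style preference model, in which the probability that one segment is preferred depends on the difference between the segments' summed predicted rewards. It illustrates the general technique only and is not taken from the listed paper or the kaichiuwong/rlhps repository. The function names, the plain lists of per-step rewards, and the 0/1 label convention are all assumptions made for illustration.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Assumption: each argument is a list of per-step rewards predicted by
    some learned reward model (hypothetical here) for one trajectory segment.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Logistic function of the return difference, written in a
    # numerically stable form for large positive or negative values.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a label.

    Assumption: label is 1.0 when the annotator prefers segment A and
    0.0 when the annotator prefers segment B.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


if __name__ == "__main__":
    segment_a = [0.5, 1.0, 0.5]   # summed predicted reward: 2.0
    segment_b = [0.0, 0.5, 0.0]   # summed predicted reward: 0.5
    print(preference_probability(segment_a, segment_b))
    print(preference_loss(segment_a, segment_b, 1.0))
```

In a training loop, the loss would be minimized with respect to the reward model's parameters so that its predicted rewards agree with the collected preference labels; the learned rewards then stand in for the missing reward function during RL.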