Learning Reward Machines for Partially Observable Reinforcement Learning

Reward Machines (RMs), originally proposed for specifying problems in Reinforcement Learning (RL), provide a structured, automata-based representation of a reward function that allows an agent to decompose problems into subproblems that can be efficiently learned using off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems.
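To make the idea concrete, the sketch below shows one way a reward machine can be represented: a finite-state machine whose transitions fire on high-level events (the labels produced by a labelling function over observations) and emit a reward on each transition. This is a minimal illustration under assumed names, not the paper's implementation; the example events ("key", "door") and the two-step machine are hypothetical.

```python
# Minimal reward-machine sketch (assumptions noted above, not the authors' code).
from typing import Dict, FrozenSet, Tuple


class RewardMachine:
    def __init__(self,
                 initial_state: int,
                 # (rm_state, event label) -> (next rm_state, reward)
                 delta: Dict[Tuple[int, FrozenSet[str]], Tuple[int, float]],
                 terminal_states: FrozenSet[int]):
        self.initial_state = initial_state
        self.delta = delta
        self.terminal_states = terminal_states
        self.state = initial_state

    def reset(self) -> int:
        self.state = self.initial_state
        return self.state

    def step(self, events: FrozenSet[str]) -> Tuple[int, float, bool]:
        """Advance the machine on the events detected for the last
        environment transition; unmatched events leave the state unchanged."""
        next_state, reward = self.delta.get((self.state, events),
                                            (self.state, 0.0))
        self.state = next_state
        return next_state, reward, next_state in self.terminal_states


# Hypothetical two-step task: first observe "key", then "door".
rm = RewardMachine(
    initial_state=0,
    delta={
        (0, frozenset({"key"})): (1, 0.0),   # picked up the key
        (1, frozenset({"door"})): (2, 1.0),  # reached the door with the key
    },
    terminal_states=frozenset({2}),
)
```

Because each RM state implicitly defines its own subproblem (reach the next rewarded transition), an off-policy learner can reuse the same experience to train a separate policy or value function per RM state, which is the kind of decomposition the abstract refers to.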

Methods used in the Paper


METHOD                       TYPE
Entropy Regularization       Regularization
Convolution                  Convolutions
PPO                          Policy Gradient Methods
Dense Connections            Feedforward Networks
Softmax                      Output Functions
ReLU                         Activation Functions
TRPO                         Policy Gradient Methods
Retrace                      Value Function Estimation
Experience Replay            Replay Memory
A3C                          Policy Gradient Methods
Stochastic Dueling Network   Value Function Estimation
ACER                         Policy Gradient Methods