Policy Gradient Methods

Fisher-BRC is an actor-critic algorithm for offline reinforcement learning that encourages the learned policy to stay close to the data. The critic is parameterized as the log of the behavior policy that generated the offline dataset plus a state-action value offset term, which can be learned with a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. Fisher-BRC uses a gradient penalty regularizer on the offset term, which is equivalent to Fisher divergence regularization and suggests connections to the score matching and generative energy-based model literature.

Source: Offline Reinforcement Learning with Fisher Divergence Critic Regularization
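
The critic parameterization and the gradient penalty on the offset term can be sketched roughly as follows. This is a minimal PyTorch illustration, assuming a separately fitted behavior-cloning model supplies $\log \pi_\beta(a \mid s)$; names such as OffsetCritic and offset_gradient_penalty, and the network sizes, are illustrative rather than taken from the paper's released code.

```python
import torch
import torch.nn as nn


class OffsetCritic(nn.Module):
    """Critic parameterized as Q(s, a) = log pi_beta(a | s) + O_theta(s, a).

    The offset O_theta is a small MLP; the behavior log-probability comes
    from a separately trained behavior-cloning model (assumed, not shown).
    """

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.offset_net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def offset(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # O_theta(s, a): learned state-action value offset
        return self.offset_net(torch.cat([state, action], dim=-1))

    def forward(self, state, action, behavior_log_prob):
        # behavior_log_prob = log pi_beta(a | s), shape (batch, 1)
        return behavior_log_prob + self.offset(state, action)


def offset_gradient_penalty(critic: OffsetCritic,
                            state: torch.Tensor,
                            action: torch.Tensor) -> torch.Tensor:
    """||grad_a O_theta(s, a)||^2 penalty on the offset term,
    i.e. the Fisher-divergence regularizer."""
    action = action.clone().requires_grad_(True)
    offset = critic.offset(state, action)
    grad = torch.autograd.grad(offset.sum(), action, create_graph=True)[0]
    return grad.pow(2).sum(dim=-1).mean()
```

In a full training loop this penalty, scaled by a regularization coefficient, would be added to the usual Bellman error on Q(s, a), with the penalty actions typically sampled from the current policy.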

Tasks

Task                          Papers   Share
Offline RL                    1        50.00%
Reinforcement Learning (RL)   1        50.00%

