Bi-linear Value Networks for Multi-goal Reinforcement Learning

ICLR 2022 · Ge Yang, Zhang-Wei Hong, Pulkit Agrawal

Universal value functions are used to score the long-term utility of actions for reaching a goal from the current state. In contrast to prior methods that learn a monolithic function to approximate the value, we propose a bi-linear decomposition of the value function. The first component, akin to a global plan, models how the state should be changed to reach the goal. The second component, akin to a local controller, selects the optimal action to actualize the desired change in state. We learn both components simultaneously. Such a decomposition enables the global and local components to make efficient use of interaction data and to generalize independently. The result is superior overall generalization and performance on a wide range of challenging goal-conditioned tasks compared to the current state of the art.
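The sketch below illustrates one way such a bi-linear decomposition could be implemented, assuming the value is factored as the dot product of a (state, action) embedding and a (state, goal) embedding, i.e. Q(s, a, g) ≈ f(s, a) · φ(s, g). The network names, layer sizes, and exact input pairings are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class BilinearValueNetwork(nn.Module):
    """Hypothetical bi-linear value network: Q(s, a, g) ≈ f(s, a) · phi(s, g)."""

    def __init__(self, state_dim, action_dim, goal_dim, embed_dim=64, hidden=256):
        super().__init__()
        # Local component: embeds (state, action), akin to a local controller.
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )
        # Global component: embeds (state, goal), akin to a global plan.
        self.phi = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, state, action, goal):
        # Bi-linear form: the value is the dot product of the two embeddings.
        u = self.f(torch.cat([state, action], dim=-1))
        v = self.phi(torch.cat([state, goal], dim=-1))
        return (u * v).sum(dim=-1, keepdim=True)


if __name__ == "__main__":
    q = BilinearValueNetwork(state_dim=10, action_dim=4, goal_dim=3)
    s, a, g = torch.randn(8, 10), torch.randn(8, 4), torch.randn(8, 3)
    print(q(s, a, g).shape)  # torch.Size([8, 1])
```

Because the two factors receive disjoint inputs (action on one side, goal on the other), each can be updated from transitions that vary only its own inputs, which is one plausible reading of why the decomposition improves data efficiency and generalization.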
