TPO: TREE SEARCH POLICY OPTIMIZATION FOR CONTINUOUS ACTION SPACES

Monte Carlo Tree Search (MCTS) has achieved impressive results on a range of discrete environments, such as Go, Mario and Arcade games, but it has not yet fulfilled its true potential in continuous domains. In this work, we introduce TPO, a tree search based policy optimization method for continuous environments. TPO takes a hybrid approach to policy optimization...

Methods used in the Paper

METHOD                   TYPE
Entropy Regularization   Regularization
PPO                      Policy Gradient Methods