30 Jan 2019 • Harukazu Igarashi, Yuichi Morioka, Kazumasa Yamamoto
In our new proposals, evaluation functions are learned by Monte Carlo sampling, which is performed with the backup policy in the search tree produced by Monte Carlo Softmax Search.
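
To make the sampling step concrete, here is a minimal sketch of how leaves might be drawn from a search tree according to a softmax backup policy. It is not the authors' implementation. It assumes that the backup policy is a Boltzmann (softmax) distribution over child values with a temperature parameter, that an internal node's value is the expectation of its children's values under that distribution, and that all values are seen from a single player's perspective (no sign flip between plies). The names `Node`, `softmax`, `backup`, `sample_leaf`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    # For a leaf: a static evaluation. For an internal node: the backed-up value.
    value: float = 0.0
    children: list = field(default_factory=list)


def softmax(values, temperature):
    """Boltzmann distribution over values (assumed form of the backup policy)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def backup(node, temperature=1.0):
    """Back up values bottom-up as the softmax-weighted mean of child values."""
    if not node.children:
        return node.value
    child_values = [backup(c, temperature) for c in node.children]
    probs = softmax(child_values, temperature)
    node.value = sum(p * v for p, v in zip(probs, child_values))
    return node.value


def sample_leaf(root, temperature=1.0, rng=random):
    """Walk from the root to a leaf, choosing each child with the backup policy.

    Assumes backup() has already been called so that node values are current.
    """
    node = root
    while node.children:
        probs = softmax([c.value for c in node.children], temperature)
        node = rng.choices(node.children, weights=probs, k=1)[0]
    return node
```

In this sketch, calling `backup(root)` once and then `sample_leaf(root)` repeatedly yields leaves in proportion to the product of backup-policy probabilities along their paths; those sampled leaves would be the positions at which a learning rule updates the evaluation function. A lower `temperature` concentrates the samples on the highest-valued line, and a higher one spreads them more evenly across the tree.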