1 code implementation • NeurIPS 2023 • Ti-Rong Wu, Hung Guei, Ting Han Wei, Chung-Chin Shih, Jui-Te Chin, I-Chen Wu
Solving a game typically means finding its game-theoretic value (the outcome under optimal play) and, optionally, a full strategy that achieves that outcome.
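To make the notion of a game-theoretic value concrete, here is a minimal sketch (not from the paper) that computes it for a toy subtraction game with memoized negamax search. Each player removes 1 or 2 stones from a pile; whoever takes the last stone wins. `solve(n)` returns +1 if the player to move wins under optimal play, -1 otherwise.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def solve(n: int) -> int:
    """Game-theoretic value of the subtraction game for the player to move."""
    if n == 0:
        return -1  # no stones left: the player to move has already lost
    # Negamax: the value for the player to move is the best negated
    # child value over all legal moves (remove 1 or 2 stones).
    return max(-solve(n - m) for m in (1, 2) if m <= n)

# Piles that are multiples of 3 are losses for the player to move.
print([solve(n) for n in range(1, 7)])  # [1, 1, -1, 1, 1, -1]
```

A full solution would also record the winning move at each state, yielding the optional strategy mentioned above.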
1 code implementation • 17 Oct 2023 • Ti-Rong Wu, Hung Guei, Po-Wei Huang, Pei-Chiun Peng, Ting Han Wei, Chung-Chin Shih, Yun-Jui Tsai
This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero.
no code implementations • 22 Dec 2022 • Chung-Chin Shih, Ting Han Wei, Ti-Rong Wu, I-Chen Wu
Experiments also show that using an RZT instead of a traditional transposition table significantly reduces the number of nodes searched on two data sets of 7x7 and 19x19 life-and-death (L&D) Go problems.
no code implementations • 5 Dec 2021 • Chung-Chin Shih, Ti-Rong Wu, Ting Han Wei, I-Chen Wu
This paper first proposes a novel RZ-based approach, called RZ-Based Search (RZS), for solving life-and-death (L&D) problems in Go.
1 code implementation • ICLR 2022 • Ti-Rong Wu, Chung-Chin Shih, Ting Han Wei, Meng-Yu Tsai, Wei-Yuan Hsu, I-Chen Wu
We train a Proof Cost Network (PCN), where proof cost is a heuristic that estimates the amount of work required to solve problems.
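A proof-cost estimate is typically used to decide which frontier node to expand next. The hypothetical sketch below (names invented; not the paper's code) shows that role with a best-first solver on a toy graph, where a stand-in function plays the part of a trained PCN forward pass.

```python
import heapq

def estimated_proof_cost(state):
    # Stand-in for a PCN forward pass: fewer remaining steps -> lower cost.
    return abs(10 - state)

def best_first_solve(start, is_solved, successors):
    """Expand the frontier node with the lowest estimated proof cost first."""
    frontier = [(estimated_proof_cost(start), start)]
    seen = {start}
    while frontier:
        _, state = heapq.heappop(frontier)
        if is_solved(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (estimated_proof_cost(nxt), nxt))
    return None

# Toy problem: reach 10 from 0 by +1 or +3 steps.
result = best_first_solve(0, lambda s: s == 10, lambda s: [s + 1, s + 3])
print(result)  # 10
```

With a well-calibrated cost estimate, the search expands far fewer nodes than an uninformed strategy, which is the benefit the PCN heuristic aims for.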
no code implementations • NeurIPS Workshop ICBINB 2020 • Kai-Chun Hu, Ping-Chun Hsieh, Ting Han Wei, I-Chen Wu
Deep policy gradient is one of the major frameworks in reinforcement learning, and it has been shown to improve parameterized policies across various tasks and environments.
no code implementations • 24 Apr 2019 • Kai-Chun Hu, Chen-Huan Pi, Ting Han Wei, I-Chen Wu, Stone Cheng, Yi-Wei Dai, Wei-Yuan Ye
In this paper, we point out a fundamental property of the reinforcement learning objective that lets us reformulate the policy gradient objective as a perceptron-like loss function, removing the need to distinguish between on-policy and off-policy training.
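The sketch below is an illustration of the general idea, not the paper's exact formulation: a perceptron-style, mistake-driven update for a softmax policy, where actions whose sampled return beats a baseline are pushed up and others pushed down, with no importance weights distinguishing on-policy from off-policy data. All names and the toy setup are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
logits = np.zeros((n_states, n_actions))  # tabular softmax policy parameters

def update(state, action, ret, baseline, lr=0.5):
    # Perceptron-like rule: only the sign of (return - baseline) decides
    # the update direction, as in mistake-driven perceptron training.
    direction = 1.0 if ret > baseline else -1.0
    probs = np.exp(logits[state]) / np.exp(logits[state]).sum()
    grad = -probs
    grad[action] += 1.0  # gradient of log-softmax w.r.t. the logits
    logits[state] += lr * direction * grad

# Toy data: in every state, action 0 yields return 1, other actions yield 0.
# Actions are sampled uniformly, i.e. off-policy with respect to the learner.
for _ in range(200):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    update(s, a, ret=1.0 if a == 0 else 0.0, baseline=0.5)

print(logits.argmax(axis=1))  # action 0 preferred in every state
```

Because the update depends only on the sign of the advantage, the same rule applies to data gathered by any behavior policy, which is the property the reformulation exploits.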