Search Results for author: Ting Han Wei

Found 7 papers, 3 papers with code

Game Solving with Online Fine-Tuning

1 code implementation • NeurIPS 2023 • Ti-Rong Wu, Hung Guei, Ting Han Wei, Chung-Chin Shih, Jui-Te Chin, I-Chen Wu

Solving a game typically means finding its game-theoretic value (the outcome under optimal play), and optionally a full strategy to follow in order to achieve that outcome.

Board Games
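
As background for what "solving" means here, the sketch below is a minimal negamax solver that returns a proven game-theoretic value; the state interface (is_terminal, value, legal_moves, play) is a hypothetical stand-in for any two-player zero-sum game, and this is not the paper's online fine-tuning method.

```python
# Minimal negamax solver: returns the proven game-theoretic value of
# `state` from the side-to-move's perspective (+1 win, 0 draw, -1 loss).
# The state interface is a hypothetical stand-in, not the paper's code.
def solve(state):
    if state.is_terminal():
        return state.value()          # +1 / 0 / -1 for the player to move
    best = -1
    for move in state.legal_moves():
        best = max(best, -solve(state.play(move)))
        if best == 1:                 # proven win: no need to search further
            break
    return best
```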

MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games

1 code implementation • 17 Oct 2023 • Ti-Rong Wu, Hung Guei, Po-Wei Huang, Pei-Chiun Peng, Ting Han Wei, Chung-Chin Shih, Yun-Jui Tsai

This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero.

Atari Games • Board Games
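
The core contrast the framework evaluates can be sketched abstractly: AlphaZero-style agents expand search nodes with the real game simulator, while MuZero-style agents expand them inside a learned dynamics model. The planner classes below are a hypothetical simplification, not MiniZero's actual API.

```python
# Hypothetical sketch of the AlphaZero/MuZero split: both run MCTS from
# the current state, but they expand nodes differently.

class AlphaZeroPlanner:
    def __init__(self, net, env):
        self.net, self.env = net, env    # env = the real game rules

    def expand(self, state, action):
        next_state = self.env.step(state, action)   # exact transition
        policy, value = self.net(next_state)
        return next_state, policy, value

class MuZeroPlanner:
    def __init__(self, net):
        self.net = net                   # the net also learns the dynamics

    def expand(self, latent, action):
        # transition happens inside the learned model; no simulator needed
        next_latent, reward = self.net.dynamics(latent, action)
        policy, value = self.net.prediction(next_latent)
        return next_latent, policy, value
```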

A Local-Pattern Related Look-Up Table

no code implementations • 22 Dec 2022 • Chung-Chin Shih, Ting Han Wei, Ti-Rong Wu, I-Chen Wu

Experiments also show that using an RZT (a relevance-zone-based look-up table) in place of a traditional transposition table significantly reduces the number of searched nodes on two data sets of 7x7 and 19x19 life-and-death (L&D) Go problems.
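
A rough way to see where the saving comes from: a traditional transposition table keys entries on the full position, so any irrelevant stone causes a miss, while a zone-based table keys only on the points proven relevant, so one stored entry covers many full-board positions. The sketch below is an illustrative simplification under that assumption, not the paper's RZT construction.

```python
# Illustrative contrast (not the paper's exact RZT): keying a solved
# result on a local zone pattern lets stones outside the zone differ
# without causing a table miss.

def full_board_key(board):
    return hash(tuple(board))             # misses if ANY stone differs

def zone_key(board, zone):
    # zone: indices of the board points proven relevant to the result
    return hash(tuple((i, board[i]) for i in sorted(zone)))

table = {}

def lookup(board, zone):
    return table.get(zone_key(board, zone))

def store(board, zone, result):
    table[zone_key(board, zone)] = result
```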

A Novel Approach to Solving Goal-Achieving Problems for Board Games

no code implementations • 5 Dec 2021 • Chung-Chin Shih, Ti-Rong Wu, Ting Han Wei, I-Chen Wu

This paper first proposes a novel relevance-zone-based (RZ-based) approach, called RZ-Based Search (RZS), for solving L&D problems in Go.

Board Games

AlphaZero-based Proof Cost Network to Aid Game Solving

1 code implementation • ICLR 2022 • Ti-Rong Wu, Chung-Chin Shih, Ting Han Wei, Meng-Yu Tsai, Wei-Yuan Hsu, I-Chen Wu

We train a Proof Cost Network (PCN), where proof cost is a heuristic that estimates the amount of work required to solve problems.

Board Games
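
One plausible way such a heuristic plugs into a solver is move ordering: expand the child the network predicts is cheapest to prove first. The pcn interface and the win/loss-only game below are assumptions for illustration, not the paper's exact integration.

```python
# Hypothetical use of a proof-cost estimate inside a depth-first solver:
# try the child the network predicts is cheapest to prove first.
# Assumes a win/loss game with no draws.

def solve(state, pcn):
    if state.is_terminal():
        return state.winner()
    children = [state.play(m) for m in state.legal_moves()]
    # pcn(state) -> estimated work (e.g., node count) to solve the state
    children.sort(key=pcn)
    for child in children:
        if solve(child, pcn) == state.to_move():
            return state.to_move()    # found a proven winning move
    return state.opponent()           # every reply loses for the mover
```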

Rethinking Deep Policy Gradients via State-Wise Policy Improvement

no code implementations • NeurIPS Workshop ICBINB 2020 • Kai-Chun Hu, Ping-Chun Hsieh, Ting Han Wei, I-Chen Wu

Deep policy gradient methods form one of the major frameworks in reinforcement learning and have been shown to improve parameterized policies across a variety of tasks and environments.

Policy Gradient Methods • Value Prediction
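
As a reminder of the basic mechanism under analysis, a vanilla policy gradient (REINFORCE) update looks like the following textbook sketch; it is not the state-wise improvement analysis the paper develops.

```python
import torch

# Textbook REINFORCE step: increase log-probability of actions in
# proportion to the return that followed them.
def reinforce_step(policy_optimizer, log_probs, returns):
    # log_probs: list of log pi(a_t | s_t) tensors for one episode
    # returns:   list of returns G_t, one per step
    loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
    policy_optimizer.zero_grad()
    loss.backward()
    policy_optimizer.step()
```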

Towards Combining On-Off-Policy Methods for Real-World Applications

no code implementations • 24 Apr 2019 • Kai-Chun Hu, Chen-Huan Pi, Ting Han Wei, I-Chen Wu, Stone Cheng, Yi-Wei Dai, Wei-Yuan Ye

In this paper, we point out a fundamental property of the reinforcement learning objective that allows the policy gradient objective to be reformulated as a perceptron-like loss function, removing the need to distinguish between on-policy and off-policy training.

OpenAI Gym • Position
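
Purely as a loose illustration of the stated idea (the paper's exact formulation is not reproduced here): a perceptron-style hinge loss on (state, action, advantage) tuples penalizes only ranking disagreements with the advantage sign and contains no importance ratio, so the policy that generated the data never appears in the loss.

```python
import torch

# Loose illustration (not the paper's formulation): a perceptron-style
# loss that uses only (state, action, advantage) tuples, which can come
# from any policy, since no importance-sampling ratio appears.
def perceptron_like_loss(logits, actions, advantages, margin=0.0):
    # logits: (B, A) action scores; actions: (B,) taken actions
    taken = logits.gather(1, actions.unsqueeze(1)).squeeze(1)
    best_other = logits.scatter(
        1, actions.unsqueeze(1), float('-inf')
    ).max(dim=1).values
    y = torch.sign(advantages)   # +1: prefer the action, -1: avoid it
    # perceptron form max(0, -y * f): zero loss once the ranking agrees
    return torch.clamp(margin - y * (taken - best_other), min=0).mean()
```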
