AlphaZero-based Proof Cost Network to Aid Game Solving
In recent years, the AlphaZero algorithm has achieved super-human playing levels for many games without hand-crafted expert knowledge. Researchers have taken advantage of AlphaZero's effectiveness at learning and playing games to help in solving them. However, a strong player is not necessarily a strong solver. This paper proposes a novel approach to solving problems by modifying the training target of the AlphaZero algorithm, such that it prioritizes solving the game quickly, rather than winning. We train a Proof Cost Network (PCN), where proof cost is a heuristic that estimates the amount of work required to solve problems. This matches the general concept of the so-called proof number from proof number search, which has been shown to be well-suited for game solving. We propose two specific training targets. The first finds the shortest path to a solution, while the second estimates the proof cost. We conduct experiments on solving 15x15 Gomoku and 9x9 Killall-Go problems with both MCTS-based and FDFPN solvers. Comparisons between using AlphaZero networks and PCN as heuristics show that PCN can solve more problems.
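For readers unfamiliar with the proof number mentioned above, the following is a minimal sketch of the standard proof and disproof number computation on an AND/OR game tree. It is not the paper's PCN or either of its training targets. The `Node` class and the `proof_numbers` function are hypothetical names introduced only for illustration, and unexpanded leaves are assumed to cost 1 to prove or disprove.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # True: the side trying to prove a win is to move (OR node).
    # False: the opponent is to move (AND node).
    is_or: bool
    children: List["Node"] = field(default_factory=list)
    # True = proven win, False = proven loss, None = not yet solved.
    value: Optional[bool] = None


def proof_numbers(node: Node) -> Tuple[float, float]:
    """Return (proof number, disproof number) for a node.

    The proof number is the minimum number of unsolved leaves that must be
    proven to prove the node; the disproof number is the analogous count
    for disproving it.
    """
    if node.value is True:
        return 0, math.inf
    if node.value is False:
        return math.inf, 0
    if not node.children:
        # Assumption: an unexpanded, unsolved leaf counts as one unit of work.
        return 1, 1

    pairs = [proof_numbers(child) for child in node.children]
    pns = [pn for pn, _ in pairs]
    dns = [dn for _, dn in pairs]
    if node.is_or:
        # One winning child suffices to prove; all children must fail to disprove.
        return min(pns), sum(dns)
    # All children must be proven; one refuting child suffices to disprove.
    return sum(pns), min(dns)


# Toy tree: an OR root with two AND children of different widths.
wide = Node(is_or=False, children=[Node(is_or=True) for _ in range(3)])
narrow = Node(is_or=False, children=[Node(is_or=True) for _ in range(2)])
root = Node(is_or=True, children=[wide, narrow])
```

In this toy tree the narrower AND child is the cheaper one to prove, which is the sense in which a proof number estimates the remaining work rather than the likelihood of winning.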