We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning.
State-of-the-art result for Atari games on the Atari 2600 game Enduro.
In this paper, we introduce Random Erasing, a new data augmentation method for training convolutional neural networks (CNNs).
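The idea behind Random Erasing can be sketched as follows: with some probability, pick a rectangle of random area and aspect ratio inside the image and overwrite it with random pixel values. This is a minimal NumPy sketch, not the authors' implementation; the parameter names and default ranges (`p`, `area_range`, `aspect_range`) are illustrative assumptions.

```python
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.4),
                   aspect_range=(0.3, 3.3), rng=None):
    """Randomly erase one rectangular region of an H x W (x C) uint8 image.

    Illustrative sketch: parameter defaults are assumptions, not the
    paper's exact settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() > p:          # skip augmentation with probability 1 - p
        return img
    h, w = img.shape[:2]
    out = img.copy()
    for _ in range(100):          # retry until a sampled rectangle fits
        target_area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(*aspect_range)
        eh = int(round(np.sqrt(target_area * aspect)))
        ew = int(round(np.sqrt(target_area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            top = rng.integers(0, h - eh)
            left = rng.integers(0, w - ew)
            # fill the region with random noise
            out[top:top + eh, left:left + ew] = rng.integers(
                0, 256, size=(eh, ew) + img.shape[2:], dtype=img.dtype)
            return out
    return out                    # no valid rectangle found; return unchanged
```

In practice the transform is applied per training example, so each epoch sees a different occlusion pattern for the same image.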
Our schemes require access only to standard risk minimization algorithms (such as classification or least-squares regression) while providing theoretical guarantees on the optimality and fairness of the obtained solutions.
During the 2017 NBA playoffs, Celtics coach Brad Stevens was faced with a difficult decision when defending against the Cavaliers: "Do you double and risk giving up easy shots, or stay at home and do the best you can?"
We show that one cause of this success is that the multi-branch architecture is less non-convex, as measured by its duality gap.
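For readers unfamiliar with the term, one standard definition of the duality gap (an assumption here; the paper may use a more specialized variant) is the difference between the primal optimum and the Lagrangian dual optimum:

\[
\text{gap} \;=\; f^\star - d^\star \;=\; \min_{x} f(x) \;-\; \max_{\lambda \ge 0} \,\min_{x} L(x, \lambda) \;\ge\; 0,
\]

where $L(x,\lambda)$ is the Lagrangian of the constrained problem. For convex problems satisfying a constraint qualification (e.g. Slater's condition) the gap is zero, so a smaller gap can be read as the problem being "closer to convex".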