TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Atari Games	atari game	Muesli	Human World Record Breakthrough	5	# 8

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/muesli-combining-improvements-in-policy/atari-games-on-atari-game)](https://paperswithcode.com/sota/atari-games-on-atari-game?p=muesli-combining-improvements-in-policy)`

Muesli: Combining Improvements in Policy Optimization

13 Apr 2021 · Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
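
As a rough illustration of what "regularized policy optimization with model learning as an auxiliary loss" can look like, the sketch below sums three terms for a single sample: a policy-gradient term, a KL penalty that keeps the policy close to a prior, and a weighted model loss. This is not the loss from the paper: the abstract does not give its form, so the KL direction, the weights, and every name here (`combined_loss`, `prior_probs`, `kl_weight`, `model_weight`) are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def combined_loss(logits, prior_probs, action, advantage, model_loss,
                  kl_weight=1.0, model_weight=1.0):
    """Hypothetical single-sample loss: policy gradient + KL regularizer
    + auxiliary model loss. All weights and the KL direction are assumed."""
    policy = softmax(logits)
    # Policy-gradient term: raise the log-probability of actions
    # with positive advantage.
    pg_term = -advantage * math.log(policy[action])
    # Regularizer: penalize divergence of the policy from the prior.
    reg_term = kl_divergence(prior_probs, policy)
    # Auxiliary term: model_loss is assumed to be computed elsewhere
    # (for example, a prediction error of a learned model).
    return pg_term + kl_weight * reg_term + model_weight * model_loss


loss = combined_loss(
    logits=[0.0, 0.0],
    prior_probs=[0.5, 0.5],
    action=0,
    advantage=1.0,
    model_loss=0.25,
)
```

In this toy call the policy equals the prior, so the KL term vanishes and the loss reduces to the policy-gradient term plus the model loss. In an actual agent the three terms would be averaged over a batch and differentiated with an autodiff library.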
