4 Jun 2022 • Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald
Neural replicator dynamics (NeuRD), an algorithm motivated by online learning and evolutionary game theory, is an alternative to the foundational softmax policy gradient (SPG) algorithm.
22 May 2022 • Esra'a Saleh, John D. Martin, Anna Koop, Arash Pourzarabi, Michael Bowling
We focus our investigations on Dyna-style planning in a prediction setting.