Sample-efficient Quality Diversity for neural continuous control

We propose a novel Deep Neuroevolution algorithm, QD-RL, that combines the strengths of off-policy reinforcement learning (RL) algorithms and Quality Diversity (QD) approaches to solve continuous control problems with neural controllers. The QD part contributes structural biases by decoupling the search for diversity from the search for high return, resulting in efficient management of the exploration-exploitation trade-off. The RL part contributes sample efficiency by relying on off-policy, gradient-based updates of the agents. More precisely, we train a population of off-policy deep RL agents to simultaneously maximize diversity within the population and the return of each individual agent. QD-RL selects agents interchangeably from a Pareto front or from a MAP-Elites grid, resulting in stable and efficient population updates. Our experiments on the AntMaze and AntTrap environments show that QD-RL can solve challenging exploration and control problems with deceptive rewards while being over 15 times more sample-efficient than its evolutionary counterparts.
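The abstract leaves the selection step implicit. As a rough illustration of the Pareto-front variant, the sketch below scores a population on two objectives, per-agent return and a diversity measure, and keeps the non-dominated agents. The diversity measure used here (mean distance between behavior descriptors, such as an agent's final position) and the helper names `diversity_scores` and `pareto_front` are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def diversity_scores(descriptors):
    """Diversity of each agent: mean Euclidean distance from its
    behavior descriptor to the descriptors of all other agents."""
    d = np.asarray(descriptors, dtype=float)
    pairwise = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=-1)
    return pairwise.sum(axis=1) / (len(d) - 1)

def pareto_front(returns, diversities):
    """Indices of non-dominated agents under (return, diversity).

    Agent i is dominated if some agent j is at least as good on both
    objectives and strictly better on at least one of them."""
    n = len(returns)
    front = []
    for i in range(n):
        dominated = any(
            returns[j] >= returns[i] and diversities[j] >= diversities[i]
            and (returns[j] > returns[i] or diversities[j] > diversities[i])
            for j in range(n) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Toy usage: 5 agents with 2-D behavior descriptors
# (e.g., final x-y position in a maze; hypothetical values).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5, 2))   # hypothetical behavior descriptors
returns = rng.normal(size=5)            # hypothetical episode returns
div = diversity_scores(descriptors)
parents = pareto_front(returns, div)    # agents kept for the next update
print("Pareto-front agents:", parents)
```

In a full QD-RL loop, the selected agents would then receive off-policy gradient updates (for return or for diversity) before being re-evaluated; the alternative described in the abstract stores agents in a MAP-Elites grid keyed by behavior descriptor instead of filtering by Pareto dominance.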
